Special Issue "Advanced Intelligent Imaging Technology"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 March 2019

Special Issue Editor

Guest Editor
Prof. Joonki Paik

Department of Image, Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 156-756, Korea
Phone: +82-10-7123-6846
Interests: image enhancement and restoration; computational imaging; intelligent surveillance systems

Special Issue Information

Dear Colleagues,

A general pipeline for visual information processing includes: i) image sensing and acquisition, ii) pre-processing, iii) feature detection or metric estimation, and iv) high-level decision making. State-of-the-art artificial intelligence technology has brought a quantum leap in performance at each step of visual information processing.

Artificial intelligence-based image signal processing (ISP) technology can drastically enhance acquired digital images through demosaicing, denoising, deblurring, super-resolution, and wide-dynamic-range imaging using deep neural networks. Feature detection and image analysis are the most popular application areas of artificial intelligence. An intelligent imaging system can solve various problems that cannot be solved without such intelligence or learning.

The objective of this Special Issue is to highlight innovative developments in intelligent imaging technology related to state-of-the-art image acquisition, pre-processing, feature detection, and image analysis using machine learning and artificial intelligence. Applications that combine two or more intelligent imaging methods form another important research area. Topics include, but are not limited to:

  • Computational photography for intelligent imaging
  • Visual inspection using machine learning and artificial intelligence
  • Depth estimation and three-dimensional analysis
  • Image processing and computer vision algorithms for advanced driver assistance systems (ADAS)
  • Wide-area intelligent surveillance systems using multiple-camera networks
  • Advanced image signal processor (ISP) based on artificial intelligence
  • Deep neural networks for inverse imaging problems
  • Multiple camera collaboration based on reinforcement learning
  • Fusion of hybrid sensors for intelligent imaging systems

Prof. Joonki Paik
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1500 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep neural network (DNN)
  • artificial neural network (ANN)
  • intelligent surveillance systems
  • computational photography
  • computational imaging
  • image signal processor (ISP)
  • camera network
  • visual inspection...

Published Papers (43 papers)


Research


Open Access Article: Effective Crack Damage Detection Using Multilayer Sparse Feature Representation and Incremental Extreme Learning Machine
Appl. Sci. 2019, 9(3), 614; https://doi.org/10.3390/app9030614
Received: 16 December 2018 / Revised: 28 January 2019 / Accepted: 3 February 2019 / Published: 12 February 2019
Abstract
Detecting cracks within reinforced concrete is still a challenging problem, owing to complex disturbances from background noise. In this work, we advocate a new concrete crack damage detection model, based upon multilayer sparse feature representation and an incremental extreme learning machine (ELM), which has both favorable feature learning and classification capabilities. Specifically, by cropping with a sliding window operation and image rotation, a large number of crack and non-crack patches are obtained from the collected concrete images. With these image patches, the defect region features can be quickly calculated by the multilayer sparse ELM autoencoder networks. Then, the online incremental ELM classification network is used to recognize the crack defect features. Unlike the commonly used deep learning-based methods, the presented ELM-based crack detection model can be trained efficiently without tediously fine-tuning the entire network parameters. Moreover, according to ELM theory, the proposed crack detector works universally for defect feature extraction and detection. In the experiments, when compared with other recently developed crack detectors, the proposed concrete crack detection model offers outstanding training efficiency as well as favorable crack detection accuracy.
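
A minimal sketch of the extreme learning machine idea this paper builds on: hidden-layer weights are random and fixed, and only the output weights are solved in closed form, which is why training is fast compared with fine-tuning a deep network. Class and parameter names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class ELMClassifier:
    """Minimal ELM: random hidden weights, closed-form output weights."""
    def __init__(self, n_hidden=256, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Random input weights and biases are fixed, never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # hidden activations
        T = np.eye(n_classes)[y]                  # one-hot targets
        # Output weights via regularized least squares (Moore-Penrose style).
        self.beta = np.linalg.solve(
            H.T @ H + 1e-3 * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta).argmax(axis=1)
```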

Open Access Article: A 3D Object Detection Based on Multi-Modality Sensors of USV
Appl. Sci. 2019, 9(3), 535; https://doi.org/10.3390/app9030535
Received: 30 December 2018 / Revised: 22 January 2019 / Accepted: 1 February 2019 / Published: 5 February 2019
Abstract
Unmanned Surface Vehicles (USVs) are commonly equipped with multi-modality sensors. Fully utilizing these sensors can improve object detection by USVs, which in turn contributes to better autonomous navigation. The purpose of this paper is to solve the problem of 3D object detection for USVs in complicated marine environments. We propose a 3D object detection deep neural network based on multi-modality data from USVs. This model includes a modified Proposal Generation Network and a Deep Fusion Detection Network. The Proposal Generation Network improves feature extraction, while the Deep Fusion Detection Network enhances fusion performance and achieves more accurate object detection results. The model was tested on both the KITTI 3D object detection dataset (a project of the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago) and a self-collected offshore dataset. The model shows excellent performance under small-memory conditions. The results further prove that deep learning-based methods can achieve good accuracy in complicated marine surface conditions.

Open Access Article: A Fast Sparse Coding Method for Image Classification
Appl. Sci. 2019, 9(3), 505; https://doi.org/10.3390/app9030505
Received: 12 December 2018 / Revised: 22 January 2019 / Accepted: 29 January 2019 / Published: 1 February 2019
Abstract
Image classification is an important problem in computer vision. The sparse coding spatial pyramid matching (ScSPM) framework is widely used in this field. However, sparse coding cannot effectively handle very large training sets because of its high computational complexity, and ignoring the mutual dependence among local features results in highly variable sparse codes, even for similar features. To overcome the shortcomings of previous sparse coding algorithms, we present an image classification method that replaces the sparse dictionary with a stable dictionary learned via low-complexity clustering, more specifically, a k-medoids clustering method seeded with k-means++. The proposed method reduces the learning complexity and improves feature stability. In the experiments, we compared the effectiveness of our method with the existing ScSPM method and its improved versions. We evaluated our approach on two diverse datasets: Caltech-101 and UIUC-Sports. The results show that our method increases the accuracy of spatial pyramid matching, which suggests that it is capable of improving the performance of sparse coding features.
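
A sketch of dictionary learning by k-medoids with k-means++ seeding, the clustering combination named in the abstract. The simple PAM-style medoid update and all names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def kmeanspp_seeds(X, k, rng):
    """k-means++ seeding: pick seeds proportional to squared distance."""
    seeds = [rng.integers(len(X))]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None] - X[seeds]) ** 2).sum(-1), axis=1)
        seeds.append(rng.choice(len(X), p=d2 / d2.sum()))
    return np.array(seeds)

def kmedoids(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    medoids = kmeanspp_seeds(X, k, rng)
    for _ in range(n_iter):
        dist = ((X[:, None] - X[medoids]) ** 2).sum(-1)  # point-to-medoid
        labels = dist.argmin(axis=1)
        for j in range(k):                               # update each medoid
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            within = ((X[members][:, None] - X[members]) ** 2).sum(-1)
            medoids[j] = members[within.sum(axis=1).argmin()]
    return X[medoids], labels  # the medoids act as the stable dictionary
```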

Open Access Article: Harbor Extraction Based on Edge-Preserve and Edge Categories in High Spatial Resolution Remote-Sensing Images
Appl. Sci. 2019, 9(3), 420; https://doi.org/10.3390/app9030420
Received: 18 December 2018 / Revised: 15 January 2019 / Accepted: 19 January 2019 / Published: 26 January 2019
Abstract
Efficient harbor extraction is essential due to the strategic importance of this target in economic and military construction. However, there are few studies on harbor extraction. In this article, a new harbor extraction algorithm based on edge preservation and edge categories (EC) is proposed for high spatial resolution remote-sensing images. In the preprocessing stage, we propose a local edge preservation algorithm (LEPA) to remove redundant details and reduce useless edges. After acquiring the edge-preserved images, in order to reduce redundant matched keypoints and improve the accuracy of the target candidate extraction method, we propose a scale-invariant feature transform (SIFT) keypoint extraction method based on edge categories (EC-SIFT): this method greatly reduces the redundancy of SIFT keypoints and lowers the computational cost of the target extraction system. Finally, the harbor extraction algorithm uses a Support Vector Machine (SVM) classifier to identify the harbor target. The experimental results show that the proposed algorithm effectively removes redundant details and improves the accuracy and efficiency of harbor target extraction.
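
A generic SIFT-keypoint + SVM pipeline sketch of the kind the abstract describes, using the standard OpenCV/scikit-learn pattern. This is not the paper's EC-SIFT implementation; the edge-category filtering step is only indicated by a placeholder comment, and the crude mean pooling is an assumption.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def sift_descriptor(image_bgr, max_kp=200):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create(nfeatures=max_kp)
    # In EC-SIFT, keypoints would additionally be filtered by edge category
    # here to discard redundant ones (assumption based on the abstract).
    _, desc = sift.detectAndCompute(gray, None)
    if desc is None:
        desc = np.zeros((1, 128), dtype=np.float32)
    return desc.mean(axis=0)   # crude fixed-length pooling for the sketch

def train_harbor_classifier(images, labels):
    feats = np.stack([sift_descriptor(im) for im in images])
    return SVC(kernel="rbf").fit(feats, labels)
```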

Open Access Article: Arabic Cursive Text Recognition from Natural Scene Images
Appl. Sci. 2019, 9(2), 236; https://doi.org/10.3390/app9020236
Received: 29 November 2018 / Revised: 26 December 2018 / Accepted: 31 December 2018 / Published: 10 January 2019
Abstract
This paper presents a comprehensive survey on Arabic cursive scene text recognition. Publications in this field in recent years reflect a shift of interest among document image analysis researchers from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem due to variations in font style, size, alignment, orientation, reflection, illumination change, blurriness, and complex backgrounds. Among cursive scripts, Arabic scene text recognition is contemplated as an even more challenging problem due to joined writing, same-character variations, a large number of ligatures, the number of baselines, etc. Surveys of Latin and Chinese script-based scene text recognition systems can be found, but the Arabic-like scene text recognition problem is yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques, which follow a deep learning architecture, are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide insight about cursive scene text to researchers.

Open Access Article: New Evolutionary-Based Techniques for Image Registration
Appl. Sci. 2019, 9(1), 176; https://doi.org/10.3390/app9010176
Received: 12 December 2018 / Revised: 26 December 2018 / Accepted: 29 December 2018 / Published: 5 January 2019
Abstract
The work reported in this paper aims at the development of evolutionary algorithms to register images for signature recognition purposes. We propose and develop several registration methods in order to obtain accurate and fast algorithms. First, we introduce two variants of the firefly method that prove to have excellent accuracy and fair run times. To speed up the computation, we propose two variants of the Accelerated Particle Swarm Optimization (APSO) method. The resulting algorithms are significantly faster than the firefly-based ones, but their recognition rates are slightly lower. To find a trade-off between recognition rate and computational complexity, we developed a hybrid method that combines the ability of auto-adaptive Evolution Strategies (ES) search to discover a global optimum with the fast convergence of APSO. The accuracy and efficiency of the resulting algorithms have been experimentally validated through a long series of tests on various pairs of signature images. A comparative analysis of the quality of the proposed methods, together with conclusions and suggestions for further development, is provided in the final part of the paper.

Open Access Article: Using the Guided Fireworks Algorithm for Local Backlight Dimming
Appl. Sci. 2019, 9(1), 129; https://doi.org/10.3390/app9010129
Received: 24 November 2018 / Revised: 18 December 2018 / Accepted: 25 December 2018 / Published: 1 January 2019
Abstract
Local backlight dimming is a promising display technology, with good performance in improving visual quality and reducing the power consumption of device displays. To set the optimal backlight luminance, it is important to design high-performance local dimming algorithms. In this paper, we focus on improving the quality of the displayed image and treat local backlight dimming as an optimization problem. To better evaluate image quality, we use the structural similarity (SSIM) index as the image quality evaluation method and build a model for the local dimming problem. To solve this optimization problem, we design a local dimming algorithm based on the Fireworks Algorithm (FWA), a recent evolutionary computation (EC) algorithm. To further improve solution quality, we introduce a guiding strategy into the FWA and propose an improved algorithm named the Guided Fireworks Algorithm (GFWA). Experimental results show that the GFWA achieves higher performance in local backlight dimming than the Look-Up Table (LUT) algorithm, the Improved Shuffled Frog Leaping Algorithm (ISFLA), and the FWA.
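
A sketch of an SSIM-based fitness function for a backlight-dimming optimizer, as the abstract describes. The rendering model (a coarse backlight grid upsampled and multiplied into the image) is a common simplification and an assumption here, not the paper's exact display model.

```python
import numpy as np
from skimage.metrics import structural_similarity

def dimming_fitness(reference, backlight_grid, out_shape):
    # Upsample the coarse backlight levels to full image resolution
    # (assumes the grid evenly divides the image for simplicity).
    reps = (out_shape[0] // backlight_grid.shape[0],
            out_shape[1] // backlight_grid.shape[1])
    backlight = np.kron(backlight_grid, np.ones(reps))
    displayed = np.clip(reference * backlight, 0.0, 1.0)
    # Higher SSIM against the reference image means a better dimming solution.
    return structural_similarity(reference, displayed, data_range=1.0)
```

Any population-based optimizer (FWA, GFWA, PSO, and so on) can then maximize this fitness over candidate backlight grids.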

Open Access Article: Improvement in Classification Performance Based on Target Vector Modification for All-Transfer Deep Learning
Appl. Sci. 2019, 9(1), 128; https://doi.org/10.3390/app9010128
Received: 22 October 2018 / Revised: 29 November 2018 / Accepted: 26 December 2018 / Published: 1 January 2019
Abstract
This paper proposes a target vector modification method for the all-transfer deep learning (ATDL) method. Deep neural networks (DNNs) have been used widely in many applications; however, DNNs are known to be problematic when large amounts of training data are not available. Transfer learning can provide a solution to this problem. Previous methods regularize all layers, including the output layer, by estimating relation vectors, which are used instead of the one-hot target vectors of the target domain. These vectors are estimated by averaging the target domain data of each target domain label in the output space. This improves classification performance, but it does not consider the relations among the relation vectors. From this point of view, we propose a relation vector modification based on constrained pairwise repulsive forces. High pairwise repulsive forces provide large distances between the relation vectors. In addition, the risk of divergence is mitigated by a constraint based on the distributions of the output vectors of the target domain data. We apply our method to two simulation experiments and a disease classification task using two-dimensional electrophoresis images. The experimental results show that reusing all layers through our estimation method is effective, especially when the amount of target domain data is very small.
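
A sketch of the relation-vector estimation step the abstract describes: instead of one-hot targets, each target-domain class is represented by the mean of the source network's output vectors for that class. Function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def relation_vectors(outputs, labels, n_classes):
    """outputs: (N, D) source-network output vectors for target-domain data.
    labels: (N,) integer class labels. Returns (n_classes, D) class means."""
    return np.stack([outputs[labels == c].mean(axis=0)
                     for c in range(n_classes)])

# The paper's contribution then pushes these class vectors apart with
# constrained pairwise repulsive forces before using them as training targets.
```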

Open Access Article: Accelerating Image Classification using Feature Map Similarity in Convolutional Neural Networks
Appl. Sci. 2019, 9(1), 108; https://doi.org/10.3390/app9010108
Received: 23 November 2018 / Revised: 18 December 2018 / Accepted: 24 December 2018 / Published: 29 December 2018
Abstract
Convolutional neural networks (CNNs) have greatly improved image classification performance. However, the extensive time required for classification, owing to the large amount of computation involved, makes CNNs unsuitable for low-performance devices. To speed up image classification, we propose a cached CNN, which can classify input images based on their similarity to previously input images. Because the feature maps extracted by the CNN kernels represent the intensity of features, images with similar intensities can be classified into the same class. In this study, we cache class labels and feature vectors extracted from feature maps for images classified by the CNN. Then, when a new image is input, its class label is output based on its similarity to the cached feature vectors. This process can be performed at each layer; hence, if the classification succeeds early, there is no need to perform the remaining convolution layer operations, which reduces the required classification time. We performed experiments to measure and evaluate the cache hit rate, precision, and classification time.
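
A sketch of the layer-level feature cache described above: pooled feature maps are compared against cached vectors, and a confident hit short-circuits the remaining layers. The global-average pooling, cosine similarity, and threshold are illustrative assumptions.

```python
import numpy as np

class FeatureCache:
    def __init__(self, threshold=0.95):
        self.vectors, self.labels = [], []
        self.threshold = threshold

    def _vec(self, feature_map):
        # Global average pooling: (C, H, W) feature map -> unit-norm (C,) vector.
        v = feature_map.mean(axis=(1, 2))
        return v / (np.linalg.norm(v) + 1e-8)

    def lookup(self, feature_map):
        """Return a cached label on a confident hit, else None (keep computing)."""
        if not self.vectors:
            return None
        v = self._vec(feature_map)
        sims = np.stack(self.vectors) @ v        # cosine similarities
        i = int(sims.argmax())
        return self.labels[i] if sims[i] >= self.threshold else None

    def store(self, feature_map, label):
        self.vectors.append(self._vec(feature_map))
        self.labels.append(label)
```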

Open Access Article: The Optical Barcode Detection and Recognition Method Based on Visible Light Communication Using Machine Learning
Appl. Sci. 2018, 8(12), 2425; https://doi.org/10.3390/app8122425
Received: 1 October 2018 / Revised: 2 November 2018 / Accepted: 13 November 2018 / Published: 29 November 2018
Abstract
Visible light communication (VLC) has developed rapidly in recent years. VLC has the advantages of high confidentiality, low cost, etc., and could be an effective way to connect online to offline (O2O). In this paper, an RGB-LED-ID detection and recognition method based on VLC using machine learning is proposed. Different from traditional encoding and decoding VLC, we develop a new VLC system with a form of modulation and recognition. We create different features for different LEDs to make each an Optical Barcode (OBC), based on a Complementary Metal-Oxide-Semiconductor (CMOS) sensor and a pulse-width modulation (PWM) method. The features are extracted using image processing, and then a support vector machine (SVM) and artificial neural networks (ANNs) are introduced into the scheme as classifiers. The experimental results show that the proposed method can provide a huge number of unique LED-IDs with a high LED-ID recognition rate, and its performance in dark and distant conditions is significantly better than that of traditional Quick Response (QR) codes. This is the first time VLC has been used in the field of the Internet of Things (IoT), and it is an innovative application of RGB LEDs to create features. Furthermore, with the development of camera technology, the number of unique LED-IDs and the maximum identifiable distance will increase. Therefore, this scheme can be used as an effective complement to QR codes in the future.

Open Access Article: A New Rotor Position Measurement Method for Permanent Magnet Spherical Motors
Appl. Sci. 2018, 8(12), 2415; https://doi.org/10.3390/app8122415
Received: 24 October 2018 / Revised: 23 November 2018 / Accepted: 25 November 2018 / Published: 28 November 2018
Abstract
This paper proposes a new high-precision rotor position measurement (RPM) method for permanent magnet spherical motors (PMSMs). In the proposed method, an LED light spot generation module (LSGM) is installed at the top of the rotor shaft. In the LSGM, three LEDs are arranged in a straight line with different distances between them, forming three optical feature points (OFPs). The images of the three OFPs acquired by a high-speed camera are used to calculate the rotor position of the PMSM in the world coordinate frame. An experimental platform was built to verify the effectiveness of the proposed RPM method.

Open Access Article: An Image Segmentation Method Based on Improved Regularized Level Set Model
Appl. Sci. 2018, 8(12), 2393; https://doi.org/10.3390/app8122393
Received: 18 October 2018 / Revised: 12 November 2018 / Accepted: 23 November 2018 / Published: 26 November 2018
Abstract
When the level set algorithm is used to segment an image, the level set function must be re-initialized periodically to ensure that it remains a signed distance function (SDF). To avoid this defect, an improved regularized level set method-based image segmentation approach is presented. First, a new potential function is defined and introduced to reconstruct a new distance regularization term that removes the need to periodically re-initialize the level set function. Second, by combining the distance regularization term with the internal and external energy terms, a new energy functional is developed. Then, the evolution of the new energy functional is derived by using the calculus of variations and the steepest descent approach, and a partial differential equation is designed. Finally, an improved regularized level set-based image segmentation (IRLS-IS) method is proposed. Numerical experimental results demonstrate that the IRLS-IS method is not only effective and robust in segmenting noisy and intensity-inhomogeneous images but can also analyze complex medical images well.
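
For orientation, energy functionals of this family typically take the classical distance-regularized level set (DRLSE) form shown below; the paper's contribution is a new potential function and the energy built on it. The notation here follows the classical formulation and is an assumption, not taken from the paper.

```latex
% Classical distance-regularized level-set energy (DRLSE form):
E(\phi) =
  \underbrace{\mu \int_\Omega p\big(|\nabla\phi|\big)\,dx}_{\text{distance regularization}}
+ \underbrace{\lambda \int_\Omega g\,\delta_\varepsilon(\phi)\,|\nabla\phi|\,dx}_{\text{edge-weighted length}}
+ \underbrace{\alpha \int_\Omega g\,H_\varepsilon(-\phi)\,dx}_{\text{weighted area}}
```

Here the potential p has its minimum at |∇φ| = 1, which keeps φ close to a signed distance function during evolution and removes the need for periodic re-initialization.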

Open Access Article: Impulse Noise Denoising Using Total Variation with Overlapping Group Sparsity and Lp-Pseudo-Norm Shrinkage
Appl. Sci. 2018, 8(11), 2317; https://doi.org/10.3390/app8112317
Received: 16 October 2018 / Revised: 29 October 2018 / Accepted: 5 November 2018 / Published: 20 November 2018
Abstract
Models based on total variation (TV) regularization have proven effective in removing random noise, but a serious staircase effect also appears in the denoised images. In this study, two-dimensional total variation with overlapping group sparsity (OGS-TV) is applied to images with impulse noise, to suppress the staircase effect of the TV model and enhance the dissimilarity between smooth and edge regions. In the traditional TV model, the L1-norm is used to describe the statistical characteristics of impulse noise. In this paper, the Lp-pseudo-norm regularization term is employed to replace the L1-norm. The new model introduces another degree of freedom, which better describes the sparsity of the image and improves the denoising result. Under the accelerated alternating direction method of multipliers (ADMM) framework, Fourier transform technology is introduced to move the matrix operations from the spatial domain to the frequency domain, which improves the efficiency of the algorithm. Our model exploits the sparsity of the difference domain of the image: the neighborhood difference of each point is fully utilized to augment the difference between the smooth and edge regions. Experimental results show that the peak signal-to-noise ratio, structural similarity, visual effect, and computational efficiency of this new model are improved compared with state-of-the-art denoising methods.
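
A sketch of a generalized p-shrinkage operator of the kind commonly used to approximate the proximal map of the Lp pseudo-norm (0 < p < 1) inside ADMM solvers, in the sense of Chartrand's p-shrinkage. This is a generic instance of the technique named above, assumed for illustration rather than the paper's exact operator.

```python
import numpy as np

def p_shrink(y, tau, p):
    """Elementwise sign(y) * max(|y| - tau^(2-p) * |y|^(p-1), 0)."""
    mag = np.abs(y)
    with np.errstate(divide="ignore"):
        shrunk = mag - tau ** (2.0 - p) * np.power(mag, p - 1.0)
    return np.sign(y) * np.maximum(shrunk, 0.0)

# p = 1 reduces to the ordinary soft-thresholding used by the L1-norm TV model:
# p_shrink(y, tau, 1.0) == sign(y) * max(|y| - tau, 0)
```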

Open Access Article: Use of Gradient-Based Shadow Detection for Estimating Environmental Illumination Distribution
Appl. Sci. 2018, 8(11), 2255; https://doi.org/10.3390/app8112255
Received: 18 October 2018 / Revised: 10 November 2018 / Accepted: 12 November 2018 / Published: 15 November 2018
Abstract
Environmental illumination information is necessary to achieve a consistent integration of virtual objects into a given image. In this paper, we present a gradient-based shadow detection method for estimating the environmental illumination distribution of a given scene, in which a three-dimensional (3-D) augmented reality (AR) marker, a cubic reference object of known size, is employed. The geometric elements (corners and sides) of the AR marker constitute candidate shadow boundaries, which are obtained on a flat surface according to the relationship between the camera and the candidate light sources. We can then extract the shadow regions by collecting the local features that support the candidate shadow boundaries in the image. To further verify the shadows that pass the local-feature-based matching, we examine whether significant brightness changes occur in the intersection regions between the shadows. Our proposed method reduces the unwanted effects caused by threshold values during edge-based shadow detection, as well as those caused by the sampling position during point-based illumination estimation.

Open Access Article: Minimum Barrier Distance-Based Object Descriptor for Visual Tracking
Appl. Sci. 2018, 8(11), 2233; https://doi.org/10.3390/app8112233
Received: 10 September 2018 / Revised: 24 October 2018 / Accepted: 30 October 2018 / Published: 13 November 2018
Abstract
In most visual tracking tasks, the target is tracked by a bounding box given in the first frame. Complex and redundant background information inevitably exists in the bounding box and affects tracking performance. To alleviate the influence of the background, we propose a robust object descriptor for visual tracking in this paper. First, we decompose the bounding box into non-overlapping patches and extract color and gradient histogram features for each patch. Second, we adopt the minimum barrier distance (MBD) to calculate patch weights. Specifically, we consider the boundary patches as background seeds and calculate the MBD from each patch to the seed set as the weight of each patch, since the weight calculated by MBD represents the difference between each patch and the background more effectively. Finally, we apply the weight to the extracted features to get the descriptor of each patch and then incorporate our MBD-based descriptor into the structured support vector machine algorithm for tracking. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed approach.

Open Access Article: An Improved Neural Network Cascade for Face Detection in Large Scene Surveillance
Appl. Sci. 2018, 8(11), 2222; https://doi.org/10.3390/app8112222
Received: 30 September 2018 / Revised: 8 November 2018 / Accepted: 8 November 2018 / Published: 11 November 2018
Abstract
Face detection for security cameras monitoring large and crowded areas is very important for public safety, yet it is much more difficult than traditional face detection tasks. One reason is that in large areas such as squares, stations, and stadiums, faces captured by cameras are usually at low resolution and thus lack many facial details. In this paper, we improve popular cascade algorithms by proposing a novel multi-resolution framework that utilizes parallel convolutional neural network cascades for detecting faces in large scenes. This framework utilizes face and head-with-shoulder information together to deal with large-area surveillance images. Compared with popular cascade algorithms, our method outperforms them by a large margin.

Open Access Article: Deep Learning Case Study for Automatic Bird Identification
Appl. Sci. 2018, 8(11), 2089; https://doi.org/10.3390/app8112089
Received: 27 September 2018 / Revised: 22 October 2018 / Accepted: 23 October 2018 / Published: 29 October 2018
Abstract
An automatic bird identification system is required for offshore wind farms in Finland. Radar is the obvious choice to detect flying birds, but external information is required for actual identification. We applied visual camera images as this external data. The proposed system for automatic bird identification consists of a radar, a motorized video head, and a single-lens reflex camera with a telephoto lens. A convolutional neural network trained with a deep learning algorithm is applied to the image classification. We also propose a data augmentation method in which images are rotated and converted in accordance with desired color temperatures. The final identification is based on a fusion of parameters provided by the radar and the predictions of the image classifier. The sensitivity of the proposed system as an image classifier, on a dataset of 9312 manually taken original images augmented to 2.44 × 10^6 images, is 0.9463. The area under the receiver operating characteristic curve for the two key bird species is 0.9993 (White-tailed Eagle) and 0.9496 (Lesser Black-backed Gull), respectively. We propose a novel system for automatic bird identification as a real-world application, and demonstrate that our data augmentation method is suitable for this image classification problem and significantly increases the performance of the classifier.
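
A sketch of the rotation + color-temperature augmentation the abstract describes. The per-channel gains for each color temperature are illustrative assumptions; a real implementation would derive them from blackbody tables.

```python
from PIL import Image

TEMP_GAINS = {          # (R, G, B) multipliers, warm to cool (assumed values)
    3000: (1.00, 0.80, 0.55),
    5000: (1.00, 0.95, 0.85),
    6500: (1.00, 1.00, 1.00),
    8000: (0.90, 0.95, 1.00),
}

def _lut(gain):
    # 256-entry lookup table scaling one channel by `gain`.
    return [min(255, int(i * gain)) for i in range(256)]

def augment(image, angle, kelvin):
    """Rotate the image and shift its apparent color temperature."""
    rotated = image.convert("RGB").rotate(
        angle, resample=Image.BILINEAR, expand=True)
    r, g, b = TEMP_GAINS[kelvin]
    return rotated.point(_lut(r) + _lut(g) + _lut(b))
```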

Open Access Article: An Image-Based Fall Detection System for the Elderly
Appl. Sci. 2018, 8(10), 1995; https://doi.org/10.3390/app8101995
Received: 7 September 2018 / Revised: 2 October 2018 / Accepted: 16 October 2018 / Published: 20 October 2018
Abstract
Due to advances in medical technology, the elderly population has continued to grow. Elderly healthcare issues, especially fall accidents, have been widely discussed, because a fall can lead to a fracture and have serious consequences. Therefore, the effective detection of fall accidents is important for both elderly people and their caregivers. In this work, we designed an Image-based FAll Detection System (IFADS) for nursing homes, where public areas are usually equipped with surveillance cameras. Unlike existing fall detection algorithms, we mainly focused on falls that occur while sitting down on and standing up from a chair, because these two activities together account for a higher proportion of falls than forward walking. IFADS first applies an object detection algorithm to identify people in a video frame. Then, a posture recognition method is used to track the status of each person by checking the relative positions of the chair and the person. An alarm is triggered when a fall is detected. In order to evaluate the effectiveness of IFADS, we not only simulated different fall scenarios, but also adopted YouTube and Giphy videos that captured real falls. Our experimental results showed that IFADS achieved an average accuracy of 95.96%. Therefore, IFADS can be used by nursing homes to improve the quality of residential care facilities.

Open Access Article: A New Cost Function Combining Deep Neural Networks (DNNs) and l2,1-Norm with Extraction of Robust Facial and Superpixels Features in Age Estimation
Appl. Sci. 2018, 8(10), 1943; https://doi.org/10.3390/app8101943
Received: 21 August 2018 / Revised: 16 September 2018 / Accepted: 25 September 2018 / Published: 16 October 2018
Abstract
Automatic age estimation from unconstrained facial images is a challenging task, and it has recently gained much attention due to its wide range of applications. In this paper, we propose a new model based on convolutional neural networks (CNNs) and the l2,1-norm to select age-related features for the age estimation task. A new cost function is proposed. To learn and train the new model, we provide the analysis and proof of the convergence of the new cost function, which solves the minimization problem of deep neural networks (DNNs) with the l2,1-norm. High-level features are extracted from the facial images using transfer learning, since there are currently not enough large age databases that could be used to train a deep learning network. The extracted features are then fed to the proposed model to select the most efficient age-related features. In addition, a new DNN-based system that jointly fine-tunes two different DNNs with two different feature sets is developed. Experimental results show the effectiveness of the proposed methods, which achieve state-of-the-art performance on a public database.
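
The l2,1-norm used above for feature selection is the sum of the L2 norms of the rows of a weight matrix; penalizing it drives entire rows to zero, so the surviving rows mark the selected (age-related) features. The snippet below is a generic definition of the norm, independent of the paper's full cost function.

```python
import numpy as np

def l21_norm(W):
    # ||W||_{2,1} = sum_i ||W_{i,:}||_2  (sum of row-wise L2 norms)
    return np.linalg.norm(W, axis=1).sum()

W = np.array([[0.0, 0.0, 0.0],    # zeroed row: this feature is dropped
              [0.3, -0.4, 0.0],   # ||row||_2 = 0.5
              [3.0, 4.0, 0.0]])   # ||row||_2 = 5.0
assert np.isclose(l21_norm(W), 5.5)
```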

Open Access Article: Temporal Action Detection in Untrimmed Videos from Fine to Coarse Granularity
Appl. Sci. 2018, 8(10), 1924; https://doi.org/10.3390/app8101924
Received: 10 August 2018 / Revised: 30 September 2018 / Accepted: 9 October 2018 / Published: 15 October 2018
Abstract
Temporal action detection in long, untrimmed videos is an important yet challenging task that requires not only recognizing the categories of actions in videos, but also localizing the start and end times of each action. In recent years, artificial neural networks, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, have significantly improved performance in various computer vision tasks, including action detection. In this paper, we make the most of classifiers of different granularity and propose to detect actions from fine to coarse granularity, which is also in line with people's detection habits. Our action detection method is built in the 'proposal then classification' framework. We employ several neural network architectures as deep information extractors and as segment-level (fine-grained) and window-level (coarse-grained) classifiers. Each of the proposal and classification steps is executed from the segment to the window level. The experimental results show that our method not only achieves detection performance comparable to that of state-of-the-art methods, but also has relatively balanced performance across different action categories.

Open Access Article: Large-Scale Fine-Grained Bird Recognition Based on a Triplet Network and Bilinear Model
Appl. Sci. 2018, 8(10), 1906; https://doi.org/10.3390/app8101906
Received: 7 September 2018 / Revised: 7 October 2018 / Accepted: 11 October 2018 / Published: 13 October 2018
Abstract
The main purpose of fine-grained classification is to distinguish among many subcategories of a single basic category, such as birds or flowers. We propose a model based on a triplet network and bilinear methods for fine-grained bird identification. Our proposed model can be trained in an end-to-end manner, which effectively increases the inter-class distance of the features extracted by the network and improves the accuracy of bird recognition. When experimentally tested on 1096 birds in a custom-built dataset and on Caltech-UCSD (a public bird dataset), the model achieved accuracies of 88.91% and 85.58%, respectively. The experimental results confirm the high generalization ability of our model in fine-grained image classification. Moreover, our model requires no additional manual annotation information, such as object-labeling frames and part-labeling points, which guarantees good versatility and robustness in fine-grained bird recognition.
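
A sketch of the standard triplet margin loss at the core of triplet-network training as described above: pull an anchor embedding toward a same-class (positive) embedding and push it away from a different-class (negative) one. This is the generic formulation; the paper additionally combines it with a bilinear feature model.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # L2-normalize embeddings so distances are comparable across batches.
    anchor, positive, negative = (F.normalize(x, dim=1)
                                  for x in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # anchor-positive distance
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # anchor-negative distance
    # Penalize triplets where the negative is not at least `margin` farther.
    return F.relu(d_pos - d_neg + margin).mean()
```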

Open Access Article: A Novel Lightweight Approach for Video Retrieval on Mobile Augmented Reality Environment
Appl. Sci. 2018, 8(10), 1860; https://doi.org/10.3390/app8101860
Received: 24 August 2018 / Revised: 21 September 2018 / Accepted: 4 October 2018 / Published: 10 October 2018
Abstract
Mobile augmented reality merges virtual objects with the real world on mobile devices, while video retrieval brings out similar-looking videos from a large-scale video dataset. Since mobile augmented reality applications demand real-time interaction and operation, we need to process and interact in real time. Furthermore, augmented reality-based virtual objects can be poorly textured. In order to resolve the above-mentioned issues, in this research, we propose a novel, fast, and robust approach for retrieving videos in the mobile augmented reality environment using image and video queries. First, top-K key-frames are extracted from the videos, which significantly increases efficiency. Second, we introduce a novel frame-based feature extraction method, namely the Pyramid Ternary Histogram of Oriented Gradient (PTHOG), to extract shape features from the virtual objects in an effective and efficient manner. Third, we utilize Double-Bit Quantization (DBQ)-based hashing to accomplish the nearest neighbor search efficiently, which produces a candidate list of videos. Lastly, a similarity measure is performed to re-rank the videos obtained from the candidate list. An extensive experimental analysis is performed in order to verify our claims.

Open Access Article: A Method for Singular Points Detection Based on Faster-RCNN
Appl. Sci. 2018, 8(10), 1853; https://doi.org/10.3390/app8101853
Received: 12 September 2018 / Revised: 28 September 2018 / Accepted: 2 October 2018 / Published: 9 October 2018
Abstract
Most methods for singular point detection depend on the orientation fields of fingerprints and cannot achieve reliable and accurate detection on poor-quality fingerprints. In this study, a new method for fingerprint singular point detection based on Faster-RCNN (Faster Region-based Convolutional Network method) is proposed. It is a two-step process, and an orientation constraint is added to Faster-RCNN to obtain the orientation information of singular points. In addition, we designed a convolutional neural network (ConvNet) for singular point detection according to the characteristics of fingerprint images and existing works. Specifically, the proposed method extracts singular points directly from raw fingerprint images without traditional preprocessing. Experimental results demonstrate the effectiveness of the proposed method. In comparison with other detection algorithms, our method achieves a 96.03% detection rate for core points and a 98.33% detection rate for delta points on the FVC2002 DB1 dataset, and 90.75% for core points and 94.87% for delta points on the NIST SD4 dataset, outperforming the other algorithms.

Open Access Article: Necessary Morphological Patches Extraction for Automatic Micro-Expression Recognition
Appl. Sci. 2018, 8(10), 1811; https://doi.org/10.3390/app8101811
Received: 5 September 2018 / Revised: 23 September 2018 / Accepted: 28 September 2018 / Published: 3 October 2018
Abstract
Micro-expressions are usually subtle and brief facial expressions that humans use to hide their true emotional states. In recent years, micro-expression recognition has attracted wide attention in the fields of psychology, mass media, and computer vision. The shortest micro-expression lasts only 1/25 s. Furthermore, different from macro-expressions, micro-expressions have considerably low intensity and inadequate contraction of the facial muscles. Based on these characteristics, automatic micro-expression detection and recognition are great challenges in the field of computer vision. In this paper, we propose a novel automatic facial expression recognition framework based on necessary morphological patches (NMPs) to better detect and identify micro-expressions. A micro-expression is a subconscious facial muscle response that is not controlled by the rational thought of the brain; it therefore involves only a few facial muscles and has local properties. NMPs are the facial regions that must be involved when a micro-expression occurs. NMPs are screened by weighting the active facial patches instead of using the entire facial area holistically. First, we manually define the active facial patches according to the facial landmark coordinates and the facial action coding system (FACS). Second, we use the LBP-TOP descriptor to extract features from these patches and the entropy-weight method to select the NMPs. Finally, we obtain the weighted LBP-TOP features of these NMPs. We test on two recent publicly available datasets that provide sufficient samples: the CASME II and SMIC databases. Compared with many recent state-of-the-art approaches, our method achieves more promising recognition results.

Open Access Article: Visualization and Interpretation of Convolutional Neural Network Predictions in Detecting Pneumonia in Pediatric Chest Radiographs
Appl. Sci. 2018, 8(10), 1715; https://doi.org/10.3390/app8101715
Received: 25 August 2018 / Revised: 17 September 2018 / Accepted: 18 September 2018 / Published: 20 September 2018
Abstract
Pneumonia affects 7% of the global population, resulting in 2 million pediatric deaths every year. Chest X-ray (CXR) analysis is routinely performed to diagnose the disease. Computer-aided diagnostic (CADx) tools aim to supplement decision-making. These tools process handcrafted and/or convolutional neural network (CNN)-extracted image features for visual recognition. However, CNNs are perceived as black boxes, since their performance lacks explanation. This is a serious bottleneck in applications involving medical screening and diagnosis, since poorly interpreted model behavior could adversely affect clinical decisions. In this study, we evaluate, visualize, and explain the performance of customized CNNs in detecting pneumonia and further differentiating between bacterial and viral types in pediatric CXRs. We present a novel visualization strategy to localize the region of interest (ROI) that is considered relevant for model predictions across all inputs that belong to an expected class. We statistically validate the models' performance on the underlying tasks. We observe that the customized VGG16 model achieves 96.2% and 93.6% accuracy in detecting the disease and distinguishing between bacterial and viral pneumonia, respectively. The model outperforms the state of the art in all performance metrics and demonstrates reduced bias and improved generalization.

Open Access Article: Associative Memories to Accelerate Approximate Nearest Neighbor Search
Appl. Sci. 2018, 8(9), 1676; https://doi.org/10.3390/app8091676
Received: 22 August 2018 / Revised: 13 September 2018 / Accepted: 14 September 2018 / Published: 16 September 2018
Abstract
Nearest neighbor search is a very active field in machine learning. It appears in many application cases, including classification and object retrieval. In its naive implementation, the complexity of the search is linear in the product of the dimension and the cardinality of the collection of vectors in which the search is performed. Recently, many works have focused on reducing the dimension of vectors using quantization techniques or hashing, while providing an approximate result. In this paper, we focus instead on tackling the cardinality of the collection of vectors. Namely, we introduce a technique that partitions the collection of vectors and stores each part in its own associative memory. When a query vector is given to the system, the associative memories are polled to identify which one contains the closest match. Then, an exhaustive search is conducted only on the part of the vectors stored in the selected associative memory. We study the effectiveness of the system when the messages to store are generated from i.i.d. uniform ±1 random variables or from 0-1 sparse i.i.d. random variables. We also conduct experiments on both synthetic and real data, and show that it is possible to achieve interesting trade-offs between complexity and accuracy.
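
A sketch of the partition-then-poll idea described above: the vectors are split into parts, each part is summarized (here by its centroid, a deliberate simplification standing in for the paper's associative memories), the summaries are polled to pick one part, and only that part is searched exhaustively.

```python
import numpy as np

class PartitionedIndex:
    def __init__(self, vectors, n_parts=16, seed=0):
        rng = np.random.default_rng(seed)
        # Random partition of the collection into n_parts roughly equal parts.
        self.parts = np.array_split(rng.permutation(vectors), n_parts)
        self.centroids = np.stack([p.mean(axis=0) for p in self.parts])

    def query(self, q):
        # Poll the part summaries, then search only the winning part.
        part = self.parts[np.linalg.norm(self.centroids - q, axis=1).argmin()]
        best = np.linalg.norm(part - q, axis=1).argmin()
        return part[best]   # approximate nearest neighbor
```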

Open AccessArticle Detection and Classification of Overlapping Cell Nuclei in Cytology Effusion Images Using a Double-Strategy Random Forest
Appl. Sci. 2018, 8(9), 1608; https://doi.org/10.3390/app8091608
Received: 2 August 2018 / Revised: 3 September 2018 / Accepted: 3 September 2018 / Published: 11 September 2018
PDF Full-text (5856 KB) | HTML Full-text | XML Full-text
Abstract
Due to the close resemblance between overlapping and cancerous nuclei, the misinterpretation of overlapping nuclei can affect the final decision of cancer cell detection. Thus, it is essential to detect overlapping nuclei and distinguish them from single ones for subsequent quantitative analyses. This [...] Read more.
Due to the close resemblance between overlapping and cancerous nuclei, the misinterpretation of overlapping nuclei can affect the final decision of cancer cell detection. Thus, it is essential to detect overlapping nuclei and distinguish them from single ones for subsequent quantitative analyses. This paper presents a method for the automated detection and classification of overlapping nuclei from single nuclei appearing in cytology pleural effusion (CPE) images. The proposed system is comprised of three steps: nuclei candidate extraction, dominant feature extraction, and classification of single and overlapping nuclei. A maximum entropy thresholding method complemented by image enhancement and post-processing was employed for nuclei candidate extraction. For feature extraction, a new combination of 16 geometrical and 10 textural features was extracted from each nucleus region. A double-strategy random forest was performed as an ensemble feature selector to select the most relevant features, and an ensemble classifier to differentiate between overlapping nuclei and single ones using selected features. The proposed method was evaluated on 4000 nuclei from CPE images using various performance metrics. The results were 96.6% sensitivity, 98.7% specificity, 92.7% precision, 94.6% F1 score, 98.4% accuracy, 97.6% G-mean, and 99% area under curve. The computation time required to run the entire algorithm was just 5.17 s. The experiment results demonstrate that the proposed algorithm yields a superior performance to previous studies and other classifiers. The proposed algorithm can serve as a new supportive tool in the automated diagnosis of cancer cells from cytology images. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
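
The candidate-extraction step rests on maximum entropy thresholding. Below is a compact, generic version of that classic criterion (Kapur's method) on a grayscale array; it is a sketch only, omitting the image enhancement and post-processing the paper applies around it.

```python
import numpy as np

def max_entropy_threshold(gray):
    """Kapur's maximum-entropy threshold for a uint8 grayscale image.

    Generic textbook version; the paper surrounds this step with
    enhancement and post-processing that are omitted here.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 <= 0 or p1 <= 0:
            continue
        q0, q1 = p[:t] / p0, p[t:] / p1
        h = -(q0[q0 > 0] * np.log(q0[q0 > 0])).sum() \
            - (q1[q1 > 0] * np.log(q1[q1 > 0])).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# Example: threshold a synthetic image into candidate foreground.
img = (np.random.default_rng(1).normal(100, 30, (64, 64))
       .clip(0, 255).astype(np.uint8))
mask = img > max_entropy_threshold(img)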

Open AccessArticle Multiple Network Fusion with Low-Rank Representation for Image-Based Age Estimation
Appl. Sci. 2018, 8(9), 1601; https://doi.org/10.3390/app8091601
Received: 9 August 2018 / Revised: 26 August 2018 / Accepted: 3 September 2018 / Published: 10 September 2018
Abstract
Image-based age estimation is a challenging task, since there are ambiguities between the apparent age of a face image and the actual age of the person. Therefore, data-driven methods are popular. To improve data utilization and estimation performance, we propose an image-based age estimation method whose key idea is to integrate multi-modal features of face images. To achieve this, we propose a multi-modal learning framework called Multiple Network Fusion with Low-Rank Representation (MNF-LRR). In this framework, different deep neural network (DNN) structures, such as autoencoders, Convolutional Neural Networks (CNNs), Recursive Neural Networks (RNNs), and so on, can be used to extract semantic information from facial images. The outputs of these neural networks are then represented in a low-rank feature space. In this way, feature fusion is performed in this space, and robust multi-modal image features can be computed. An experimental evaluation is conducted on two challenging face datasets for image-based age estimation, extracted from the Internet Movie Database (IMDB) and Wikipedia (WIKI). The results show the effectiveness of the proposed MNF-LRR. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
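
To illustrate the flavor of the fusion step, the sketch below stacks per-network features and projects them onto a low-rank subspace with a plain truncated SVD. This is a simplified surrogate for the paper's low-rank representation learning, applied to synthetic feature matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                   # number of face images

# Stand-ins for per-image features produced by different networks
# (e.g., an autoencoder, a CNN, an RNN), with different dimensions.
feats = [rng.standard_normal((n, d)) for d in (128, 256, 64)]

# Stack modalities, then project onto the top-r left singular vectors:
# a simple low-rank surrogate for the MNF-LRR fusion step.
Z = np.hstack(feats)                      # (n, 448) multi-modal matrix
U, s, Vt = np.linalg.svd(Z - Z.mean(0), full_matrices=False)
r = 32
fused = U[:, :r] * s[:r]                  # (n, r) fused low-rank features

# `fused` would then feed an age estimator (e.g., a regression model).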

Open AccessArticle Planning Lung Radiotherapy Incorporating Motion Freeze PET/CT Imaging
Appl. Sci. 2018, 8(9), 1583; https://doi.org/10.3390/app8091583
Received: 30 July 2018 / Revised: 4 September 2018 / Accepted: 5 September 2018 / Published: 7 September 2018
Abstract
Motion Freeze (MF), which integrates 100% of the signal of each respiratory phase in four-dimensional positron emission tomography (4D-PET) images to create the MF-PET, is capable of eliminating the respiratory-motion-induced blurring and signal dispersion observed in three-dimensional PET (3D-PET) and 4D-PET images. In this study, the effectiveness of respiratory-gated radiotherapy applying MF-PET (MF-Plan) in lung cancer patients was investigated and compared with three-dimensional intensity-modulated radiotherapy (3D-Plan) and routine respiratory-gated radiotherapy (4D-Plan) in terms of target volume and dosimetry. Thirteen lung cancer patients were enrolled. The internal target volumes were generated with 40% of the maximum standardized uptake value. The 3D-Plan, 4D-Plan, and MF-Plan were created for each patient to study the radiation delivered to the targets and organs at risk. MF-Plans were associated with significant reductions in lung, heart, and spinal cord doses. Compared with 3D-Plans, the median lung V20, mean lung dose, mean heart dose, and maximum spinal cord dose were reduced. Compared with 4D-Plans, the median lung V20, mean lung dose, mean heart dose, and maximum spinal cord dose were likewise reduced. Our results indicate that the MF-Plan may improve critical-organ sparing in the lung, heart, and spinal cord while maintaining high target coverage. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
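
The target-volume definition, thresholding at 40% of the maximum standardized uptake value (SUVmax), is easy to express in code. The following toy sketch applies it to synthetic phase volumes and takes the union across respiratory phases; clinical details such as voxel geometry and registration are omitted, and all names here are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic SUV volumes standing in for the phases of a 4D-PET acquisition.
phases = [rng.gamma(2.0, 1.0, size=(32, 32, 32)) for _ in range(8)]

def target_volume(suv, fraction=0.40):
    """Voxels at or above `fraction` of SUVmax (40% in the paper)."""
    return suv >= fraction * suv.max()

# Internal target volume: union of per-phase target volumes, so the
# target stays covered throughout the respiratory cycle.
itv = np.logical_or.reduce([target_volume(p) for p in phases])
print("ITV voxels:", int(itv.sum()))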

Open AccessArticle Automatic Metallic Surface Defect Detection and Recognition with Convolutional Neural Networks
Appl. Sci. 2018, 8(9), 1575; https://doi.org/10.3390/app8091575
Received: 13 August 2018 / Revised: 31 August 2018 / Accepted: 4 September 2018 / Published: 6 September 2018
Cited by 2
Abstract
Automatic metallic surface defect inspection has received increased attention in relation to the quality control of industrial products. Metallic defect detection is usually performed against complex industrial scenarios, presenting an interesting but challenging problem. Traditional methods are based on image processing or shallow machine learning techniques, but these can only detect defects under specific detection conditions, such as obvious defect contours with strong contrast and low noise, at certain scales, or under specific illumination conditions. This paper discusses the automatic detection of metallic defects with a twofold procedure that accurately localizes and classifies defects appearing in input images captured from real industrial environments. A novel cascaded autoencoder (CASAE) architecture is designed for segmenting and localizing defects. The cascading network transforms the input defect image into a pixel-wise prediction mask based on semantic segmentation. The defect regions of segmented results are classified into their specific classes via a compact convolutional neural network (CNN). Metallic defects under various conditions can be successfully detected using an industrial dataset. The experimental results demonstrate that this method meets the robustness and accuracy requirements for metallic defect detection. Meanwhile, it can also be extended to other detection applications. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
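
The twofold structure, a pixel-wise segmentation mask followed by per-region classification, can be sketched as below. The mask is faked here (the paper obtains it from the cascaded autoencoder), and `classify_patch` is a placeholder for the compact CNN.

```python
import numpy as np
from skimage import measure

def classify_defects(mask, image, classify_patch):
    """Turn a pixel-wise defect mask into localized, classified regions.

    `mask` stands in for the CASAE-style prediction mask, and
    `classify_patch` for the compact CNN of the paper.
    """
    results = []
    for region in measure.regionprops(measure.label(mask)):
        r0, c0, r1, c1 = region.bbox          # localize the defect
        patch = image[r0:r1, c0:c1]
        results.append((region.bbox, classify_patch(patch)))
    return results

# Toy usage with a faked segmentation output and a dummy classifier.
rng = np.random.default_rng(0)
img = rng.random((128, 128))
mask = img > 0.995                            # pretend CASAE output
dummy = lambda p: "scratch" if p.size > 4 else "pit"
print(classify_defects(mask, img, dummy)[:3])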

Open AccessArticle PSI-CNN: A Pyramid-Based Scale-Invariant CNN Architecture for Face Recognition Robust to Various Image Resolutions
Appl. Sci. 2018, 8(9), 1561; https://doi.org/10.3390/app8091561
Received: 13 August 2018 / Revised: 31 August 2018 / Accepted: 1 September 2018 / Published: 5 September 2018
Abstract
Face recognition is one research area that has benefited from the recent popularity of deep learning, namely the convolutional neural network (CNN) model. Nevertheless, the recognition performance is still compromised by the model’s dependency on the scale of input images and the limited number of feature maps in each layer of the network. To circumvent these issues, we propose PSI-CNN, a generic pyramid-based scale-invariant CNN architecture which additionally extracts untrained feature maps across multiple image resolutions, thereby allowing the network to learn scale-independent information and improving the recognition performance on low resolution images. Experimental results on the LFW dataset and our own CCTV database show PSI-CNN consistently outperforming the widely-adopted VGG face model in terms of face matching accuracy. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
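
The pyramid idea, extracting features from the same face at several resolutions and pooling them, can be sketched as follows; the histogram extractor is a placeholder for the CNN feature maps used in PSI-CNN, and the pooling rule is ours.

```python
import numpy as np
import cv2

def pyramid_features(face, extract, levels=3, scale=0.5):
    """Extract and fuse features across an image pyramid.

    `extract` maps an image to a 1-D feature vector; PSI-CNN uses CNN
    feature maps here, and we simply average per-level descriptors.
    """
    feats, img = [], face
    for _ in range(levels):
        feats.append(extract(img))
        img = cv2.resize(img, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)
    return np.mean(feats, axis=0)       # scale-pooled descriptor

# Toy usage: a 64-bin intensity histogram as the stand-in extractor.
face = np.random.default_rng(0).integers(0, 256, (112, 112), dtype=np.uint8)
hist = lambda im: np.bincount(im.ravel() >> 2, minlength=64) / im.size
print(pyramid_features(face, hist).shape)   # (64,)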

Open AccessArticle The Application of Deep Learning and Image Processing Technology in Laser Positioning
Appl. Sci. 2018, 8(9), 1542; https://doi.org/10.3390/app8091542
Received: 13 August 2018 / Revised: 30 August 2018 / Accepted: 31 August 2018 / Published: 3 September 2018
Abstract
In this study, machine vision technology was used to precisely locate the point of highest energy in a laser spot, to facilitate the subsequent joining of product workpieces in a laser welding machine. The displacement stage could then place the workpieces into the superposition area and allow the parts to be joined. With deep learning and a convolutional neural network training program, the system could improve positioning accuracy and the efficiency of the machine's work. A bi-analytic deep learning localization method was proposed in this study, with a camera used for real-time monitoring. The first step used a convolutional neural network to perform a large-scale preliminary search and locate the laser spot region. The second step increased the optical magnification of the camera, re-imaged the spot area, and then used template matching to perform high-precision repositioning. From the aspect ratio of the search-result area, the integrity of the target spot was determined. For a complete laser spot, the centroid was calculated directly; for an incomplete laser spot, invariant moments were computed instead, from which the precise position of the spot's highest energy could be recovered from the incomplete image. The required displacement could then be calculated by aligning the point of highest energy of the laser spot with the center of the image. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
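
The final positioning step reduces to image moments: a centroid for complete spots and, per the abstract, invariant moments for truncated ones. A minimal centroid computation is shown below, with the aspect-ratio completeness test only indicated; thresholds and names are ours.

```python
import numpy as np

def spot_centroid(gray):
    """Intensity-weighted centroid of a laser spot image.

    Uses the raw image moments m00, m10, m01: cx = m10/m00, cy = m01/m00.
    """
    ys, xs = np.indices(gray.shape)
    m00 = gray.sum()
    return (gray * xs).sum() / m00, (gray * ys).sum() / m00

def is_complete(bbox_w, bbox_h, tol=0.2):
    """Aspect-ratio test on the detected spot region, following the
    abstract: a roughly square bounding box suggests a complete spot.
    The tolerance is illustrative."""
    return abs(bbox_w / bbox_h - 1.0) < tol

# A synthetic Gaussian spot; its centroid recovers the true center (~20, ~12).
ys, xs = np.indices((32, 48))
spot = np.exp(-((xs - 20.0) ** 2 + (ys - 12.0) ** 2) / 30.0)
print(spot_centroid(spot))
# For incomplete spots one would fall back to invariant moments
# (e.g., Hu moments via cv2.HuMoments) to infer the peak position.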

Open AccessArticle Partial Block Scheme and Adaptive Update Model for Kernelized Correlation Filters-Based Object Tracking
Appl. Sci. 2018, 8(8), 1349; https://doi.org/10.3390/app8081349
Received: 30 May 2018 / Revised: 6 August 2018 / Accepted: 7 August 2018 / Published: 10 August 2018
Cited by 1
Abstract
In visual object tracking, dynamic environments pose challenging issues, with partial occlusion and scale variation being typical problems. We present a correlation-based object tracker built on the discriminative model. To attenuate the influence of partial occlusion, partial sub-blocks are constructed from the original block, and each of them operates independently. The scale space is employed to deal with scale variation using a feature pyramid. We also present an adaptive update model with a weighting function that calculates a frame-adaptive learning rate. Theoretical analysis and experimental results demonstrate that the proposed method can robustly track drastically deformed objects. Although generating the partial block scheme increases the computational cost, we present a novel sparse update approach that reduces the cost drastically for real-time tracking. The experiments were performed on a variety of sequences, and the proposed method exhibited better performance than state-of-the-art trackers. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
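
The adaptive update model amounts to a frame-dependent linear interpolation of the stored filter. The sketch below uses a logistic weighting of the correlation-response peak; the weighting function and all constants are ours, not the paper's.

```python
import numpy as np

def adaptive_update(model, new_model, response_peak,
                    base_lr=0.02, tau=0.3):
    """Blend the stored correlation-filter model with the current frame's.

    The learning rate shrinks when detection confidence (the peak of the
    correlation response) is low, e.g., under partial occlusion. The
    logistic weighting below is illustrative, not the paper's formula.
    """
    confidence = 1.0 / (1.0 + np.exp(-(response_peak - tau) / 0.05))
    lr = base_lr * confidence
    return (1.0 - lr) * model + lr * new_model

# Usage: a low response peak (occlusion) nearly freezes the model.
model = np.zeros(100)
new = np.ones(100)
print(adaptive_update(model, new, response_peak=0.9)[0])   # ~0.02
print(adaptive_update(model, new, response_peak=0.1)[0])   # ~0.0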

Open AccessArticle Image Dehazing and Enhancement Using Principal Component Analysis and Modified Haze Features
Appl. Sci. 2018, 8(8), 1321; https://doi.org/10.3390/app8081321
Received: 5 June 2018 / Revised: 17 July 2018 / Accepted: 24 July 2018 / Published: 8 August 2018
Abstract
This paper presents computationally efficient haze removal and image enhancement methods. The contribution of the proposed research is two-fold: (i) accurate atmospheric light estimation using principal component analysis, and (ii) learning-based transmission estimation. To reduce the computational cost, we impose a constraint on the candidate pixels used to estimate the haze components in the sub-image. In addition, the proposed method extracts modified haze-relevant features to estimate an accurate transmission using a random forest. Experimental results show that the proposed method can provide high-quality results with a significantly reduced computational load compared with existing methods. In addition, we demonstrate that the proposed method can significantly enhance the contrast of low-light images, based on the assumed visual similarity between inverted low-light images and hazy images. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
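
Once the atmospheric light A and the transmission t are estimated (via PCA and the random forest, respectively, in the paper), the scene radiance follows the standard haze model I = J*t + A*(1 - t). A sketch of the recovery step under that model, with our own clamping constant:

```python
import numpy as np

def dehaze(I, A, t, t_min=0.1):
    """Invert the haze model I = J*t + A*(1 - t) for the radiance J.

    I: hazy image in [0, 1] with shape (H, W, 3); A: atmospheric light,
    shape (3,); t: transmission map, shape (H, W). Clamping t (t_min is
    our illustrative choice) avoids amplifying noise in dense haze.
    """
    t = np.clip(t, t_min, 1.0)[..., None]
    J = (I - A) / t + A
    return np.clip(J, 0.0, 1.0)

# The paper's low-light enhancement reuses the same machinery:
# enhance(L) = 1 - dehaze(1 - L, A, t), exploiting the visual similarity
# between inverted low-light images and hazy images.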

Open AccessArticle Deep Region of Interest and Feature Extraction Models for Palmprint Verification Using Convolutional Neural Networks Transfer Learning
Appl. Sci. 2018, 8(7), 1210; https://doi.org/10.3390/app8071210
Received: 15 April 2018 / Revised: 24 June 2018 / Accepted: 2 July 2018 / Published: 23 July 2018
Cited by 1
Abstract
Palmprint verification is one of the most significant and popular approaches to personal authentication due to its high accuracy and efficiency. A novel approach using deep region of interest (ROI) and feature extraction models for palmprint verification is proposed, in which convolutional neural networks (CNNs) are exploited along with transfer learning. The extracted palmprint ROIs are fed to the final verification system, which is composed of two modules: (i) a pre-trained CNN architecture as a feature extractor and (ii) a machine learning classifier. To evaluate our proposed model, we computed the intersection over union (IoU) metric for ROI extraction, along with accuracy, receiver operating characteristic (ROC) curves, and the equal error rate (EER) for the verification task. The experiments demonstrated that the ROI extraction module could reliably find the appropriate palmprint ROIs and that the verification results were highly precise. This was verified on the different databases and classification methods employed in our proposed model. In comparison with other existing approaches, our model was competitive with the state-of-the-art approaches that rely on hand-crafted descriptor representations. We achieved an IoU score of 93% and an EER of 0.0125 using a support vector machine (SVM) classifier for the contact-based Hong Kong Polytechnic University Palmprint (HKPU) database. Notably, all code is open-source and can be accessed online. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
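
The verification module pairs a pre-trained CNN feature extractor with a classical classifier. A minimal PyTorch/scikit-learn version of that transfer-learning pattern, not the authors' exact networks or protocol, might look like this (requires torchvision 0.13 or newer):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC

# Pre-trained backbone as a frozen feature extractor (transfer learning).
# VGG16 is our stand-in choice; the paper evaluates several architectures.
backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
backbone.classifier = backbone.classifier[:-1]   # drop the final FC layer
backbone.eval()

# ImageNet normalization omitted for brevity.
preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

@torch.no_grad()
def embed(pil_image):
    return backbone(preprocess(pil_image).unsqueeze(0)).squeeze(0).numpy()

# `rois` would be the extracted palmprint ROIs, `labels` the identities:
# X = [embed(roi) for roi in rois]
# clf = SVC(kernel="rbf").fit(X, labels)   # the paper also reports an SVM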

Open AccessArticle Automated Diabetic Retinopathy Screening System Using Hybrid Simulated Annealing and Ensemble Bagging Classifier
Appl. Sci. 2018, 8(7), 1198; https://doi.org/10.3390/app8071198
Received: 5 July 2018 / Revised: 18 July 2018 / Accepted: 19 July 2018 / Published: 22 July 2018
Abstract
Diabetic Retinopathy (DR) is the leading cause of blindness in working-age adults globally. Primary screening of DR is essential, and it is recommended that diabetes patients undergo this procedure at least once per year to prevent vision loss. However, in addition to the insufficient number of ophthalmologists available, the eye examination itself is labor-intensive and time-consuming. Thus, an automated DR screening method using retinal images is proposed in this paper to reduce the workload of ophthalmologists in the primary screening process and to allow ophthalmologists to make effective treatment plans promptly, helping to prevent patient blindness. First, all possible candidate lesions of DR were segmented from the whole retinal image using a combination of morphological top-hat and Kirsch edge-detection methods, supplemented by pre- and post-processing steps. Then, eight feature extractors were utilized to extract a total of 208 features based on the pixel density of the binary image as well as texture, color, and intensity information for the detected regions. Finally, hybrid simulated annealing was applied to select the optimal feature set to be used as the input to the ensemble bagging classifier. The evaluation results of the proposed method, on a dataset containing 1200 retinal images, indicate that it performs better than previous methods, with an accuracy of 97.08%, a sensitivity of 90.90%, a specificity of 98.92%, a precision of 96.15%, an F-measure of 93.45%, and an area under the receiver operating characteristic curve of 98.34%. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
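
The lesion-candidate stage combines a morphological top-hat with Kirsch edge detection. A bare-bones version of those two operators is sketched below; the structuring-element size and thresholds are illustrative, and the paper's pre- and post-processing steps are omitted.

```python
import numpy as np
import cv2

def kirsch_edges(gray):
    """Maximum response over the eight Kirsch compass kernels."""
    ring = [5, 5, 5, -3, -3, -3, -3, -3]          # outer ring, clockwise
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    out = np.full(gray.shape, -np.inf, dtype=np.float32)
    for shift in range(8):
        k = np.zeros((3, 3), dtype=np.float32)
        for (r, c), v in zip(idx, np.roll(ring, shift)):
            k[r, c] = v
        out = np.maximum(out, cv2.filter2D(gray.astype(np.float32), -1, k))
    return out

def candidate_lesions(gray, kernel_size=15, edge_thresh=300):
    """Morphological top-hat (bright structures) gated by Kirsch edges.
    Both thresholds are illustrative placeholders."""
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                   (kernel_size, kernel_size))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, se)
    return (tophat > 10) & (kirsch_edges(gray) > edge_thresh)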

Open AccessArticle Image Segmentation by Searching for Image Feature Density Peaks
Appl. Sci. 2018, 8(6), 969; https://doi.org/10.3390/app8060969
Received: 14 May 2018 / Revised: 2 June 2018 / Accepted: 9 June 2018 / Published: 13 June 2018
Cited by 1
Abstract
Image segmentation attempts to classify the pixels of a digital image into multiple groups to facilitate subsequent image processing. It is an essential problem in many research areas, such as computer vision and image processing applications. A large number of techniques have been proposed for image segmentation. Among these, clustering-based segmentation algorithms occupy an extremely important position in the field. However, existing popular clustering schemes often depend on prior knowledge and thresholds used in the clustering process, or lack an automatic mechanism for finding clustering centers. In this paper, we propose a novel image segmentation method based on searching for image feature density peaks. We apply the clustering method to each superpixel in an input image and construct the final segmentation map according to the classification results for each pixel. Our method determines the number of clusters directly, without prior knowledge, and the cluster centers can be recognized automatically without interference from noise. Experimental results validate the improved robustness and effectiveness of the proposed method. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
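
The clustering engine here is in the spirit of the density-peaks idea (Rodriguez and Laio): each point gets a local density rho and a distance delta to the nearest point of higher density, and points with large rho * delta stand out as centers without specifying a cluster count. A compact sketch of that computation, with our own cutoff heuristic:

```python
import numpy as np

def density_peaks(X, dc=None):
    """Return (rho, delta): local density and distance to a denser point."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    if dc is None:
        dc = np.percentile(D[D > 0], 2)          # common cutoff heuristic
    rho = np.exp(-(D / dc) ** 2).sum(1) - 1      # Gaussian kernel density
    delta = np.empty(len(X))
    order = np.argsort(-rho)
    delta[order[0]] = D[order[0]].max()
    for i, p in enumerate(order[1:], 1):
        delta[p] = D[p, order[:i]].min()         # nearest denser neighbor
    return rho, delta

# Centers are points where rho * delta is unusually large; every other
# point joins the cluster of its nearest denser neighbor.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
rho, delta = density_peaks(X)
print(np.argsort(-(rho * delta))[:2])            # indices of the two peaks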

Open AccessArticle Superpixel Segmentation Using Weighted Coplanar Feature Clustering on RGBD Images
Appl. Sci. 2018, 8(6), 902; https://doi.org/10.3390/app8060902
Received: 14 April 2018 / Revised: 28 May 2018 / Accepted: 29 May 2018 / Published: 31 May 2018
Abstract
Superpixel segmentation is a widely used preprocessing method in computer vision, but its performance is unsatisfactory for color images in cluttered indoor environments. In this work, a superpixel method named weighted coplanar feature clustering (WCFC) is proposed, which produces full coverage of superpixels in RGB-depth (RGBD) images of indoor scenes. Basically, a linear iterative clustering is adopted based on a cluster criterion that measures the color similarity, space proximity and geometric resemblance between pixels. However, to avoid the adverse impact of RGBD image flaws and to make full use of the depth information, WCFC first preprocesses the raw depth maps with an inpainting algorithm called a Cross-Bilateral Filter. Second, a coplanar feature is extracted from the refined RGBD image to represent the geometric similarities between pixels. Third, combined with the colors and positions of the pixels, the coplanar feature constructs the feature vector of the clustering method; thus, the distance measure, as the cluster criterion, is computed by normalizing the feature vectors. Finally, in order to extract the features of the RGBD image dynamically, a content-adaptive weight is introduced as a coefficient of the coplanar feature, which strikes a balance between the coplanar feature and other features. Experiments performed on the New York University (NYU) Depth V2 dataset demonstrate that WCFC outperforms the available state-of-the-art methods in terms of accuracy of superpixel segmentation, while maintaining a high speed. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
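
The cluster criterion is a weighted distance mixing color similarity, spatial proximity, and the coplanar (geometric) feature. A schematic distance of that form is shown below; the normalizers and the weight form are ours, whereas the paper derives the coplanar weight adaptively from image content.

```python
import numpy as np

def wcfc_distance(p, q, w_coplanar,
                  s_color=25.0, s_space=40.0, s_geo=1.0):
    """Illustrative cluster criterion between pixel feature tuples.

    Each of p, q is (lab_color[3], xy_position[2], coplanar_feature[k]).
    The normalizers s_* and the combination rule are assumptions; the
    paper computes w_coplanar as a content-adaptive weight.
    """
    d_color = np.linalg.norm(p[0] - q[0]) / s_color
    d_space = np.linalg.norm(p[1] - q[1]) / s_space
    d_geo = np.linalg.norm(p[2] - q[2]) / s_geo
    return np.sqrt(d_color**2 + d_space**2 + w_coplanar * d_geo**2)

# Toy usage: two nearby pixels with similar color but different geometry.
p = (np.array([50.0, 10, 10]), np.array([5.0, 5]), np.array([0.1, 0.9]))
q = (np.array([52.0, 11, 9]), np.array([8.0, 6]), np.array([0.7, 0.2]))
print(wcfc_distance(p, q, w_coplanar=2.0))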

Open AccessArticle Development and Experimental Evaluation of Machine-Learning Techniques for an Intelligent Hairy Scalp Detection System
Appl. Sci. 2018, 8(6), 853; https://doi.org/10.3390/app8060853
Received: 14 May 2018 / Revised: 20 May 2018 / Accepted: 21 May 2018 / Published: 23 May 2018
Cited by 1
Abstract
Deep learning has become the most popular research subject in the fields of artificial intelligence (AI) and machine learning. In October 2013, MIT Technology Review named deep learning a breakthrough technology. Deep learning has made progress in voice and image recognition, image classification, and natural language processing. Before deep learning, decision trees, linear discriminant analysis (LDA), support vector machines (SVM), the k-nearest neighbors algorithm (K-NN), and ensemble learning were popular for solving classification problems. In this paper, we applied both these classical techniques and deep learning to hairy scalp images. Hairy scalp problems are usually diagnosed by non-professionals in hair salons, and people with such problems may be advised by these non-professionals; additionally, several common scalp problems look similar, so non-experts may provide incorrect diagnoses, causing scalp problems to worsen. In this work, we implemented and compared the deep-learning method (the ImageNet-VGG-f model), Bag of Words (BOW) with machine-learning classifiers, and histograms of oriented gradients (HOG)/pyramid histograms of oriented gradients (PHOG) with machine-learning classifiers. The tools from the classification learner apps were used for hairy scalp image classification. The results indicated that deep learning can achieve an accuracy of 89.77% when the learning rate is 1 × 10^−4, far higher than the accuracies achieved by BOW with SVM (80.50%) and PHOG with SVM (53.0%). Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
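
One of the compared baselines, HOG features with an SVM, takes only a few lines with scikit-image and scikit-learn. The sketch below runs on synthetic stand-ins for the scalp images; the HOG parameters are common defaults, not necessarily the paper's.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic grayscale stand-ins for scalp images from two classes.
imgs = rng.random((40, 64, 64))
imgs[20:] += 0.3 * np.sin(np.linspace(0, 8 * np.pi, 64))  # add texture
labels = np.repeat([0, 1], 20)

X = np.array([hog(im, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2)) for im in imgs])
print(cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())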

Open AccessArticle An Improved Image Semantic Segmentation Method Based on Superpixels and Conditional Random Fields
Appl. Sci. 2018, 8(5), 837; https://doi.org/10.3390/app8050837
Received: 11 April 2018 / Revised: 11 May 2018 / Accepted: 17 May 2018 / Published: 22 May 2018
Cited by 2
Abstract
This paper proposes an improved image semantic segmentation method based on superpixels and conditional random fields (CRFs). The proposed method takes full advantage of superpixel edge information and the constraint relationships among different pixels. First, we employ fully convolutional networks (FCN) to obtain pixel-level semantic features and utilize simple linear iterative clustering (SLIC) to generate superpixel-level region information. Then, the segmentation results at image boundaries are optimized by fusing the obtained pixel-level and superpixel-level results. Finally, we make full use of the color and position information of pixels to further improve segmentation accuracy using the pixel-level prediction capability of CRFs. In summary, the improved method offers both excellent feature extraction capability and good boundary adherence. Experimental results on both the PASCAL VOC 2012 dataset and the Cityscapes dataset show that the proposed method achieves a significant improvement in segmentation accuracy compared with the traditional FCN model. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
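
The fusion of pixel-level FCN predictions with superpixel regions comes down to re-labeling each superpixel by its dominant prediction, which snaps labels to superpixel boundaries. A sketch with a faked FCN output; the CRF refinement stage of the paper is omitted.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_fusion(pixel_labels, image, n_segments=200):
    """Snap per-pixel class labels to SLIC superpixel boundaries
    by majority vote inside each superpixel."""
    sp = slic(image, n_segments=n_segments, compactness=10.0,
              channel_axis=-1)
    fused = np.empty_like(pixel_labels)
    for s in np.unique(sp):
        region = sp == s
        fused[region] = np.bincount(pixel_labels[region]).argmax()
    return fused

# Toy usage with a random image and noisy stand-in "FCN" labels.
rng = np.random.default_rng(0)
image = rng.random((96, 96, 3))
labels = (image[..., 0] > 0.5).astype(int)      # pretend FCN prediction
print(superpixel_fusion(labels, image).shape)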

Open AccessArticle An Efficient Multiscale Scheme Using Local Zernike Moments for Face Recognition
Appl. Sci. 2018, 8(5), 827; https://doi.org/10.3390/app8050827
Received: 3 April 2018 / Revised: 13 May 2018 / Accepted: 17 May 2018 / Published: 21 May 2018
Cited by 2
Abstract
In this study, we propose a face recognition scheme using local Zernike moments (LZM), which can be used for both identification and verification. In this scheme, local patches around the landmarks are extracted from the complex components obtained by the LZM transformation. Then, phase magnitude histograms are constructed within these patches to create descriptors for face images. An image pyramid is utilized to extract features at multiple scales, and the descriptors are constructed for each image in this pyramid. We used three different public datasets to examine the performance of the proposed method: Face Recognition Technology (FERET), Labeled Faces in the Wild (LFW), and Surveillance Cameras Face (SCface). The results revealed that the proposed method is robust against variations such as illumination, facial expression, and pose. Moreover, it can be used for low-resolution face images acquired in uncontrolled environments or in the infrared spectrum. Experimental results show that our method outperforms state-of-the-art methods on the FERET and SCface datasets. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Open AccessArticle Multi-View Ground-Based Cloud Recognition by Transferring Deep Visual Information
Appl. Sci. 2018, 8(5), 748; https://doi.org/10.3390/app8050748
Received: 4 April 2018 / Revised: 4 May 2018 / Accepted: 5 May 2018 / Published: 9 May 2018
Cited by 1
Abstract
Since cloud images captured from different views exhibit extreme variations, multi-view ground-based cloud recognition is a very challenging task. In this paper, we present a study of view shift in this field, focusing both on designing a proper feature representation and on learning distance metrics from sample pairs. Correspondingly, we propose transfer deep local binary patterns (TDLBP) and weighted metric learning (WML). On the one hand, to deal with view shift, such as variations in illumination, location, resolution, and occlusion, we first utilize cloud images to train a convolutional neural network (CNN), then extract local features from the part summing maps (PSMs) based on feature maps, and finally maximize the occurrences of regions for the final feature representation. On the other hand, the number of cloud images in each category varies greatly, leading to unbalanced similar pairs; hence, we propose a weighted strategy for metric learning. We validate the proposed method on three cloud datasets (the MOC_e, IAP_e, and CAMS_e), collected by different meteorological organizations in China, and the experimental results show the effectiveness of the proposed method. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
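
The feature side builds on local binary patterns, which in the paper are extracted from CNN part summing maps rather than raw images. As a baseline flavor of that representation, here are uniform LBP histograms via scikit-image; the parameters are common defaults, not the paper's.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """Uniform LBP histogram of a grayscale cloud image (the paper
    applies LBP to CNN part summing maps instead of the raw image)."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2                       # uniform patterns + "other"
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()

rng = np.random.default_rng(0)
sky = rng.random((64, 64))
print(lbp_histogram(sky))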

Review

Open AccessReview Place Recognition: An Overview of Vision Perspective
Appl. Sci. 2018, 8(11), 2257; https://doi.org/10.3390/app8112257
Received: 14 October 2018 / Revised: 30 October 2018 / Accepted: 31 October 2018 / Published: 15 November 2018
Abstract
Place recognition is one of the most fundamental topics in the computer-vision and robotics communities, where the task is to accurately and efficiently recognize the location of a given query image. Despite years of knowledge accumulated in this field, place recognition still remains an open problem due to the various ways in which the appearance of real-world places may differ. This paper presents an overview of the place-recognition literature. Since condition-invariant and viewpoint-invariant features are essential factors to long-term robust visual place-recognition systems, we start with traditional image-description methodology developed in the past, which exploits techniques from the image-retrieval field. Recently, the rapid advances of related fields, such as object detection and image classification, have inspired a new technique to improve visual place-recognition systems, that is, convolutional neural networks (CNNs). Thus, we then introduce the recent progress of visual place-recognition systems based on CNNs to automatically learn better image representations for places. Finally, we close with discussions and mention of future work on place recognition. Full article
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
