Applied Sciences

Editorial

Jump to: Research, Review, Other

6 pages, 191 KB

Open AccessEditorial

Special Issue on Advances in Deep Learning

by Diego Gragnaniello, Andrea Bottino, Sandro Cumani and Wonjoon Kim

Appl. Sci. 2020, 10(9), 3172; https://doi.org/10.3390/app10093172 - 2 May 2020

Cited by 4 | Viewed by 2371

Abstract

Nowadays, deep learning is the fastest growing research field in machine learning and has a tremendous impact on a plethora of daily life applications, ranging from security and surveillance to autonomous driving, automatic indexing and retrieval of media content, text analysis, speech recognition, [...] Read more.

Nowadays, deep learning is the fastest growing research field in machine learning and has a tremendous impact on a plethora of daily life applications, ranging from security and surveillance to autonomous driving, automatic indexing and retrieval of media content, text analysis, speech recognition, automatic translation, and many others [...] Full article

(This article belongs to the Special Issue Advances in Deep Learning)

Research

Jump to: Editorial, Review, Other

15 pages, 6023 KB

Open AccessArticle

Image-to-Image Translation Using Identical-Pair Adversarial Networks

by Thai Leang Sung and Hyo Jong Lee

Appl. Sci. 2019, 9(13), 2668; https://doi.org/10.3390/app9132668 - 30 Jun 2019

Cited by 10 | Viewed by 8256

Abstract

We propose Identical-pair Adversarial Networks (iPANs) to solve image-to-image translation problems, such as aerial-to-map, edge-to-photo, de-raining, and night-to-daytime. Our iPANs rely mainly on the effectiveness of adversarial loss function and its network architectures. Our iPANs consist of two main networks, an image transformation [...] Read more.

We propose Identical-pair Adversarial Networks (iPANs) to solve image-to-image translation problems, such as aerial-to-map, edge-to-photo, de-raining, and night-to-daytime. Our iPANs rely mainly on the effectiveness of adversarial loss function and its network architectures. Our iPANs consist of two main networks, an image transformation network T and a discriminative network D. We use U-NET for the transformation network T and a perceptual similarity network, which has two streams of VGG16 that share the same weights for network D. Our proposed adversarial losses play a minimax game against each other based on a real identical-pair and a fake identical-pair distinguished by the discriminative network D; e.g. a discriminative network D considers two inputs as a real pair only when they are identical, otherwise a fake pair. Meanwhile, the transformation network T tries to persuade the discriminator network D that the fake pair is a real pair. We experimented on several problems of image-to-image translation and achieved results that are comparable to those of some existing approaches, such as pix2pix, and PAN. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

13 pages, 1675 KB

Open AccessArticle

Efficient Weights Quantization of Convolutional Neural Networks Using Kernel Density Estimation based Non-uniform Quantizer

by Sanghyun Seo and Juntae Kim

Appl. Sci. 2019, 9(12), 2559; https://doi.org/10.3390/app9122559 - 23 Jun 2019

Cited by 34 | Viewed by 10181

Abstract

Convolutional neural networks (CNN) have achieved excellent results in the field of image recognition that classifies objects in images. A typical CNN consists of a deep architecture that uses a large number of weights and layers to achieve high performance. CNN requires relatively [...] Read more.

Convolutional neural networks (CNN) have achieved excellent results in the field of image recognition that classifies objects in images. A typical CNN consists of a deep architecture that uses a large number of weights and layers to achieve high performance. CNN requires relatively large memory space and computational costs, which not only increase the time to train the model but also limit the real-time application of the trained model. For this reason, various neural network compression methodologies have been studied to efficiently use CNN in small embedded hardware such as mobile and edge devices. In this paper, we propose a kernel density estimation based non-uniform quantization methodology that can perform compression efficiently. The proposed method performs efficient weights quantization using a significantly smaller number of sampled weights than the number of original weights. Four-bit quantization experiments on the classification of the ImageNet dataset with various CNN architectures show that the proposed methodology can perform weights quantization efficiently in terms of computational costs without significant reduction in model performance. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

16 pages, 1751 KB

Open AccessArticle

Improving Generative and Discriminative Modelling Performance by Implementing Learning Constraints in Encapsulated Variational Autoencoders

by Wenjun Bai, Changqin Quan and Zhi-Wei Luo

Appl. Sci. 2019, 9(12), 2551; https://doi.org/10.3390/app9122551 - 21 Jun 2019

Cited by 2 | Viewed by 4173

Abstract

Learning latent representations of observed data that can favour both discriminative and generative tasks remains a challenging task in artificial-intelligence (AI) research. Previous attempts that ranged from the convex binding of discriminative and generative models to the semisupervised learning paradigm could hardly yield [...] Read more.

Learning latent representations of observed data that can favour both discriminative and generative tasks remains a challenging task in artificial-intelligence (AI) research. Previous attempts that ranged from the convex binding of discriminative and generative models to the semisupervised learning paradigm could hardly yield optimal performance on both generative and discriminative tasks. To this end, in this research, we harness the power of two neuroscience-inspired learning constraints, that is, dependence minimisation and regularisation constraints, to improve generative and discriminative modelling performance of a deep generative model. To demonstrate the usage of these learning constraints, we introduce a novel deep generative model: encapsulated variational autoencoders (EVAEs) to stack two different variational autoencoders together with their learning algorithm. Using the MNIST digits dataset as a demonstration, the generative modelling performance of EVAEs was improved with the imposed dependence-minimisation constraint, encouraging our derived deep generative model to produce various patterns of MNIST-like digits. Using CIFAR-10(4K) as an example, a semisupervised EVAE with an imposed regularisation learning constraint was able to achieve competitive discriminative performance on the classification benchmark, even in the face of state-of-the-art semisupervised learning approaches. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

18 pages, 1713 KB

Open AccessArticle

Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features

by Anvarjon Tursunov, Soonil Kwon and Hee-Suk Pang

Appl. Sci. 2019, 9(12), 2470; https://doi.org/10.3390/app9122470 - 17 Jun 2019

Cited by 31 | Viewed by 7222

Abstract

The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in valence [...] Read more.

The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but are different in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. Support vector machine (SVM) and long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. The significant classification performance improvements were achieved using a combination of baseline and the most relevant timbre acoustic features, which were found by applying SFS on a classification of emotions for the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features could characterize emotions sufficiently in a speech in the valence dimension. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

23 pages, 421 KB

Open AccessArticle

A Simple Convolutional Neural Network with Rule Extraction

by Guido Bologna

Appl. Sci. 2019, 9(12), 2411; https://doi.org/10.3390/app9122411 - 13 Jun 2019

Cited by 22 | Viewed by 5750

Abstract

Classification responses provided by Multi Layer Perceptrons (MLPs) can be explained by means of propositional rules. So far, many rule extraction techniques have been proposed for shallow MLPs, but not for Convolutional Neural Networks (CNNs). To fill this gap, this work presents a [...] Read more.

Classification responses provided by Multi Layer Perceptrons (MLPs) can be explained by means of propositional rules. So far, many rule extraction techniques have been proposed for shallow MLPs, but not for Convolutional Neural Networks (CNNs). To fill this gap, this work presents a new rule extraction method applied to a typical CNN architecture used in Sentiment Analysis (SA). We focus on the textual data on which the CNN is trained with “tweets” of movie reviews. Its architecture includes an input layer representing words by “word embeddings”, a convolutional layer, a max-pooling layer, followed by a fully connected layer. Rule extraction is performed on the fully connected layer, with the help of the Discretized Interpretable Multi Layer Perceptron (DIMLP). This transparent MLP architecture allows us to generate symbolic rules, by precisely locating axis-parallel hyperplanes. Experiments based on cross-validation emphasize that our approach is more accurate than that based on SVMs and decision trees that substitute DIMLPs. Overall, rules reach high fidelity and the discriminative n-grams represented in the antecedents explain the classifications adequately. With several test examples we illustrate the n-grams represented in the activated rules. They present the particularity to contribute to the final classification with a certain intensity. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

17 pages, 1924 KB

Open AccessArticle

Disentangled Feature Learning for Noise-Invariant Speech Enhancement

by Soo Hyun Bae, Inkyu Choi and Nam Soo Kim

Appl. Sci. 2019, 9(11), 2289; https://doi.org/10.3390/app9112289 - 3 Jun 2019

Cited by 3 | Viewed by 4459

Abstract

Most of the recently proposed deep learning-based speech enhancement techniques have focused on designing the neural network architectures as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since the real-world speech data [...] Read more.

Most of the recently proposed deep learning-based speech enhancement techniques have focused on designing the neural network architectures as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since the real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can encourage the trained model to result in better performance for speech enhancement. With the recent success in learning disentangled representation using neural networks, we explore a framework for disentangling speech and noise, which has not been exploited in the conventional speech enhancement algorithms. In this work, we propose a novel noise-invariant speech enhancement method which manipulates the latent features to distinguish between the speech and noise features in the intermediate layers using adversarial training scheme. To compare the performance of the proposed method with other conventional algorithms, we conducted experiments in both the matched and mismatched noise conditions using TIMIT and TSPspeech datasets. Experimental results show that our model successfully disentangles the speech and noise latent features. Consequently, the proposed model not only achieves better enhancement performance but also offers more robust noise-invariant property than the conventional speech enhancement techniques. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Graphical abstract

14 pages, 2694 KB

Open AccessArticle

Boosting Targeted Black-Box Attacks via Ensemble Substitute Training and Linear Augmentation

by Xianfeng Gao, Yu-an Tan, Hongwei Jiang, Quanxin Zhang and Xiaohui Kuang

Appl. Sci. 2019, 9(11), 2286; https://doi.org/10.3390/app9112286 - 3 Jun 2019

Cited by 29 | Viewed by 3780

Abstract

These years, Deep Neural Networks (DNNs) have shown unprecedented performance in many areas. However, some recent studies revealed their vulnerability to small perturbations added on source inputs. Furthermore, we call the ways to generate these perturbations’ adversarial attacks, which contain two types, black-box [...] Read more.

These years, Deep Neural Networks (DNNs) have shown unprecedented performance in many areas. However, some recent studies revealed their vulnerability to small perturbations added on source inputs. Furthermore, we call the ways to generate these perturbations’ adversarial attacks, which contain two types, black-box and white-box attacks, according to the adversaries’ access to target models. In order to overcome the problem of black-box attackers’ unreachabilities to the internals of target DNN, many researchers put forward a series of strategies. Previous works include a method of training a local substitute model for the target black-box model via Jacobian-based augmentation and then use the substitute model to craft adversarial examples using white-box methods. In this work, we improve the dataset augmentation to make the substitute models better fit the decision boundary of the target model. Unlike the previous work that just performed the non-targeted attack, we make it first to generate targeted adversarial examples via training substitute models. Moreover, to boost the targeted attacks, we apply the idea of ensemble attacks to the substitute training. Experiments on MNIST and GTSRB, two common datasets for image classification, demonstrate our effectiveness and efficiency of boosting a targeted black-box attack, and we finally attack the MNIST and GTSRB classifiers with the success rates of 97.7% and 92.8%. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

17 pages, 872 KB

Open AccessArticle

Design and Investigation of Capsule Networks for Sentence Classification

by Haftu Wedajo Fentaw and Tae-Hyong Kim

Appl. Sci. 2019, 9(11), 2200; https://doi.org/10.3390/app9112200 - 29 May 2019

Cited by 21 | Viewed by 5711

Abstract

In recent years, convolutional neural networks (CNNs) have been used as an alternative to recurrent neural networks (RNNs) in text processing with promising results. In this paper, we investigated the newly introduced capsule networks (CapsNets), which are getting a lot of attention due [...] Read more.

In recent years, convolutional neural networks (CNNs) have been used as an alternative to recurrent neural networks (RNNs) in text processing with promising results. In this paper, we investigated the newly introduced capsule networks (CapsNets), which are getting a lot of attention due to their great performance gains on image analysis more than CNNs, for sentence classification or sentiment analysis in some cases. The results of our experiment show that the proposed well-tuned CapsNet model can be a good, sometimes better and cheaper, substitute of models based on CNNs and RNNs used in sentence classification. In order to investigate whether CapsNets can learn the sequential order of words or not, we performed a number of experiments by reshuffling the test data. Our CapsNet model shows an overall better classification performance and better resistance to adversarial attacks than CNN and RNN models. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

14 pages, 622 KB

Open AccessArticle

Confidence Measures for Deep Learning in Domain Adaptation

by Simone Bonechi, Paolo Andreini, Monica Bianchini, Akshay Pai and Franco Scarselli

Appl. Sci. 2019, 9(11), 2192; https://doi.org/10.3390/app9112192 - 29 May 2019

Cited by 6 | Viewed by 4060

Abstract

In recent years, Deep Neural Networks (DNNs) have led to impressive results in a wide variety of machine learning tasks, typically relying on the existence of a huge amount of supervised data. However, in many applications (e.g., bio–medical image analysis), gathering large sets [...] Read more.

In recent years, Deep Neural Networks (DNNs) have led to impressive results in a wide variety of machine learning tasks, typically relying on the existence of a huge amount of supervised data. However, in many applications (e.g., bio–medical image analysis), gathering large sets of labeled data can be very difficult and costly. Unsupervised domain adaptation exploits data from a source domain, where annotations are available, to train a model able to generalize also to a target domain, where labels are unavailable. Recent research has shown that Generative Adversarial Networks (GANs) can be successfully employed for domain adaptation, although deciding when to stop learning is a major concern for GANs. In this work, we propose some confidence measures that can be used to early stop the GAN training, also showing how such measures can be employed to predict the reliability of the network output. The effectiveness of the proposed approach has been tested in two domain adaptation tasks, with very promising results. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

17 pages, 936 KB

Open AccessArticle

Chronic Disease Prediction Using Character-Recurrent Neural Network in The Presence of Missing Information

by Changgyun Kim, Youngdoo Son and Sekyoung Youm

Appl. Sci. 2019, 9(10), 2170; https://doi.org/10.3390/app9102170 - 27 May 2019

Cited by 19 | Viewed by 5859

Abstract

The aim of this study was to predict chronic diseases in individual patients using a character-recurrent neural network (Char-RNN), which is a deep learning model that treats data in each class as a word when a large portion of its input values is [...] Read more.

The aim of this study was to predict chronic diseases in individual patients using a character-recurrent neural network (Char-RNN), which is a deep learning model that treats data in each class as a word when a large portion of its input values is missing. An advantage of Char-RNN is that it does not require any additional imputation method because it implicitly infers missing values considering the relationship with nearby data points. We applied Char-RNN to classify cases in the Korea National Health and Nutrition Examination Survey (KNHANES) VI as normal status and five chronic diseases: hypertension, stroke, angina pectoris, myocardial infarction, and diabetes mellitus. We also employed a multilayer perceptron network for the same task for comparison. The results show higher accuracy for Char-RNN than for the conventional multilayer perceptron model. Char-RNN showed remarkable performance in finding patients with hypertension and stroke. The present study utilized the KNHANES VI data to demonstrate a practical approach to predicting and managing chronic diseases with partially observed information. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

13 pages, 1288 KB

Open AccessArticle

Layer-Level Knowledge Distillation for Deep Neural Network Learning

by Hao-Ting Li, Shih-Chieh Lin, Cheng-Yeh Chen and Chen-Kuo Chiang

Appl. Sci. 2019, 9(10), 1966; https://doi.org/10.3390/app9101966 - 14 May 2019

Cited by 25 | Viewed by 7533

Abstract

Motivated by the recently developed distillation approaches that aim to obtain small and fast-to-execute models, in this paper a novel Layer Selectivity Learning (LSL) framework is proposed for learning deep models. We firstly use an asymmetric dual-model learning framework, called Auxiliary Structure Learning [...] Read more.

Motivated by the recently developed distillation approaches that aim to obtain small and fast-to-execute models, in this paper a novel Layer Selectivity Learning (LSL) framework is proposed for learning deep models. We firstly use an asymmetric dual-model learning framework, called Auxiliary Structure Learning (ASL), to train a small model with the help of a larger and well-trained model. Then, the intermediate layer selection scheme, called the Layer Selectivity Procedure (LSP), is exploited to determine the corresponding intermediate layers of source and target models. The LSP is achieved by two novel matrices, the layered inter-class Gram matrix and the inter-layered Gram matrix, to evaluate the diversity and discrimination of feature maps. The experimental results, demonstrated using three publicly available datasets, present the superior performance of model training using the LSL deep model learning framework. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

16 pages, 834 KB

Open AccessArticle

Heated Metal Mark Attribute Recognition Based on Compressed CNNs Model

by He Yin, Keming Mao, Jianzhe Zhao, Huidong Chang, Dazhi E and Zhenhua Tan

Appl. Sci. 2019, 9(9), 1955; https://doi.org/10.3390/app9091955 - 13 May 2019

Cited by 2 | Viewed by 2979

Abstract

This study considered heated metal mark attribute recognition based on compressed convolutional neural networks (CNNs) models. Based on our previous works, the heated metal mark image benchmark dataset was further expanded. State-of-the-art lightweight CNNs models were selected. Technologies of pruning, compressing, weight quantization [...] Read more.

This study considered heated metal mark attribute recognition based on compressed convolutional neural networks (CNNs) models. Based on our previous works, the heated metal mark image benchmark dataset was further expanded. State-of-the-art lightweight CNNs models were selected. Technologies of pruning, compressing, weight quantization were introduced and analyzed. Then, a multi-label model training method was devised. Moreover, the proposed models were deployed on Android devices. Finally, comprehensive experiments were evaluated. The results show that, with the fine-tuned compressed CNNs model, the recognition rate of attributes meta type, heating mode, heating temperature, heating duration, cooling mode, placing duration and relative humidity were 0.803, 0.837, 0.825, 0.812, 0.883, 0.817 and 0.894, respectively. The best model obtained an overall performance of 0.823. Comparing with traditional CNNs, the adopted compressed multi-label model greatly improved the training efficiency and reduced the space occupation, with a relatively small decrease in recognition accuracy. The running time on Android devices was acceptable. It is shown that the proposed model is applicable for real time application and is convenient to implement on mobile or embedded devices scenarios. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

14 pages, 1763 KB

Open AccessArticle

The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures

by Pavel Stefanovič, Olga Kurasova and Rokas Štrimaitis

Appl. Sci. 2019, 9(9), 1870; https://doi.org/10.3390/app9091870 - 7 May 2019

Cited by 27 | Viewed by 7286

Abstract

In the paper the word-level n-grams based approach is proposed to find similarity between texts. The approach is a combination of two separate and independent techniques: self-organizing map (SOM) and text similarity measures. SOM’s uniqueness is that the obtained results of data clustering, [...] Read more.

In the paper the word-level n-grams based approach is proposed to find similarity between texts. The approach is a combination of two separate and independent techniques: self-organizing map (SOM) and text similarity measures. SOM’s uniqueness is that the obtained results of data clustering, as well as dimensionality reduction, are presented in a visual form. The four measures have been evaluated: cosine, dice, extended Jaccard’s, and overlap. First of all, texts have to be converted to numerical expression. For that purpose, the text has been split into the word-level n-grams and after that, the bag of n-grams has been created. The n-grams’ frequencies are calculated and the frequency matrix of dataset is formed. Various filters are used to create a bag of n-grams: stemming algorithms, number and punctuation removers, stop words, etc. All experimental investigation has been made using a corpus of plagiarized short answers dataset. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Graphical abstract

18 pages, 10497 KB

Open AccessArticle

A Deep Learning Method for Bearing Fault Diagnosis through Stacked Residual Dilated Convolutions

by Zilong Zhuang, Huichun Lv, Jie Xu, Zizhao Huang and Wei Qin

Appl. Sci. 2019, 9(9), 1823; https://doi.org/10.3390/app9091823 - 1 May 2019

Cited by 87 | Viewed by 5543

Abstract

Real-time monitoring and fault diagnosis of bearings are of great significance to improve production safety, prevent major accidents, and reduce production costs. However, there are three primary concerns in the current research, namely real-time performance, effectiveness, and generalization performance. In this paper, a [...] Read more.

Real-time monitoring and fault diagnosis of bearings are of great significance to improve production safety, prevent major accidents, and reduce production costs. However, there are three primary concerns in the current research, namely real-time performance, effectiveness, and generalization performance. In this paper, a deep learning method based on stacked residual dilated convolutional neural network (SRDCNN) is proposed for real-time bearing fault diagnosis, which is subtly combined by the dilated convolution, the input gate structure of long short-term memory network (LSTM) and the residual network. In the SRDCNN model, the dilated convolution is used to exponentially increase the receptive field of convolution kernel and extract features from the sample with more points, alleviating the influence of randomness. The input gate structure of LSTM could effectively remove noise and control the entry of information contained in the input sample. Meanwhile, the residual network is introduced to overcome the problem of vanishing gradients caused by the deeper structure of the neural network, hence improving the overall classification accuracy. The experimental results indicate that compared with three excellent models, the proposed SRDCNN model has higher denoising ability and better workload adaptability. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

24 pages, 2026 KB

Open AccessArticle

Data-Driven Model-Free Tracking Reinforcement Learning Control with VRFT-based Adaptive Actor-Critic

by Mircea-Bogdan Radac and Radu-Emil Precup

Appl. Sci. 2019, 9(9), 1807; https://doi.org/10.3390/app9091807 - 30 Apr 2019

Cited by 46 | Viewed by 5118

Abstract

This paper proposes a neural network (NN)-based control scheme in an Adaptive Actor-Critic (AAC) learning framework designed for output reference model tracking, as a representative deep-learning application. The control learning scheme is model-free with respect to the process model. AAC designs usually require [...] Read more.

This paper proposes a neural network (NN)-based control scheme in an Adaptive Actor-Critic (AAC) learning framework designed for output reference model tracking, as a representative deep-learning application. The control learning scheme is model-free with respect to the process model. AAC designs usually require an initial controller to start the learning process; however, systematic guidelines for choosing the initial controller are not offered in the literature, especially in a model-free manner. Virtual Reference Feedback Tuning (VRFT) is proposed for obtaining an initially stabilizing NN nonlinear state-feedback controller, designed from input-state-output data collected from the process in open-loop setting. The solution offers systematic design guidelines for initial controller design. The resulting suboptimal state-feedback controller is next improved under the AAC learning framework by online adaptation of a critic NN and a controller NN. The mixed VRFT-AAC approach is validated on a multi-input multi-output nonlinear constrained coupled vertical two-tank system. Discussions on the control system behavior are offered together with comparisons with similar approaches. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

18 pages, 3137 KB

Open AccessArticle

Deterministic and Probabilistic Wind Power Forecasting Based on Bi-Level Convolutional Neural Network and Particle Swarm Optimization

by Xiyun Yang, Yanfeng Zhang, Yuwei Yang and Wei Lv

Appl. Sci. 2019, 9(9), 1794; https://doi.org/10.3390/app9091794 - 29 Apr 2019

Cited by 31 | Viewed by 4169

Abstract

The intermittency and uncertainty of wind power result in challenges for large-scale wind power integration. Accurate wind power prediction is becoming increasingly important for power system planning and operation. In this paper, a probabilistic interval prediction method for wind power based on deep [...] Read more.

The intermittency and uncertainty of wind power result in challenges for large-scale wind power integration. Accurate wind power prediction is becoming increasingly important for power system planning and operation. In this paper, a probabilistic interval prediction method for wind power based on deep learning and particle swarm optimization (PSO) is proposed. Variational mode decomposition (VMD) and phase space reconstruction are used to pre-process the original wind power data to obtain additional details and uncover hidden information in the data. Subsequently, a bi-level convolutional neural network is used to learn nonlinear features in the pre-processed wind power data for wind power forecasting. PSO is used to determine the uncertainty of the point-based wind power prediction and to obtain the probabilistic prediction interval of the wind power. Wind power data from a Chinese wind farm and modeled wind power data provided by the United States Renewable Energy Laboratory are used to conduct extensive tests of the proposed method. The results show that the proposed method has competitive advantages for the point-based and probabilistic interval prediction of wind power. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

17 pages, 1106 KB

Open AccessArticle

Obtaining Human Experience for Intelligent Dredger Control: A Reinforcement Learning Approach

by Changyun Wei, Fusheng Ni and Xiujing Chen

Appl. Sci. 2019, 9(9), 1769; https://doi.org/10.3390/app9091769 - 28 Apr 2019

Cited by 18 | Viewed by 4060

Abstract

This work presents a reinforcement learning approach for intelligent decision-making of a Cutter Suction Dredger (CSD), which is a special type of vessel for deepening harbors, constructing ports or navigational channels, and reclaiming landfills. Currently, CSDs are usually controlled by human operators, and [...] Read more.

This work presents a reinforcement learning approach for intelligent decision-making of a Cutter Suction Dredger (CSD), which is a special type of vessel for deepening harbors, constructing ports or navigational channels, and reclaiming landfills. Currently, CSDs are usually controlled by human operators, and the production rate is mainly determined by the so-called cutting process (i.e., cutting the underwater soil into fragments). Long-term manual operation is likely to cause driving fatigue, resulting in operational accidents and inefficiencies. To reduce the labor intensity of the operator, we seek an intelligent controller the can manipulate the cutting process to replace human operators. To this end, our proposed reinforcement learning approach consists of two parts. In the first part, we employ a neural network model to construct a virtual environment based on the historical dredging data. In the second part, we develop a reinforcement learning model that can lean the optimal control policy by interacting with the virtual environment to obtain human experience. The results show that the proposed learning approach can successfully imitate the dredging behavior of an experienced human operator. Moreover, the learning approach can outperform the operator in a way that can make quick responses to the change in uncertain environments. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Graphical abstract

13 pages, 282 KB

Open AccessArticle

Abstract Text Summarization with a Convolutional Seq2seq Model

by Yong Zhang, Dan Li, Yuheng Wang, Yang Fang and Weidong Xiao

Appl. Sci. 2019, 9(8), 1665; https://doi.org/10.3390/app9081665 - 23 Apr 2019

Cited by 65 | Viewed by 7947

Abstract

Abstract text summarization aims to offer a highly condensed and valuable information that expresses the main ideas of the text. Most previous researches focus on extractive models. In this work, we put forward a new generative model based on convolutional seq2seq architecture. A [...] Read more.

Abstract text summarization aims to offer a highly condensed and valuable information that expresses the main ideas of the text. Most previous researches focus on extractive models. In this work, we put forward a new generative model based on convolutional seq2seq architecture. A hierarchical CNN framework is much more efficient than the conventional RNN seq2seq models. We also equip our model with a copying mechanism to deal with the rare or unseen words. Additionally, we incorporate a hierarchical attention mechanism to model the keywords and key sentences simultaneously. Finally we verify our model on two real-life datasets, GigaWord and DUC corpus. The experiment results verify the effectiveness of our model as it outperforms state-of-the-art alternatives consistently and statistical significantly. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

16 pages, 3315 KB

Open AccessArticle

Fertility Detection of Hatching Eggs Based on a Convolutional Neural Network

by Lei Geng, Yuzhou Hu, Zhitao Xiao and Jiangtao Xi

Appl. Sci. 2019, 9(7), 1408; https://doi.org/10.3390/app9071408 - 3 Apr 2019

Cited by 17 | Viewed by 5703

Abstract

In order to achieve the goal of detecting the fertility of hatching eggs which are divided into fertile eggs and dead eggs more accurately and effectively, a novel method combining a convolution neural network (CNN) and a heartbeat signal of the hatching eggs [...] Read more.

In order to achieve the goal of detecting the fertility of hatching eggs which are divided into fertile eggs and dead eggs more accurately and effectively, a novel method combining a convolution neural network (CNN) and a heartbeat signal of the hatching eggs is proposed in this paper. Firstly, we collected heartbeat signals of 9-day-later hatching eggs by the method of PhotoPlethysmoGraphy (PPG), which is a non-invasive method to detect the change of blood volume in living tissues by photoelectric means. Secondly, a sequential convolutional neural network E-CNN, which was used to analyze heartbeat sequence of hatching eggs, was designed. Thirdly, an end-to-end trainable convolutional neural network SR-CNN, which was used to process heartbeat waveform images of hatching eggs, was designed to improve the classification performance in this paper. Key to improving the classification performance of SR-CNN is the SE-Res module, which combines the channel weighting unit “Squeeze-and-Excitation” (SE) block and the residual structure. The experimental results show that two models trained on our dataset, with E-CNN and SR-CNN, are able to achieve the fertility detection of the hatching eggs with superior identification accuarcy, up to 99.50% and 99.62% respectively, on our test set. It is demonstrated that the proposed method is feasible for identifying and classifying the survival of hatching eggs accurately and effectively. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Graphical abstract

16 pages, 1976 KB

Open AccessArticle

Parts Semantic Segmentation Aware Representation Learning for Person Re-Identification

by Hua Gao, Shengyong Chen and Zhaosheng Zhang

Appl. Sci. 2019, 9(6), 1239; https://doi.org/10.3390/app9061239 - 25 Mar 2019

Cited by 13 | Viewed by 4908

Abstract

Person re-identification is a typical computer vision problem which aims at matching pedestrians across disjoint camera views. It is challenging due to the misalignment of body parts caused by pose variations, background clutter, detection errors, camera point of view variation, different accessories and [...] Read more.

Person re-identification is a typical computer vision problem which aims at matching pedestrians across disjoint camera views. It is challenging due to the misalignment of body parts caused by pose variations, background clutter, detection errors, camera point of view variation, different accessories and occlusion. In this paper, we propose a person re-identification network which fuses global and local features, to deal with part misalignment problem. The network is a four-branch convolutional neural network (CNN) which learns global person appearance and local features of three human body parts respectively. Local patches, including the head, torso and lower body, are segmented by using a U_Net semantic segmentation CNN architecture. All four feature maps are then concatenated and fused to represent a person image. We propose a DropParts method to solve the parts missing problem, with which the local features are weighed according to the number of parts found by semantic segmentation. Since three body parts are well aligned, the approach significantly improves person re-identification. Experiments on the standard benchmark datasets, such as Market1501, CUHK03 and DukeMTMC-reID datasets, show the effectiveness of our proposed pipeline. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

15 pages, 634 KB

Open AccessArticle

A Spam Filtering Method Based on Multi-Modal Fusion

by Hong Yang, Qihe Liu, Shijie Zhou and Yang Luo

Appl. Sci. 2019, 9(6), 1152; https://doi.org/10.3390/app9061152 - 19 Mar 2019

Cited by 46 | Viewed by 7081

Abstract

In recent years, the single-modal spam filtering systems have had a high detection rate for image spamming or text spamming. To avoid detection based on the single-modal spam filtering systems, spammers inject junk information into the multi-modality part of an email and combine [...] Read more.

In recent years, the single-modal spam filtering systems have had a high detection rate for image spamming or text spamming. To avoid detection based on the single-modal spam filtering systems, spammers inject junk information into the multi-modality part of an email and combine them to reduce the recognition rate of the single-modal spam filtering systems, thereby implementing the purpose of evading detection. In view of this situation, a new model called multi-modal architecture based on model fusion (MMA-MF) is proposed, which use a multi-modal fusion method to ensure it could effectively filter spam whether it is hidden in the text or in the image. The model fuses a Convolutional Neural Network (CNN) model and a Long Short-Term Memory (LSTM) model to filter spam. Using the LSTM model and the CNN model to process the text and image parts of an email separately to obtain two classification probability values, then the two classification probability values are incorporated into a fusion model to identify whether the email is spam or not. For the hyperparameters of the MMA-MF model, we use a grid search optimization method to get the most suitable hyperparameters for it, and employ a k-fold cross-validation method to evaluate the performance of this model. Our experimental results show that this model is superior to the traditional spam filtering systems and can achieve accuracies in the range of 92.64–98.48%. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

10 pages, 2840 KB

Open AccessArticle

Learning Deep CNN Denoiser Priors for Depth Image Inpainting

by Zun Li and Jin Wu

Appl. Sci. 2019, 9(6), 1103; https://doi.org/10.3390/app9061103 - 15 Mar 2019

Cited by 22 | Viewed by 4613

Abstract

Due to the rapid development of RGB-D sensors, increasing attention is being paid to depth image applications. Depth images play an important role in computer vision research. In this paper, we address the problem of inpainting for single depth images without corresponding color [...] Read more.

Due to the rapid development of RGB-D sensors, increasing attention is being paid to depth image applications. Depth images play an important role in computer vision research. In this paper, we address the problem of inpainting for single depth images without corresponding color images as a guide. Within the framework of model-based optimization methods for depth image inpainting, the split Bregman iteration algorithm was used to transform depth image inpainting into the corresponding denoising subproblem. Then, we trained a set of efficient convolutional neural network (CNN) denoisers to solve this subproblem. Experimental results demonstrate the effectiveness of the proposed algorithm in comparison with three traditional methods in terms of visual quality and objective metrics. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

13 pages, 4754 KB

Open AccessArticle

An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation

by Hongbo Qin, Haodi Zhang, Hai Wang, Yujin Yan, Min Zhang and Wei Zhao

Appl. Sci. 2019, 9(6), 1054; https://doi.org/10.3390/app9061054 - 13 Mar 2019

Cited by 12 | Viewed by 3873

Abstract

An outside mutual correction (OMC) algorithm for natural scene text detection using multibox and semantic segmentation was developed. In the OMC algorithm, semantic segmentation and multibox were processed in parallel, and the text detection results were mutually corrected. The mutual correction process was [...] Read more.

An outside mutual correction (OMC) algorithm for natural scene text detection using multibox and semantic segmentation was developed. In the OMC algorithm, semantic segmentation and multibox were processed in parallel, and the text detection results were mutually corrected. The mutual correction process was divided into two steps: (1) The semantic segmentation results were employed in the bounding box enhancement module (BEM) to correct the multibox results. (2) The semantic bounding box module (SBM) was used to optimize the adhesion text boundary of the semantic segmentation results. Non-maximum suppression (NMS) was adopted to merge the SBM and BEM results. Our algorithm was evaluated on the ICDAR2013 and SVT datasets. The experimental results show that the developed algorithm had a maximum increase of 13.62% in the F-measure score and the highest F-measure score was 81.38%. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

15 pages, 4097 KB

Open AccessArticle

An Automatic Modulation Recognition Method with Low Parameter Estimation Dependence Based on Spatial Transformer Networks

by Mingxuan Li, Ou Li, Guangyi Liu and Ce Zhang

Appl. Sci. 2019, 9(5), 1010; https://doi.org/10.3390/app9051010 - 11 Mar 2019

Cited by 15 | Viewed by 5541

Abstract

Recently, automatic modulation recognition has been an important research topic in wireless communication. Due to the application of deep learning, it is prospective of using convolution neural networks on raw in-phase and quadrature signals in developing automatic modulation recognition methods. However, the errors [...] Read more.

Recently, automatic modulation recognition has been an important research topic in wireless communication. Due to the application of deep learning, it is prospective of using convolution neural networks on raw in-phase and quadrature signals in developing automatic modulation recognition methods. However, the errors introduced during signal reception and processing will greatly deteriorate the classification performance, which affects the practical application of such methods. Therefore, we first analyze and quantify the errors introduced by signal detection and isolation in noncooperative communication through a baseline convolution neural network. In response to these errors, we then design a signal spatial transformer module based on the attention model to eliminate errors by a priori learning of signal structure. By cascading a signal spatial transformer module in front of the baseline classification network, we propose a method that can adaptively resample the signal capture to adjust time drift, symbol rate, and clock recovery. Besides, it can also automatically add a perturbation on the signal carrier to correct frequency offset. By applying this improved model to automatic modulation recognition, we obtain a significant improvement in classification performance compared with several existing methods. Our method significantly improves the prospect of the application of automatic modulation recognition based on deep learning under nonideal synchronization. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

17 pages, 2520 KB

Open AccessArticle

An On-Line and Adaptive Method for Detecting Abnormal Events in Videos Using Spatio-Temporal ConvNet

by Samir Bouindour, Hichem Snoussi, Mohamad Mazen Hittawe, Nacef Tazi and Tian Wang

Appl. Sci. 2019, 9(4), 757; https://doi.org/10.3390/app9040757 - 21 Feb 2019

Cited by 37 | Viewed by 4826

Abstract

We address in this paper the problem of abnormal event detection in video-surveillance. In this context, we use only normal events as training samples. We propose to use a modified version of pretrained 3D residual convolutional network to extract spatio-temporal features, and we [...] Read more.

We address in this paper the problem of abnormal event detection in video-surveillance. In this context, we use only normal events as training samples. We propose to use a modified version of pretrained 3D residual convolutional network to extract spatio-temporal features, and we develop a robust classifier based on the selection of vectors of interest. It is able to learn the normal behavior model and detect potentially dangerous abnormal events. This unsupervised method prevents the marginalization of normal events that occur rarely during the training phase since it minimizes redundancy information, and adapt to the appearance of new normal events that occur during the testing phase. Experimental results on challenging datasets show the superiority of the proposed method compared to the state of the art in both frame-level and pixel-level in anomaly detection task. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

14 pages, 13637 KB

Open AccessArticle

Joint Pedestrian and Body Part Detection via Semantic Relationship Learning

by Junhua Gu, Chuanxin Lan, Wenbai Chen and Hu Han

Appl. Sci. 2019, 9(4), 752; https://doi.org/10.3390/app9040752 - 21 Feb 2019

Cited by 11 | Viewed by 4525

Abstract

While remarkable progress has been made to pedestrian detection in recent years, robust pedestrian detection in the wild e.g., under surveillance scenarios with occlusions, remains a challenging problem. In this paper, we present a novel approach for joint pedestrian and body part detection [...] Read more.

While remarkable progress has been made to pedestrian detection in recent years, robust pedestrian detection in the wild e.g., under surveillance scenarios with occlusions, remains a challenging problem. In this paper, we present a novel approach for joint pedestrian and body part detection via semantic relationship learning under unconstrained scenarios. Specifically, we propose a Body Part Indexed Feature (BPIF) representation to encode the semantic relationship between individual body parts (i.e., head, head-shoulder, upper body, and whole body) and highlight per body part features, providing robustness against partial occlusions to the whole body. We also propose an Adaptive Joint Non-Maximum Suppression (AJ-NMS) to replace the original NMS algorithm widely used in object detection, leading to higher precision and recall for detecting overlapped pedestrians. Experimental results on the public-domain CUHK-SYSU Person Search Dataset show that the proposed approach outperforms the state-of-the-art methods for joint pedestrian and body part detection in the wild. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

13 pages, 395 KB

Open AccessArticle

A Deep Temporal Neural Music Recommendation Model Utilizing Music and User Metadata

by Hai-Tao Zheng, Jin-Yuan Chen, Nan Liang, Arun Kumar Sangaiah, Yong Jiang and Cong-Zhi Zhao

Appl. Sci. 2019, 9(4), 703; https://doi.org/10.3390/app9040703 - 18 Feb 2019

Cited by 23 | Viewed by 5029

Abstract

Deep learning shows its superiority in many domains such as computing vision, nature language processing, and speech recognition. In music recommendation, most deep learning-based methods focus on learning users’ temporal preferences using their listening histories. The cold start problem is not addressed, however, [...] Read more.

Deep learning shows its superiority in many domains such as computing vision, nature language processing, and speech recognition. In music recommendation, most deep learning-based methods focus on learning users’ temporal preferences using their listening histories. The cold start problem is not addressed, however, and the music characteristics are not fully exploited by these methods. In addition, the music characteristics and the users’ temporal preferences are not combined naturally, which cause the relatively low performance of music recommendation. To address these issues, we proposed a Deep Temporal Neural Music Recommendation model (DTNMR) based on music characteristics and the users’ temporal preferences. We encoded the music metadata into one-hot vectors and utilized the Deep Neural Network to project the music vectors to low-dimensional space and obtain the music characteristics. In addition, Long Short-Term Memory (LSTM) neural networks are utilized to learn about users’ long-term and short-term preferences from their listening histories. DTNMR alleviates the cold start problem in the item side using the music medadata and discovers new users’ preferences immediately after they listen to music. The experimental results show DTNMR outperforms seven baseline methods in terms of recall, precision, f-measure, MAP, user coverage and AUC. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

15 pages, 3141 KB

Open AccessArticle

Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation

by Hao Qu, Lilian Zhang, Xuesong Wu, Xiaofeng He, Xiaoping Hu and Xudong Wen

Appl. Sci. 2019, 9(3), 565; https://doi.org/10.3390/app9030565 - 8 Feb 2019

Cited by 21 | Viewed by 5067

Abstract

The development of object detection in infrared images has attracted more attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images. Additionally, the lack of high-quality infrared datasets hinders research into such algorithms. In order [...] Read more.

The development of object detection in infrared images has attracted more attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images. Additionally, the lack of high-quality infrared datasets hinders research into such algorithms. In order to solve these issues, we firstly make a series of modifications based on Faster Region-Convolutional Neural Network (R-CNN). In this paper, a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Secondly, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects on different scales. Furthermore, the inception4 module and the position sensitive region of interest (ROI) align (PSalign) pooling layer are utilized to explore richer features of the objects. Thirdly, this paper proposes instance level data augmentation, which takes into account the imbalance between categories while enlarging dataset. In the training stage, the online hard example mining method is utilized to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with baseline, our detection method has state-of-the-art performance. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

15 pages, 2087 KB

Open AccessArticle

Diverse Decoding for Abstractive Document Summarization

by Xu-Wang Han, Hai-Tao Zheng, Jin-Yuan Chen and Cong-Zhi Zhao

Appl. Sci. 2019, 9(3), 386; https://doi.org/10.3390/app9030386 - 23 Jan 2019

Cited by 11 | Viewed by 3652

Abstract

Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. Unfortunately, as neural abstractive summarization research is in a primitive stage, the performance of these models is still far from ideal. In this paper, we propose a novel method called Neural [...] Read more.

Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. Unfortunately, as neural abstractive summarization research is in a primitive stage, the performance of these models is still far from ideal. In this paper, we propose a novel method called Neural Abstractive Summarization with Diverse Decoding (NASDD). This method augments the standard attentional sequence-to-sequence model in two aspects. First, we introduce a diversity-promoting beam search approach in the decoding process, which alleviates the serious diversity issue caused by standard beam search and hence increases the possibility of generating summary sequences that are more informative. Second, we creatively utilize the attention mechanism combined with the key information of the input document as an estimation of the salient information coverage, which aids in finding the optimal summary sequence. We carry out the experimental evaluation with state-of-the-art methods on the CNN/Daily Mail summarization dataset, and the results demonstrate the superiority of our proposed method. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

14 pages, 3205 KB

Open AccessArticle

Unsupervised Domain Adaptation with Coupled Generative Adversarial Autoencoders

by Xiaoqing Wang and Xiangjun Wang

Appl. Sci. 2018, 8(12), 2529; https://doi.org/10.3390/app8122529 - 7 Dec 2018

Cited by 10 | Viewed by 4642

Abstract

When large-scale annotated data are not available for certain image classification tasks, training a deep convolutional neural network model becomes challenging. Some recent domain adaptation methods try to solve this problem using generative adversarial networks and have achieved promising results. However, these methods [...] Read more.

When large-scale annotated data are not available for certain image classification tasks, training a deep convolutional neural network model becomes challenging. Some recent domain adaptation methods try to solve this problem using generative adversarial networks and have achieved promising results. However, these methods are based on a shared latent space assumption and they do not consider the situation when shared high level representations in different domains do not exist or are not ideal as they assumed. To overcome this limitation, we propose a neural network structure called coupled generative adversarial autoencoders (CGAA) that allows a pair of generators to learn the high-level differences between two domains by sharing only part of the high-level layers. Additionally, by introducing a class consistent loss calculated by a stand-alone classifier into the generator optimization, our model is able to generate class invariant style-transferred images suitable for classification tasks in domain adaptation. We apply CGAA to several domain transferred image classification scenarios including several benchmark datasets. Experiment results have shown that our method can achieve state-of-the-art classification results. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

Review

Jump to: Editorial, Research, Other

24 pages, 11091 KB

Open AccessReview

A Survey on Deep Learning-Driven Remote Sensing Image Scene Understanding: Scene Classification, Scene Retrieval and Scene-Guided Object Detection

by Yating Gu, Yantian Wang and Yansheng Li

Appl. Sci. 2019, 9(10), 2110; https://doi.org/10.3390/app9102110 - 23 May 2019

Cited by 134 | Viewed by 12065

Abstract

As a fundamental and important task in remote sensing, remote sensing image scene understanding (RSISU) has attracted tremendous research interest in recent years. RSISU includes the following sub-tasks: remote sensing image scene classification, remote sensing image scene retrieval, and scene-driven remote sensing image [...] Read more.

As a fundamental and important task in remote sensing, remote sensing image scene understanding (RSISU) has attracted tremendous research interest in recent years. RSISU includes the following sub-tasks: remote sensing image scene classification, remote sensing image scene retrieval, and scene-driven remote sensing image object detection. Although these sub-tasks have different goals, they share some communal hints. Hence, this paper tries to discuss them as a whole. Similar to other domains (e.g., speech recognition and natural image recognition), deep learning has also become the state-of-the-art technique in RSISU. To facilitate the sustainable progress of RSISU, this paper presents a comprehensive review of deep-learning-based RSISU methods, and points out some future research directions and potential applications of RSISU. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

29 pages, 10285 KB

Open AccessEditor’s ChoiceReview

Review of Artificial Intelligence Adversarial Attack and Defense Technologies

by Shilin Qiu, Qihe Liu, Shijie Zhou and Chunjiang Wu

Appl. Sci. 2019, 9(5), 909; https://doi.org/10.3390/app9050909 - 4 Mar 2019

Cited by 333 | Viewed by 35986

Abstract

In recent years, artificial intelligence technologies have been widely used in computer vision, natural language processing, automatic driving, and other fields. However, artificial intelligence systems are vulnerable to adversarial attacks, which limit the applications of artificial intelligence (AI) technologies in key security fields. [...] Read more.

In recent years, artificial intelligence technologies have been widely used in computer vision, natural language processing, automatic driving, and other fields. However, artificial intelligence systems are vulnerable to adversarial attacks, which limit the applications of artificial intelligence (AI) technologies in key security fields. Therefore, improving the robustness of AI systems against adversarial attacks has played an increasingly important role in the further development of AI. This paper aims to comprehensively summarize the latest research progress on adversarial attack and defense technologies in deep learning. According to the target model’s different stages where the adversarial attack occurred, this paper expounds the adversarial attack methods in the training stage and testing stage respectively. Then, we sort out the applications of adversarial attack technologies in computer vision, natural language processing, cyberspace security, and the physical world. Finally, we describe the existing adversarial defense methods respectively in three main categories, i.e., modifying data, modifying models and using auxiliary tools. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

Other

Jump to: Editorial, Research, Review

9 pages, 2859 KB

Open AccessLetter

Variable Chromosome Genetic Algorithm for Structure Learning in Neural Networks to Imitate Human Brain

by Kang-moon Park, Donghoon Shin and Sung-do Chi

Appl. Sci. 2019, 9(15), 3176; https://doi.org/10.3390/app9153176 - 5 Aug 2019

Cited by 19 | Viewed by 4031

Abstract

This paper proposes the variable chromosome genetic algorithm (VCGA) for structure learning in neural networks. Currently, the structural parameters of neural networks, i.e., number of neurons, coupling relations, number of layers, etc., have mostly been designed on the basis of heuristic knowledge of [...] Read more.

This paper proposes the variable chromosome genetic algorithm (VCGA) for structure learning in neural networks. Currently, the structural parameters of neural networks, i.e., number of neurons, coupling relations, number of layers, etc., have mostly been designed on the basis of heuristic knowledge of an artificial intelligence (AI) expert. To overcome this limitation, in this study evolutionary approach (EA) has been utilized to automatically generate the proper artificial neural network (ANN) structures. VCGA has a new genetic operation called a chromosome attachment. By applying the VCGA, the initial ANN structures can be flexibly evolved toward the proper structure. The case study applied to the typical exclusive or (XOR) problem shows the feasibility of our methodology. Our approach is differentiated with others in that it uses a variable chromosome in the genetic algorithm. It makes a neural network structure vary naturally, both constructively and destructively. It has been shown that the XOR problem is successfully optimized using a VCGA with a chromosome attachment to learn the structure of neural networks. Research on the structure learning of more complex problems is the topic of our future research. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

17 pages, 7106 KB

Open AccessCase Report

Evaluation of Deep Learning Neural Networks for Surface Roughness Prediction Using Vibration Signal Analysis

by Wan-Ju Lin, Shih-Hsuan Lo, Hong-Tsu Young and Che-Lun Hung

Appl. Sci. 2019, 9(7), 1462; https://doi.org/10.3390/app9071462 - 8 Apr 2019

Cited by 120 | Viewed by 10447

Abstract

The use of surface roughness (Ra) to indicate product quality in the milling process in an intelligent monitoring system applied in-process has been developing. From the considerations of convenient installation and cost-effectiveness, accelerator vibration signals combined with deep learning predictive models for predicting [...] Read more.

The use of surface roughness (Ra) to indicate product quality in the milling process in an intelligent monitoring system applied in-process has been developing. From the considerations of convenient installation and cost-effectiveness, accelerator vibration signals combined with deep learning predictive models for predicting surface roughness is a potential tool. In this paper, three models, namely, Fast Fourier Transform-Deep Neural Networks (FFT-DNN), Fast Fourier Transform Long Short Term Memory Network (FFT-LSTM), and one-dimensional convolutional neural network (1-D CNN), are used to explore the training and prediction performances. Feature extraction plays an important role in the training and predicting results. FFT and the one-dimensional convolution filter, known as 1-D CNN, are employed to extract vibration signals’ raw data. The results show the following: (1) the LSTM model presents the temporal modeling ability to achieve a good performance at higher Ra value and (2) 1-D CNN, which is better at extracting features, exhibits highly accurate prediction performance at lower Ra ranges. Based on the results, vibration signals combined with a deep learning predictive model could be applied to predict the surface roughness in the milling process. Based on this experimental study, the use of prediction of the surface roughness via vibration signals using FFT-LSTM or 1-D CNN is recommended to develop an intelligent system. Full article

(This article belongs to the Special Issue Advances in Deep Learning)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Advances in Deep Learning

Share This Special Issue

Special Issue Editors

Special Issue Information

Keywords

Benefits of Publishing in a Special Issue

Published Papers (35 papers)

Editorial

Research

Review

Other

Further Information

Guidelines

MDPI Initiatives

Follow MDPI