Special Issue "Advances in Deep Learning"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 1 May 2019

Special Issue Editors

Guest Editor
Dr. Diego Gragnaniello

University of Naples Federico II, Department of Electrical Engineering and Information Technologies, via Claudio, 21, 80125 Napoli, Italy
Interests: deep learning; computer vision; multimedia forensics; medical imaging; biometrics
Guest Editor
Prof. Dr. Andrea Bottino

Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
Interests: computer vision; machine learning; human computer interaction; computer graphics; virtual and augmented reality; serious games
Guest Editor
Dr. Sandro Cumani

Politecnico di Torino, Department of Control and Computer Engineering, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
Interests: speaker and language recognition; pattern recognition; machine learning; statistical models
Guest Editor
Dr. Wonjoon Kim

Department of Control and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA
Interests: human factors; statistical learning; deep learning

Special Issue Information

Machine-learning-based algorithms are widespread in many aspects of our daily life, from corporate advertising and logistics systems to the applications on our smartphones and cameras, with an ever-increasing number of devices including dedicated hardware. This growing deployment of machine-learning-based algorithms would not have been possible without the remarkably fast progress of the underlying research.

In recent years, a growing interest in deep learning approaches has been observed in the scientific community. These are a particular class of machine-learning techniques that allow an intelligent system to automatically learn a suitable data representation from the data themselves. They have been especially successful in multimedia applications, such as video and audio classification, due to their ability to extract the information implicit in this kind of data. For instance, various deep learning classifiers have reached human-level performance in medical image classification for the recognition of a large number of diseases, narrowing the gap between the analytic capability of the machine and that of the human brain. Great improvements have also been achieved in natural language processing, with techniques able to analyze and extract information from text even when it lacks a predetermined form.

An even more interesting research trend focuses on generative models: a novel class of deep learning approaches that can learn a complex statistical distribution from its samples in an unsupervised manner. The aim is to train a neural network to generate new samples from the learned distribution. Generative models have demonstrated their effectiveness in different fields, from the generation of images and videos that are barely distinguishable from real ones to automatic text and speech translation.

We encourage authors to submit original research articles, reviews, theoretical and critical perspectives, and viewpoint articles on (but not limited to) the following topics:

- Convolutional neural networks;

- Recurrent neural networks;

- Generative neural network models;

- Comparison of neural networks and other methods;

- Multiscale multimedia analysis;

- Constrained learning approaches for critical applications;

- Predictive analysis;

- Developing new models for multimodal deep learning;

- Combining multiple deep learning models;

- Applications in vision, audio, speech, natural language processing, robotics, neuroscience, or any other field.

Dr. Diego Gragnaniello
Prof. Dr. Andrea Bottino
Dr. Sandro Cumani
Dr. Wonjoon Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1500 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Deep learning
  • Neural networks
  • Generative neural network models
  • Multiscale data representation
  • Constrained optimization
  • Predictive analysis
  • Feature interpretation
  • Deep learning analytics involving linked data

Published Papers (12 papers)


Research


Open Access Article: Parts Semantic Segmentation Aware Representation Learning for Person Re-Identification
Appl. Sci. 2019, 9(6), 1239; https://doi.org/10.3390/app9061239
Received: 26 February 2019 / Revised: 15 March 2019 / Accepted: 22 March 2019 / Published: 25 March 2019
Abstract
Person re-identification is a typical computer vision problem that aims at matching pedestrians across disjoint camera views. It is challenging due to the misalignment of body parts caused by pose variations, background clutter, detection errors, camera viewpoint variation, different accessories and occlusion. In this paper, we propose a person re-identification network that fuses global and local features to deal with the part misalignment problem. The network is a four-branch convolutional neural network (CNN) that learns the global person appearance and local features of three human body parts, respectively. Local patches, including the head, torso and lower body, are segmented using a U-Net semantic segmentation CNN architecture. All four feature maps are then concatenated and fused to represent a person image. We propose a DropParts method to handle the missing-parts problem, in which the local features are weighted according to the number of parts found by semantic segmentation. Since the three body parts are well aligned, the approach significantly improves person re-identification. Experiments on standard benchmark datasets, such as Market1501, CUHK03 and DukeMTMC-reID, show the effectiveness of the proposed pipeline.
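The part-aware weighting described above can be illustrated with a toy sketch. The function name, the uniform weighting scheme, and the feature sizes below are illustrative assumptions, not the paper's exact DropParts formulation:

```python
import numpy as np

def fuse_features(global_feat, part_feats, present):
    """Concatenate a global descriptor with per-part descriptors,
    down-weighting local features when body parts are missing."""
    weight = sum(present) / len(present)  # fewer parts found -> lower weight
    parts = [f * weight if p else np.zeros_like(f)
             for f, p in zip(part_feats, present)]
    return np.concatenate([global_feat] + parts)

g = np.ones(4)                               # global appearance feature
parts = [np.full(2, 2.0) for _ in range(3)]  # head, torso, lower body
fused = fuse_features(g, parts, present=[True, True, False])
```

Here a missing lower body zeroes out its slot and scales the remaining local features by 2/3, so descriptors of partially visible people remain comparable.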
(This article belongs to the Special Issue Advances in Deep Learning)

Open Access Article: A Spam Filtering Method Based on Multi-Modal Fusion
Appl. Sci. 2019, 9(6), 1152; https://doi.org/10.3390/app9061152
Received: 24 January 2019 / Revised: 13 March 2019 / Accepted: 14 March 2019 / Published: 19 March 2019
Abstract
In recent years, single-modal spam filtering systems have achieved high detection rates for image spam or text spam. To avoid detection by such systems, spammers inject junk information into the multi-modal parts of an email and combine them to reduce the recognition rate of single-modal spam filters, thereby evading detection. In view of this situation, a new model called multi-modal architecture based on model fusion (MMA-MF) is proposed, which uses a multi-modal fusion method to ensure that it can effectively filter spam whether it is hidden in the text or in the image. The model fuses a Convolutional Neural Network (CNN) model and a Long Short-Term Memory (LSTM) model to filter spam. The LSTM and CNN models process the text and image parts of an email separately to obtain two classification probability values, which are then incorporated into a fusion model to identify whether the email is spam or not. The hyperparameters of the MMA-MF model are selected by a grid search, and a k-fold cross-validation method is employed to evaluate the model's performance. Our experimental results show that this model is superior to traditional spam filtering systems and achieves accuracies in the range of 92.64–98.48%.
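The late-fusion step, combining per-modality classification probabilities into a single spam decision, can be sketched as follows. The weighted average stands in for the paper's learned fusion model, and the weights and threshold are illustrative assumptions:

```python
def fuse_spam_scores(p_text, p_image, w_text=0.6, w_image=0.4, threshold=0.5):
    """Combine the text (LSTM) and image (CNN) spam probabilities
    into one score; a weighted average stands in for a learned fusion."""
    score = w_text * p_text + w_image * p_image
    return score, score >= threshold

# A spammy text with an innocuous image is still flagged.
score, is_spam = fuse_spam_scores(p_text=0.9, p_image=0.2)
```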

Open Access Article: Learning Deep CNN Denoiser Priors for Depth Image Inpainting
Appl. Sci. 2019, 9(6), 1103; https://doi.org/10.3390/app9061103
Received: 17 January 2019 / Revised: 11 March 2019 / Accepted: 12 March 2019 / Published: 15 March 2019
Abstract
Due to the rapid development of RGB-D sensors, increasing attention is being paid to depth image applications. Depth images play an important role in computer vision research. In this paper, we address the problem of inpainting single depth images without corresponding color images as a guide. Within the framework of model-based optimization methods for depth image inpainting, the split Bregman iteration algorithm is used to transform depth image inpainting into a corresponding denoising subproblem. We then train a set of efficient convolutional neural network (CNN) denoisers to solve this subproblem. Experimental results demonstrate the effectiveness of the proposed algorithm in comparison with three traditional methods in terms of visual quality and objective metrics.

Open Access Article: An Algorithm for Scene Text Detection Using Multibox and Semantic Segmentation
Appl. Sci. 2019, 9(6), 1054; https://doi.org/10.3390/app9061054
Received: 28 January 2019 / Revised: 27 February 2019 / Accepted: 8 March 2019 / Published: 13 March 2019
Abstract
An outside mutual correction (OMC) algorithm for natural scene text detection using multibox and semantic segmentation was developed. In the OMC algorithm, semantic segmentation and multibox are processed in parallel, and the text detection results are mutually corrected. The mutual correction process is divided into two steps: (1) the semantic segmentation results are employed in the bounding box enhancement module (BEM) to correct the multibox results; (2) the semantic bounding box module (SBM) is used to optimize the adhesion text boundary of the semantic segmentation results. Non-maximum suppression (NMS) is adopted to merge the SBM and BEM results. Our algorithm was evaluated on the ICDAR2013 and SVT datasets. The experimental results show that the developed algorithm improved the F-measure score by up to 13.62%, with a highest F-measure of 81.38%.
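Non-maximum suppression, used above to merge the SBM and BEM results, has a compact greedy form. This is the standard textbook algorithm, not code from the paper, and the boxes and threshold below are made-up values:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any remaining
    box that overlaps it by more than `thresh`, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])  # the two overlapping boxes collapse to one
```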

Open Access Article: An Automatic Modulation Recognition Method with Low Parameter Estimation Dependence Based on Spatial Transformer Networks
Appl. Sci. 2019, 9(5), 1010; https://doi.org/10.3390/app9051010
Received: 7 February 2019 / Revised: 5 March 2019 / Accepted: 7 March 2019 / Published: 11 March 2019
Abstract
Recently, automatic modulation recognition has become an important research topic in wireless communication. With the rise of deep learning, applying convolutional neural networks to raw in-phase and quadrature signals is a promising direction for automatic modulation recognition. However, the errors introduced during signal reception and processing greatly deteriorate classification performance, which limits the practical application of such methods. Therefore, we first analyze and quantify the errors introduced by signal detection and isolation in noncooperative communication using a baseline convolutional neural network. In response to these errors, we then design a signal spatial transformer module based on the attention model to eliminate errors through a priori learning of signal structure. By cascading the signal spatial transformer module in front of the baseline classification network, we propose a method that can adaptively resample the captured signal to adjust for time drift, symbol rate, and clock recovery. It can also automatically add a perturbation to the signal carrier to correct frequency offset. Applying this improved model to automatic modulation recognition, we obtain a significant improvement in classification performance compared with several existing methods. Our method significantly improves the prospects for deep-learning-based automatic modulation recognition under nonideal synchronization.
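The carrier-frequency correction that the spatial transformer learns corresponds to a simple de-rotation of the received I/Q samples. The sketch below shows the underlying operation on synthetic data, with a made-up sample rate and offset; the paper's module estimates the offset rather than being given it:

```python
import numpy as np

fs = 1000.0                       # sample rate in Hz (illustrative)
t = np.arange(200) / fs
f_off = 37.0                      # carrier frequency offset in Hz (illustrative)

clean = np.ones(t.size, dtype=complex)             # trivial baseband signal
received = clean * np.exp(2j * np.pi * f_off * t)  # offset rotates the I/Q samples

# De-rotating by the (estimated) offset recovers the original samples.
corrected = received * np.exp(-2j * np.pi * f_off * t)
residual = float(np.max(np.abs(corrected - clean)))
```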

Open Access Article: An On-Line and Adaptive Method for Detecting Abnormal Events in Videos Using Spatio-Temporal ConvNet
Appl. Sci. 2019, 9(4), 757; https://doi.org/10.3390/app9040757
Received: 24 January 2019 / Revised: 14 February 2019 / Accepted: 18 February 2019 / Published: 21 February 2019
Abstract
In this paper we address the problem of abnormal event detection in video surveillance, using only normal events as training samples. We propose to use a modified version of a pretrained 3D residual convolutional network to extract spatio-temporal features, and we develop a robust classifier based on the selection of vectors of interest. It is able to learn the normal behavior model and detect potentially dangerous abnormal events. This unsupervised method prevents the marginalization of normal events that occur rarely during the training phase, since it minimizes redundant information, and adapts to the appearance of new normal events during the testing phase. Experimental results on challenging datasets show the superiority of the proposed method compared to the state of the art at both the frame level and the pixel level in the anomaly detection task.

Open Access Article: Joint Pedestrian and Body Part Detection via Semantic Relationship Learning
Appl. Sci. 2019, 9(4), 752; https://doi.org/10.3390/app9040752
Received: 24 December 2018 / Revised: 26 January 2019 / Accepted: 5 February 2019 / Published: 21 February 2019
Abstract
While remarkable progress has been made in pedestrian detection in recent years, robust pedestrian detection in the wild, e.g., under surveillance scenarios with occlusions, remains a challenging problem. In this paper, we present a novel approach for joint pedestrian and body part detection via semantic relationship learning under unconstrained scenarios. Specifically, we propose a Body Part Indexed Feature (BPIF) representation to encode the semantic relationship between individual body parts (i.e., head, head-shoulder, upper body, and whole body) and highlight per-body-part features, providing robustness against partial occlusions of the whole body. We also propose an Adaptive Joint Non-Maximum Suppression (AJ-NMS) to replace the original NMS algorithm widely used in object detection, leading to higher precision and recall when detecting overlapping pedestrians. Experimental results on the public-domain CUHK-SYSU Person Search Dataset show that the proposed approach outperforms state-of-the-art methods for joint pedestrian and body part detection in the wild.

Open Access Article: A Deep Temporal Neural Music Recommendation Model Utilizing Music and User Metadata
Appl. Sci. 2019, 9(4), 703; https://doi.org/10.3390/app9040703
Received: 16 December 2018 / Revised: 10 February 2019 / Accepted: 12 February 2019 / Published: 18 February 2019
Abstract
Deep learning has shown its superiority in many domains, such as computer vision, natural language processing, and speech recognition. In music recommendation, most deep-learning-based methods focus on learning users' temporal preferences from their listening histories. However, these methods do not address the cold start problem, and they do not fully exploit music characteristics. In addition, music characteristics and users' temporal preferences are not combined naturally, which causes relatively low recommendation performance. To address these issues, we propose a Deep Temporal Neural Music Recommendation model (DTNMR) based on music characteristics and users' temporal preferences. We encode the music metadata into one-hot vectors and utilize a Deep Neural Network to project the music vectors into a low-dimensional space and obtain the music characteristics. In addition, Long Short-Term Memory (LSTM) neural networks are utilized to learn users' long-term and short-term preferences from their listening histories. DTNMR alleviates the cold start problem on the item side using the music metadata and discovers new users' preferences immediately after they listen to music. The experimental results show DTNMR outperforms seven baseline methods in terms of recall, precision, F-measure, MAP, user coverage and AUC.
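The one-hot encoding of categorical music metadata mentioned above is straightforward; the vocabulary below is a made-up example, not from the paper:

```python
def one_hot(value, vocabulary):
    """Encode a categorical metadata field as a one-hot vector,
    ready to be projected to a low-dimensional embedding."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

genres = ["rock", "jazz", "classical", "pop"]
encoded = one_hot("jazz", genres)
```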

Open Access Article: Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation
Appl. Sci. 2019, 9(3), 565; https://doi.org/10.3390/app9030565
Received: 24 December 2018 / Revised: 28 January 2019 / Accepted: 30 January 2019 / Published: 8 February 2019
Abstract
Object detection in infrared images has attracted more attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images, and the lack of high-quality infrared datasets hinders research into such algorithms. To address these issues, we make a series of modifications based on the Faster Region-based Convolutional Neural Network (Faster R-CNN). First, a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Second, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects at different scales. Furthermore, the inception4 module and the position-sensitive region of interest (ROI) align (PSalign) pooling layer are utilized to extract richer object features. Third, this paper proposes instance-level data augmentation, which takes into account the imbalance between categories while enlarging the dataset. In the training stage, the online hard example mining method is utilized to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with the baseline, our detection method achieves state-of-the-art performance.

Open Access Article: Diverse Decoding for Abstractive Document Summarization
Appl. Sci. 2019, 9(3), 386; https://doi.org/10.3390/app9030386
Received: 13 December 2018 / Revised: 16 January 2019 / Accepted: 18 January 2019 / Published: 23 January 2019
Abstract
Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. However, as neural abstractive summarization research is still at an early stage, the performance of these models remains far from ideal. In this paper, we propose a novel method called Neural Abstractive Summarization with Diverse Decoding (NASDD). This method augments the standard attentional sequence-to-sequence model in two ways. First, we introduce a diversity-promoting beam search approach in the decoding process, which alleviates the serious diversity issue caused by standard beam search and hence increases the possibility of generating more informative summary sequences. Second, we utilize the attention mechanism combined with the key information of the input document as an estimate of salient-information coverage, which aids in finding the optimal summary sequence. We carry out an experimental evaluation against state-of-the-art methods on the CNN/Daily Mail summarization dataset, and the results demonstrate the superiority of our proposed method.
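The diversity-promoting idea can be sketched with a single decoding step in which a token already chosen by a higher-ranked hypothesis is penalized for later ones. This Hamming-style penalty follows the general diverse-beam-search recipe, not NASDD's exact formulation, and the vocabulary and scores are toy values:

```python
def diverse_beam_step(beams, next_logprobs, beam_width=2, penalty=2.0):
    """One beam-search step with a diversity penalty: each time a token is
    selected, later candidates ending in the same token are penalized."""
    candidates = [(tokens + [tok], score + lp)
                  for tokens, score in beams
                  for tok, lp in next_logprobs.items()]
    selected, used = [], {}
    for _ in range(beam_width):
        best = max(candidates,
                   key=lambda c: c[1] - penalty * used.get(c[0][-1], 0))
        candidates.remove(best)
        selected.append(best)
        used[best[0][-1]] = used.get(best[0][-1], 0) + 1
    return selected

beams = [(["a"], 0.0), (["b"], -0.05)]
next_logprobs = {"x": -0.1, "y": -0.5}
out = diverse_beam_step(beams, next_logprobs)  # plain beam search would pick "x" twice
```

With the penalty, the second slot goes to the best hypothesis ending in a different token, so the beam covers both continuations instead of two near-duplicates.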

Open Access Article: Unsupervised Domain Adaptation with Coupled Generative Adversarial Autoencoders
Appl. Sci. 2018, 8(12), 2529; https://doi.org/10.3390/app8122529
Received: 9 November 2018 / Revised: 27 November 2018 / Accepted: 5 December 2018 / Published: 7 December 2018
Abstract
When large-scale annotated data are not available for certain image classification tasks, training a deep convolutional neural network model becomes challenging. Some recent domain adaptation methods try to solve this problem using generative adversarial networks and have achieved promising results. However, these methods are based on a shared-latent-space assumption and do not consider the situation in which shared high-level representations across domains do not exist or are not as ideal as assumed. To overcome this limitation, we propose a neural network structure called coupled generative adversarial autoencoders (CGAA) that allows a pair of generators to learn the high-level differences between two domains by sharing only part of the high-level layers. Additionally, by introducing into the generator optimization a class-consistency loss calculated by a stand-alone classifier, our model is able to generate class-invariant, style-transferred images suitable for classification tasks in domain adaptation. We apply CGAA to several domain-transferred image classification scenarios, including several benchmark datasets. Experimental results show that our method achieves state-of-the-art classification results.

Review


Open Access Review: Review of Artificial Intelligence Adversarial Attack and Defense Technologies
Appl. Sci. 2019, 9(5), 909; https://doi.org/10.3390/app9050909
Received: 19 January 2019 / Revised: 20 February 2019 / Accepted: 22 February 2019 / Published: 4 March 2019
Abstract
In recent years, artificial intelligence (AI) technologies have been widely used in computer vision, natural language processing, automatic driving, and other fields. However, AI systems are vulnerable to adversarial attacks, which limits their application in security-critical fields. Therefore, improving the robustness of AI systems against adversarial attacks plays an increasingly important role in the further development of AI. This paper comprehensively summarizes the latest research progress on adversarial attack and defense technologies in deep learning. According to the stage of the target model at which the adversarial attack occurs, this paper describes adversarial attack methods in the training stage and the testing stage, respectively. Then, we survey the applications of adversarial attack technologies in computer vision, natural language processing, cyberspace security, and the physical world. Finally, we describe existing adversarial defense methods in three main categories: modifying data, modifying models, and using auxiliary tools.
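The fast gradient sign method (FGSM), one of the testing-stage attacks such a survey covers, fits in a few lines for a linear model. This is a generic illustration with made-up weights and inputs, not an example from the paper:

```python
import numpy as np

def fgsm_linear(x, w, y_true, eps):
    """FGSM on a linear score f(x) = w.x: perturb x by eps per coordinate
    in the direction that most decreases the margin y_true * f(x),
    where y_true is the true label in {-1, +1}."""
    grad = y_true * w                 # gradient of the margin w.r.t. x
    return x - eps * np.sign(grad)    # step against the margin

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, -0.1, 0.4])
margin_before = float(np.dot(w, x))                 # positive: classified as +1
x_adv = fgsm_linear(x, w, y_true=+1, eps=0.3)
margin_after = float(np.dot(w, x_adv))              # sign flips: misclassified
```

A small, bounded perturbation per input dimension is enough to flip the decision, which is exactly the vulnerability the surveyed defenses try to close.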

Appl. Sci. EISSN 2076-3417, published by MDPI AG, Basel, Switzerland