Special Issue "Deep Learning for Image and Video Understanding"

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (31 March 2019).

Special Issue Editors

Guest Editor
Prof. Adil Mehmood Khan
Associate Professor and Head of Machine Learning & Knowledge Representation Lab, Innopolis University, Russia
Interests: Machine Learning, Data Mining, Pattern Recognition, Context-Aware Computing, Intelligent Systems, Data Modeling and Analysis
Guest Editor
Prof. Adín Ramírez Rivera
Institute of Computing, Universidade Estadual de Campinas, Brazil
Interests: Image Processing; Computer Vision; Machine Learning

Special Issue Information

Dear Colleagues,

Tasks that seem comically trivial to a human, such as recognizing a handwritten digit in an image, become dauntingly difficult when we try to automate them by writing a computer program. Artificial neural networks, and deep learning in particular, offer a way to address this difficulty. These methods learn to model complex problems as layered representations of simpler concepts, directly from data, without requiring hand-crafted features or hard-coded expert knowledge.

Deep learning methods are therefore being employed on a large scale to solve computer vision problems. We invite you to submit your latest research in the area of deep learning and computer vision to this Special Issue, “Deep Learning for Image and Video Understanding.” We are looking for new and innovative deep learning approaches to problems such as object detection, segmentation, recognition, tracking, and action recognition.

We solicit high-quality papers that address both theoretical and practical issues of deep learning algorithms. Submissions are welcome both for traditional computer vision problems and for new applications. Potential topics include, but are not limited to:

  • K-shot learning for image and video understanding
  • Open set recognition for image and video understanding
  • Small-data learning for image and video understanding
  • Hierarchical and ensemble learning for image and video understanding
  • Data augmentation and transfer learning for image and video understanding
  • Semantic segmentation
  • Representation learning, feature detection and description for image and video understanding
  • Scene modeling and reconstruction
  • Scene understanding
  • Object detection, recognition and classification
  • Object pose estimation and tracking for image and video understanding
  • Person detection, tracking and identification for image and video understanding
  • Action and activity recognition for image and video understanding
  • Video annotation

Prof. Adil Mehmood Khan
Prof. Adín Ramírez Rivera
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Deep learning
  • Computer vision
  • Image processing

Published Papers (5 papers)

Research

Open Access Article
Refinement of Background-Subtraction Methods Based on Convolutional Neural Network Features for Dynamic Background
Algorithms 2019, 12(7), 128; https://doi.org/10.3390/a12070128 - 27 Jun 2019
Abstract
Advancing the background-subtraction method in dynamic scenes is an ongoing timely goal for many researchers. Recently, background subtraction methods have been developed with deep convolutional features, which have improved their performance. However, most of these deep methods are supervised, only available for a certain scene, and have high computational cost. In contrast, the traditional background subtraction methods have low computational costs and can be applied to general scenes. Therefore, in this paper, we propose an unsupervised and concise method based on the features learned from a deep convolutional neural network to refine the traditional background subtraction methods. For the proposed method, the low-level features of an input image are extracted from the lower layer of a pretrained convolutional neural network, and the main features are retained to further establish the dynamic background model. The evaluation of the experiments on dynamic scenes demonstrates that the proposed method significantly improves the performance of traditional background subtraction methods.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)

Open Access Feature Paper Article
Triplet Loss Network for Unsupervised Domain Adaptation
Algorithms 2019, 12(5), 96; https://doi.org/10.3390/a12050096 - 08 May 2019
Abstract
Domain adaptation is a sub-field of transfer learning that aims at bridging the dissimilarity gap between different domains by transferring and re-using the knowledge obtained in the source domain to the target domain. Many methods have been proposed to resolve this problem, using techniques such as generative adversarial networks (GAN), but the complexity of such methods makes it hard to use them in different problems, as fine-tuning such networks is usually a time-consuming task. In this paper, we propose a method for unsupervised domain adaptation that is both simple and effective. Our model (referred to as TripNet) harnesses the idea of a discriminator and Linear Discriminant Analysis (LDA) to push the encoder to generate domain-invariant features that are category-informative. At the same time, pseudo-labelling is used for the target data to train the classifier and to bring the same classes from both domains together. We evaluate TripNet against several existing, state-of-the-art methods on three image classification tasks: digit classification (MNIST, SVHN, and USPS datasets), object recognition (Office31 dataset), and traffic sign recognition (GTSRB and Synthetic Signs datasets). Our experimental results demonstrate that (i) TripNet beats almost all existing methods of similar simplicity on all of these tasks; and (ii) TripNet even outperforms, in some cases, models that are significantly more complex (or harder to train). Hence, the results confirm the effectiveness of using TripNet for unsupervised domain adaptation in image classification.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)

Open Access Article
Learning an Efficient Convolution Neural Network for Pansharpening
Algorithms 2019, 12(1), 16; https://doi.org/10.3390/a12010016 - 08 Jan 2019
Cited by 2
Abstract
Pansharpening is a domain-specific task of satellite imagery processing, which aims at fusing a multispectral image with a corresponding panchromatic one to enhance the spatial resolution of the multispectral image. Most existing traditional methods fuse multispectral and panchromatic images in a linear manner, which greatly restricts the fusion accuracy. In this paper, we propose a highly efficient inference network to cope with pansharpening, which breaks the linear limitation of traditional methods. In the network, we adopt a dilated multilevel block coupled with a skip connection to perform local and overall compensation. By using the dilated multilevel block, the proposed model can make full use of the extracted features and enlarge the receptive field without introducing extra computational burden. Experimental results show that our network achieves competitive or even superior pansharpening performance compared with deeper models. As our network is shallow and trained with several techniques to prevent overfitting, our model is robust to the inconsistencies across different satellites.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)
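
As a rough illustration of the receptive-field claim in the abstract above, the following sketch computes the receptive field of a stack of stride-1 convolutions with and without dilation. The kernel size, the number of layers, and the dilation rates (1, 2, 4) are assumptions chosen for illustration only; the abstract does not state the configuration used in the paper, and the function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (pixels per axis) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A kernel dilated by d spans d * (kernel_size - 1) + 1 pixels,
        # so each layer widens the field by d * (kernel_size - 1).
        rf += d * (kernel_size - 1)
    return rf


def weights_per_kernel(kernel_size):
    """Weights in one 2-D kernel slice; dilation does not change this count."""
    return kernel_size * kernel_size


plain = receptive_field(3, [1, 1, 1])    # three ordinary 3x3 layers
dilated = receptive_field(3, [1, 2, 4])  # hypothetical dilation rates
print(plain, dilated, weights_per_kernel(3))
```

Under these assumed rates, the dilated stack covers 15 pixels per axis against 7 for the plain stack, while every layer keeps the same 9 weights per kernel slice.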

Open Access Article
A Robust Visual Tracking Algorithm Based on Spatial-Temporal Context Hierarchical Response Fusion
Algorithms 2019, 12(1), 8; https://doi.org/10.3390/a12010008 - 26 Dec 2018
Abstract
Discriminative correlation filters (DCFs) have been shown to perform very well in visual object tracking. However, visual tracking is still challenging when the target objects undergo complex scenarios such as occlusion, deformation, scale changes and illumination changes. In this paper, we utilize the hierarchical features of convolutional neural networks (CNNs) and learn a spatial-temporal context correlation filter on convolutional layers. Then, the translation is estimated by fusing the response scores of the filters on the three convolutional layers. In terms of scale estimation, we learn a discriminative correlation filter to estimate scale from the best confidence results. Furthermore, we propose a re-detection activation discrimination method to improve the robustness of visual tracking in the case of tracking failure and an adaptive model update method to reduce tracking drift caused by noisy updates. We evaluate the proposed tracker with DCFs and deep features on OTB benchmark datasets. The tracking results demonstrate that the proposed algorithm is superior to several state-of-the-art DCF methods in terms of accuracy and robustness.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)

Open Access Article
A Study on Faster R-CNN-Based Subway Pedestrian Detection with ACE Enhancement
Algorithms 2018, 11(12), 192; https://doi.org/10.3390/a11120192 - 26 Nov 2018
Cited by 1
Abstract
At present, the problem of pedestrian detection has attracted increasing attention in the field of computer vision. Faster regions with convolutional neural network features (Faster R-CNN) is regarded as one of the most important techniques for studying this problem. However, the detection capability of a model trained with Faster R-CNN is sensitive to the diversity of pedestrians’ appearance and to the light intensity in specific scenarios, such as in a subway, which can lead to a decline in recognition rate and an offset in target selection for pedestrians. In this paper, we propose a modified Faster R-CNN method with automatic color enhancement (ACE), which can improve sample contrast by calculating the relative light and dark relationship to correct the final pixel value. In addition, a calibration method based on sample categories reduction is presented to accurately locate the target for detection. Then, we choose the Faster R-CNN target detection framework on the experimental dataset. Finally, the effectiveness of this method is verified with the actual data sample collected from the subway passenger monitoring video.
(This article belongs to the Special Issue Deep Learning for Image and Video Understanding)