Special Issue "Deep Learning Technologies for Machine Vision and Audition"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 7 September 2020.

Special Issue Editors

Dr. Nikolaos Mitianoudis
Guest Editor
Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
Interests: deep learning; computer vision; audio source separation; music information retrieval
Assoc. Prof. Georgios Tzimiropoulos
Guest Editor
University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, United Kingdom
Interests: deep learning; computer vision; face recognition

Special Issue Information

Dear Colleagues,

In recent years, we have witnessed extensive breakthroughs in the field of autonomous robotics. A key element of a successful robotic system is its ability to exploit the visual and auditory information around it in order to make decisions. Machine vision and audition are therefore central tasks in most robotic systems. Humans, on the other hand, are very adept at handling and processing visual and auditory stimuli to perform a series of tasks, such as object detection and identification. The key element in these tasks is the human brain, a complex organ featuring billions of neurons and trillions of synapses (connections) between them. More recently, owing to the rise of parallel-processing hardware (e.g., graphics processing units (GPUs)), we have seen the emergence of deep neural network architectures that attempt to emulate the vastness and complexity of the human brain in order to match its performance. This is particularly evident in machine vision and audition applications, where deep learning techniques have substantially improved on the performance of traditional shallow neural network architectures.

The aim of this Special Issue is to present and highlight the newest trends in deep learning for machine vision and audition applications. Topics of interest include, but are not limited to:

  • Deep learning architectures;
  • Deep learning image and audio classification;
  • Deep learning object detection;
  • Deep learning semantic segmentation;
  • Deep learning image enhancement;
  • Deep learning music information retrieval tasks;
  • Deep learning audio-visual source separation;
  • Deep learning audio-visual enhancement;
  • Deep learning for audio-visual scene analysis;
  • Deep learning for audio-visual emotion recognition;
  • Deep learning for audio-visual face analysis.

Assoc. Prof. Nikolaos Mitianoudis
Assoc. Prof. Georgios Tzimiropoulos
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.


Keywords

  • deep learning
  • image enhancement
  • object detection
  • image semantic segmentation
  • source separation
  • music information retrieval
  • audio enhancement
  • scene analysis
  • emotion recognition
  • face analysis

Published Papers (1 paper)

Open Access Article
A Learning Frequency-Aware Feature Siamese Network for Real-Time Visual Tracking
Electronics 2020, 9(5), 854; https://doi.org/10.3390/electronics9050854 - 21 May 2020
Visual object tracking by Siamese networks has achieved favorable performance in both accuracy and speed. However, the features used in Siamese networks contain spatially redundant information, which increases computation and limits the discriminative ability of Siamese networks. To address this issue, we present a novel frequency-aware feature (FAF) method for robust visual object tracking in complex scenes. Unlike previous works, which select features from different channels or layers, the proposed method factorizes the feature map into multiple frequency components and reduces the spatially redundant low-frequency information. Reducing the resolution of the low-frequency map both saves computation and enlarges the receptive field of the layer, so that more discriminative information is obtained. To further improve the performance of FAF, we design a data-independent augmentation for object tracking that improves the discriminative ability of the tracker; it encourages linear behavior among training samples through convex combinations of images and their labels. Finally, we propose a joint judgment strategy that combines intersection-over-union (IoU) and classification scores to refine the bounding box and improve tracking accuracy. Extensive experiments on five challenging benchmarks demonstrate that our FAF method performs favorably against state-of-the-art (SOTA) tracking methods while running at around 45 frames per second.
(This article belongs to the Special Issue Deep Learning Technologies for Machine Vision and Audition)
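The abstract above names two concrete operations that can be illustrated in a few lines: an augmentation built from convex combinations of images and their labels (the same idea as mixup) and the intersection-over-union (IoU) measure used by the joint judgment strategy. The Python sketch below shows both under stated assumptions; it is not the authors' code. The function names `sample_lambda`, `mixup` and `iou`, the flattened-list image representation, the (x1, y1, x2, y2) box format and the Beta(alpha, alpha) prior with alpha = 0.2 are hypothetical illustrative choices. How the paper weights IoU against classification scores is not stated in the abstract and is not reproduced here.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention).
Box = Tuple[float, float, float, float]


def sample_lambda(alpha: float = 0.2, rng: Optional[random.Random] = None) -> float:
    """Draw a mixing coefficient from Beta(alpha, alpha); alpha = 0.2 is an assumed default."""
    draw = rng.betavariate if rng is not None else random.betavariate
    return draw(alpha, alpha)


def mixup(x_a: Sequence[float], y_a: Sequence[float],
          x_b: Sequence[float], y_b: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Convex combination of two flattened images and their one-hot or soft labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Same coefficient for pixels and labels, so the label mix matches the image mix.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap (zero when the boxes do not intersect).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, mixing a sample labelled [1, 0] with one labelled [0, 1] at lam = 0.25 yields the soft label [0.25, 0.75], and two 2×2 boxes offset by one unit in each direction have an IoU of 1/7.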