Deep Learning for Visual Contents Processing and Analysis

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: closed (30 June 2021) | Viewed by 21840

Special Issue Editors


Prof. Dr. Mohammed El Hassouni
Guest Editor
LRIT, FLSHR, Mohammed V University in Rabat, Rabat, Morocco
Interests: computer vision; deep learning; complex networks; data modeling and analysis; artificial intelligence

Special Issue Information

Dear Colleagues,

Nowadays, many visualization and imaging systems generate complex visual content with large amounts of data, which are extremely difficult to handle. This growing mass of data requires new strategies for analysis and interpretation. In recent years, particular attention has been paid to deep learning methods for the analysis of visual content and its applications. Drawing on artificial intelligence, mathematics, biology, and other fields, these methods can uncover relationships between different categories of complex data and provide a set of tools for analyzing and handling visual content.

This Special Issue will provide a forum for publishing original research papers covering the state of the art: new algorithms, methodologies, applications, theories, and implementations of deep learning methods for visual content such as images, videos, stereoscopic images, 3D meshes, point clouds, visual graphs, etc.

This Special Issue is primarily focused on, but not limited to, the following topics:

  • Classification;
  • Retrieval;
  • Restoration;
  • Compression;
  • Segmentation;
  • Visual quality assessment;
  • Convolutional neural networks (CNN);
  • Autoencoders;
  • Generative adversarial networks (GAN);
  • Reinforcement learning.

Prof. Dr. Hocine Cherifi
Prof. Dr. Mohammed El Hassouni
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

35 pages, 3397 KiB  
Article
A Hybrid Robust Image Watermarking Method Based on DWT-DCT and SIFT for Copyright Protection
by Mohamed Hamidi, Mohamed El Haziti, Hocine Cherifi and Mohammed El Hassouni
J. Imaging 2021, 7(10), 218; https://doi.org/10.3390/jimaging7100218 - 19 Oct 2021
Cited by 11 | Viewed by 2811
Abstract
In this paper, a robust hybrid watermarking method based on discrete wavelet transform (DWT), discrete cosine transform (DCT), and scale-invariant feature transformation (SIFT) is proposed. Indeed, it is of prime interest to develop robust feature-based image watermarking schemes to withstand both image processing attacks and geometric distortions while preserving good imperceptibility. To this end, a robust watermark is embedded in the DWT-DCT domain to withstand image processing manipulations, while SIFT is used to protect the watermark from geometric attacks. First, the watermark is embedded in the middle band of the discrete cosine transform (DCT) coefficients of the HL1 band of the discrete wavelet transform (DWT). Then, the SIFT feature points are registered to be used in the extraction process to correct the geometric transformations. Extensive experiments have been conducted to assess the effectiveness of the proposed scheme. The results demonstrate its high robustness against standard image processing attacks and geometric manipulations while preserving a high imperceptibility. Furthermore, it compares favorably with alternative methods. Full article
(This article belongs to the Special Issue Deep Learning for Visual Contents Processing and Analysis)
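
As a concrete (and purely illustrative) starting point, the sketch below shows the kind of DWT-DCT middle-band embedding the abstract describes, using PyWavelets and SciPy; the wavelet, block size, coefficient position, and strength alpha are assumed values rather than the authors' settings.

```python
# Minimal sketch of DWT-DCT middle-band embedding (parameters assumed):
# img is a 2-D float array, bits is a sequence of 0/1 watermark bits.
import pywt
from scipy.fft import dctn, idctn

def embed_watermark(img, bits, alpha=10.0, block=8):
    # One-level DWT; one detail sub-band (HL1 in the paper's notation)
    # carries the watermark.
    cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
    band = cV.copy()
    idx = 0
    for r in range(0, band.shape[0] - block + 1, block):
        for c in range(0, band.shape[1] - block + 1, block):
            if idx >= len(bits):
                break
            coeffs = dctn(band[r:r + block, c:c + block], norm="ortho")
            # Shift one middle-frequency coefficient up or down per bit.
            coeffs[3, 4] += alpha if bits[idx] else -alpha
            band[r:r + block, c:c + block] = idctn(coeffs, norm="ortho")
            idx += 1
    return pywt.idwt2((cA, (cH, band, cD)), "haar")
```

Extraction would repeat the same transforms on the received image and read back the modified coefficients, after SIFT keypoint matching has been used to undo any geometric transformation.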

13 pages, 940 KiB  
Article
Deep Features for Training Support Vector Machines
by Loris Nanni, Stefano Ghidoni and Sheryl Brahnam
J. Imaging 2021, 7(9), 177; https://doi.org/10.3390/jimaging7090177 - 5 Sep 2021
Cited by 15 | Viewed by 2982
Abstract
Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features now are often learned using different layers in convolutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines that are then combined by sum rule. Several dimensionality reduction techniques were tested for reducing the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system based on applying a discrete cosine transform (DCT) separately to each channel is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image data sets. In addition, an ensemble of different topologies taking the same DCT approach and combined with global mean thresholding pooling obtained state-of-the-art results on a benchmark image virus data set. Full article
(This article belongs to the Special Issue Deep Learning for Visual Contents Processing and Analysis)
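
As a rough illustration of this pipeline, the sketch below pulls features from two inner layers of a pretrained CNN, compresses each channel with a 1-D DCT, trains one linear SVM per layer, and fuses the classifiers by the sum rule; the backbone, chosen layers, and number of retained DCT coefficients are assumptions, not the paper's exact configuration.

```python
# Sketch: inner-layer CNN features -> per-channel DCT -> SVMs -> sum rule.
import numpy as np
import torch
import torchvision.models as models
from scipy.fft import dct
from sklearn.svm import SVC

backbone = models.resnet18(weights="IMAGENET1K_V1").eval()
feats = {}
backbone.layer3.register_forward_hook(lambda m, i, o: feats.update(l3=o))
backbone.layer4.register_forward_hook(lambda m, i, o: feats.update(l4=o))

def dct_descriptor(fmap, keep=32):
    # fmap: (C, H, W); 1-D DCT over each flattened channel, keeping the
    # first `keep` coefficients (an assumed compression choice).
    x = fmap.reshape(fmap.shape[0], -1)
    return dct(x, norm="ortho", axis=1)[:, :keep].ravel()

def extract(images):
    with torch.no_grad():
        backbone(images)                      # hooks fill `feats`
    return {k: np.stack([dct_descriptor(f.numpy()) for f in v])
            for k, v in feats.items()}

def fit_and_fuse(train_x, train_y, test_x):
    # One SVM per layer; sum the per-class probabilities (sum rule).
    scores = 0.0
    for layer in train_x:
        clf = SVC(kernel="linear", probability=True).fit(train_x[layer], train_y)
        scores = scores + clf.predict_proba(test_x[layer])
    return scores.argmax(axis=1)
```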

32 pages, 7118 KiB  
Article
Designing a Computer-Vision Application: A Case Study for Hand-Hygiene Assessment in an Open-Room Environment
by Chengzhang Zhong, Amy R. Reibman, Hansel A. Mina and Amanda J. Deering
J. Imaging 2021, 7(9), 170; https://doi.org/10.3390/jimaging7090170 - 30 Aug 2021
Cited by 9 | Viewed by 3320
Abstract
Hand-hygiene is a critical component for safe food handling. In this paper, we apply an iterative engineering process to design a hand-hygiene action detection system to improve food-handling safety. We demonstrate the feasibility of a baseline RGB-only convolutional neural network (CNN) in the restricted case of a single scenario; however, since this baseline system performs poorly across scenarios, we also demonstrate the application of two methods to explore potential reasons for its poor performance. This leads to the development of our hierarchical system that incorporates a variety of modalities (RGB, optical flow, hand masks, and human skeleton joints) for recognizing subsets of hand-hygiene actions. Using hand-washing video recorded from several locations in a commercial kitchen, we demonstrate the effectiveness of our system for detecting hand hygiene actions in untrimmed videos. In addition, we discuss recommendations for designing a computer vision system for a real application. Full article
(This article belongs to the Special Issue Deep Learning for Visual Contents Processing and Analysis)
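
The hierarchical, multi-modality design described above might be organised roughly as in the sketch below, where a coarse first stage gates a set of fine-grained, per-modality classifiers; the stage structure, feature names, and fusion rule are illustrative assumptions, not the authors' exact system.

```python
# Illustrative two-stage decision over per-modality classifiers
# (scikit-learn-style estimators assumed for stage1 / stage2_*).
import numpy as np

def hierarchical_predict(clip, stage1, stage2_rgb, stage2_skeleton):
    # Stage 1: decide whether any hand-hygiene activity is present,
    # e.g. from optical-flow features of the clip.
    if stage1.predict([clip["flow"]])[0] == "no_action":
        return "no_action"
    # Stage 2: modality-specific classifiers for the fine-grained actions,
    # fused by a simple average of their class probabilities.
    p_rgb = stage2_rgb.predict_proba([clip["rgb"]])[0]
    p_skel = stage2_skeleton.predict_proba([clip["skeleton"]])[0]
    fused = (p_rgb + p_skel) / 2.0
    return stage2_rgb.classes_[np.argmax(fused)]
```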

15 pages, 1907 KiB  
Article
Investigating Semantic Augmentation in Virtual Environments for Image Segmentation Using Convolutional Neural Networks
by Joshua Ganter, Simon Löffler, Ron Metzger, Katharina Ußling and Christoph Müller
J. Imaging 2021, 7(8), 146; https://doi.org/10.3390/jimaging7080146 - 14 Aug 2021
Viewed by 2101
Abstract
Collecting real-world data for the training of neural networks is enormously time-consuming and expensive. As such, the concept of virtualizing the domain and creating synthetic data has been analyzed in many instances. This virtualization offers many possibilities of changing the domain, and with that, enabling the relatively fast creation of data. It also offers the chance to enhance necessary augmentations with additional semantic information when compared with conventional augmentation methods. This raises the question of whether such semantic changes, which can be seen as augmentations of the virtual domain, contribute to better results for neural networks, when trained with data augmented this way. In this paper, a virtual dataset is presented, including semantic augmentations and automatically generated annotations, as well as a comparison between semantic and conventional augmentation for image data. It is determined that the results differ only marginally for neural network models trained with the two augmentation approaches. Full article
(This article belongs to the Special Issue Deep Learning for Visual Contents Processing and Analysis)
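
For readers unfamiliar with the distinction, "conventional" augmentation of the kind compared here usually means post-hoc image transforms such as the torchvision pipeline sketched below (the specific transforms are illustrative); semantic augmentation instead varies the virtual scene itself (objects, materials, lighting, camera) before rendering, so the matching segmentation masks are regenerated rather than transformed.

```python
# Illustrative "conventional" augmentation pipeline (assumed transforms).
# For segmentation, geometric transforms must be applied jointly to the
# image and its mask, e.g. via torchvision.transforms.v2.
import torchvision.transforms as T

conventional_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    T.ToTensor(),
])
```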

15 pages, 14401 KiB  
Article
On the Limitations of Visual-Semantic Embedding Networks for Image-to-Text Information Retrieval
by Yan Gong, Georgina Cosma and Hui Fang
J. Imaging 2021, 7(8), 125; https://doi.org/10.3390/jimaging7080125 - 26 Jul 2021
Cited by 10 | Viewed by 4880
Abstract
Visual-semantic embedding (VSE) networks create joint image–text representations to map images and texts in a shared embedding space to enable various information retrieval-related tasks, such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art VSE-based networks are: VSE++, SCAN, VSRN, and UNITER. This study evaluates the performance of those VSE networks for the task of image-to-text retrieval and identifies and analyses their strengths and limitations to guide future research on the topic. The experimental results on Flickr30K revealed that the pre-trained network, UNITER, achieved 61.5% on average Recall@5 for the task of retrieving all relevant descriptions. The traditional networks, VSRN, SCAN, and VSE++, achieved 50.3%, 47.1%, and 29.4% on average Recall@5, respectively, for the same task. An additional analysis was performed on image–text pairs from the top 25 worst-performing classes using a subset of the Flickr30K-based dataset to identify the limitations of the performance of the best-performing models, VSRN and UNITER. These limitations are discussed from the perspective of image scenes, image objects, image semantics, and basic functions of neural networks. This paper discusses the strengths and limitations of VSE networks to guide further research into the topic of using VSE networks for cross-modal information retrieval tasks. Full article
(This article belongs to the Special Issue Deep Learning for Visual Contents Processing and Analysis)
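
For reference, Recall@5 figures of the kind quoted above can be computed as in the sketch below from a precomputed image-text similarity matrix; the exact Flickr30K protocol (e.g. five reference captions per image, and whether a hit requires one or all relevant captions) is assumed rather than reproduced.

```python
# Recall@K for image-to-text retrieval from a similarity matrix.
import numpy as np

def recall_at_k(sim, relevant, k=5):
    # sim: (num_images, num_texts) similarity scores
    # relevant: list of sets; relevant[i] holds the indices of captions
    #           that describe image i
    hits = 0
    for i, row in enumerate(sim):
        topk = set(np.argsort(-row)[:k].tolist())
        if relevant[i] & topk:      # at least one relevant caption retrieved
            hits += 1
    return hits / len(sim)
```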

16 pages, 7265 KiB  
Article
No-Reference Quality Assessment of In-Capture Distorted Videos
by Mirko Agarla, Luigi Celona and Raimondo Schettini
J. Imaging 2020, 6(8), 74; https://doi.org/10.3390/jimaging6080074 - 30 Jul 2020
Cited by 14 | Viewed by 4145
Abstract
We introduce a no-reference method for the assessment of the quality of videos affected by in-capture distortions due to camera hardware and processing software. The proposed method encodes both quality attributes and semantic content of each video frame by using two Convolutional Neural Networks (CNNs) and then estimates the quality score of the whole video by using a Recurrent Neural Network (RNN), which models the temporal information. The extensive experiments conducted on four benchmark databases (CVD2014, KoNViD-1k, LIVE-Qualcomm, and LIVE-VQC) containing in-capture distortions demonstrate the effectiveness of the proposed method and its ability to generalize in cross-database setup. Full article
(This article belongs to the Special Issue Deep Learning for Visual Contents Processing and Analysis)
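
Structurally, the frame-level-CNNs-plus-RNN design reads roughly like the PyTorch sketch below; the two encoders, the feature sizes, and the use of the final hidden state for the score are placeholders for illustration, not the authors' architecture.

```python
# Sketch: per-frame quality + semantic features -> GRU -> quality score.
import torch
import torch.nn as nn

class NRVideoQuality(nn.Module):
    def __init__(self, q_dim=128, s_dim=512, hidden=256):
        super().__init__()
        # Stand-ins for the two per-frame CNN encoders.
        self.quality_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, q_dim))
        self.semantic_cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, s_dim))
        self.rnn = nn.GRU(q_dim + s_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)                # fold time into the batch
        f = torch.cat([self.quality_cnn(x), self.semantic_cnn(x)], dim=1)
        out, _ = self.rnn(f.view(b, t, -1))     # temporal modelling
        return self.head(out[:, -1]).squeeze(-1)   # one score per video
```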
