Image and Video Processing Based on Deep Learning

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: 15 June 2024 | Viewed by 5120

Special Issue Editor

School of Electronic Science and Technology, Beijing University of Technology, Beijing 100124, China
Interests: control theory; computer vision; deep learning

Special Issue Information

Dear Colleagues,

This Special Issue (SI) aims to present research achievements in new theories and methods of image and video processing based on deep learning. Image and video processing has become an important research area with the rapid development of visual communication technologies and their use in portable devices, image sensors, medical imaging, video, and social networking tools. In recent years, deep neural networks, such as deep belief networks, deep autoencoders, convolutional neural networks (CNNs), and transformers, have proved capable of extracting complex statistical features and efficiently learning their representations, allowing them to perform well across a wide variety of computer vision tasks, including image classification, image restoration/enhancement, and video interpretation and understanding. However, deep-learning-based approaches to these tasks still have significant limitations: poor interpretability, reliance on large amounts of training data, a tendency to trade heavy computation for performance, and poor generalizability to real application scenarios. We look forward to the latest research findings that offer theories and practical solutions for computer vision tasks based on deep learning.

The topics of interest include, but are not limited to, the following:

  • Synthesis, rendering, and visualization.
  • Compression, coding, and transmission.
  • Detection, recognition, retrieval, and classification.
  • Restoration and enhancement.
  • Motion estimation, registration, and fusion.
  • Image and video interpretation and understanding.
  • Stereoscopic, multi-view, and 3D processing.
  • Biomedical and biological image processing.
  • Image and video quality models.
  • Learning with limited labels.

Dr. Jiafeng Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)

Research

19 pages, 3115 KiB  
Article
Task-Aligned Oriented Object Detection in Remote Sensing Images
by Xiaoliang Qian, Jiakun Zhao, Baokun Wu, Zhiwu Chen, Wei Wang and Han Kong
Electronics 2024, 13(7), 1301; https://doi.org/10.3390/electronics13071301 - 30 Mar 2024
Viewed by 594
Abstract
Oriented object detection (OOD) can recognize and locate various objects more precisely than horizontal object detection; however, two problems have not been satisfactorily resolved so far. Firstly, the absence of interaction between the classification and regression branches leads to inconsistent performance across the two tasks of object detection. Secondly, the traditional convolution operation cannot precisely extract the features of objects with extreme aspect ratios in remote sensing images (RSIs). To address the first problem, a task-aligned detection module (TADM) and a task-aligned loss function (TL) are proposed in this paper. On the one hand, a spatial probability map and a spatial offset map are inferred from the shared features in the TADM and separately incorporated into the classification and regression branches to obtain consistency between the two tasks. On the other hand, the TL combines the generalized intersection over union (GIoU) metric with the classification loss to further enhance this consistency. To address the second problem, a two-stage detection framework based on alignment convolution (TDA) is proposed. The features extracted from the backbone network are refined through alignment convolution in the first stage, and the final OOD results are inferred from the refined features in the second stage. An ablation study verifies the effectiveness of the TADM, TL, and TDA, and comparisons with other advanced methods on two RSI benchmarks demonstrate the overall effectiveness of our method.
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
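As a concrete illustration of coupling the two branches, the sketch below re-weights the classification loss by each prediction's localization quality, in the spirit of the TL described above. It assumes axis-aligned boxes for simplicity (the paper handles oriented boxes, where computing GIoU is more involved), and the `alpha` weight and quality mapping are illustrative choices, not taken from the paper:

```python
import torch

def giou(pred, target):
    """GIoU for boxes in (x1, y1, x2, y2) format; simplified to
    axis-aligned boxes, whereas the paper works with oriented ones."""
    lt = torch.max(pred[:, :2], target[:, :2])  # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    # Smallest box enclosing both: the "generalized" part of GIoU
    wh_c = (torch.max(pred[:, 2:], target[:, 2:])
            - torch.min(pred[:, :2], target[:, :2])).clamp(min=0)
    area_c = (wh_c[:, 0] * wh_c[:, 1]).clamp(min=1e-7)
    return iou - (area_c - union) / area_c

def task_aligned_loss(cls_logits, labels, pred_boxes, gt_boxes, alpha=1.0):
    """Couples the two branches: per-sample GIoU (mapped to [0, 1])
    re-weights the cross-entropy, so poorly localized predictions
    contribute weaker classification gradients."""
    g = giou(pred_boxes, gt_boxes)            # values in [-1, 1]
    quality = ((1.0 + g) / 2.0).detach()      # mapped to [0, 1]
    ce = torch.nn.functional.cross_entropy(cls_logits, labels, reduction="none")
    return (quality * ce).mean() + alpha * (1.0 - g).mean()
```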

22 pages, 18065 KiB  
Article
Car Full View Dataset: Fine-Grained Predictions of Car Orientation from Images
by Andy Catruna, Pavel Betiu, Emanuel Tertes, Vladimir Ghita, Emilian Radoi, Irina Mocanu and Mihai Dascalu
Electronics 2023, 12(24), 4947; https://doi.org/10.3390/electronics12244947 - 09 Dec 2023
Viewed by 1426
Abstract
The orientation of objects plays an important role in accurate predictions for the tasks of classification, detection, and trajectory estimation. This is especially important in the automotive domain, where estimating an accurate car orientation can significantly impact the effectiveness of the other prediction tasks. This work presents Car Full View (CFV), a novel dataset for car orientation prediction from images, obtained by video-recording individual vehicles from all possible angles in diverse scenarios. We developed a tool to semi-automatically annotate all video frames with the respective car angle, based on the walking speed of the recorder and manually annotated key angles. The final dataset contains over 23,000 images of individual cars along with fine-grained angle annotations. We study the performance of three state-of-the-art deep learning architectures on this dataset in three different learning settings: classification, regression, and multi-objective. The top result of 3.39° in circular mean absolute error (CMAE) shows that the model accurately predicts car orientations for unseen vehicles and images. Furthermore, we test the trained models on images from two different datasets and show their generalization capability to realistic images. We release the dataset and the best-performing models, and we publish a web service for annotating new images.
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
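The CMAE figure quoted above is only meaningful if angular errors wrap around at 360°: predicting 359° for a true angle of 1° is a 2° error, not a 358° one. Below is a minimal NumPy sketch of the metric, assuming angles in degrees (a standard wrapping convention, not necessarily the authors' exact implementation):

```python
import numpy as np

def circular_mae(pred_deg, target_deg):
    """Circular mean absolute error in degrees: differences are
    wrapped into [-180, 180) before taking absolute values."""
    diff = (np.asarray(pred_deg) - np.asarray(target_deg) + 180.0) % 360.0 - 180.0
    return np.abs(diff).mean()

print(circular_mae([359.0, 10.0], [1.0, 12.0]))  # 2.0, not 180.0
```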

13 pages, 5605 KiB  
Article
NAVIBox: Real-Time Vehicle–Pedestrian Risk Prediction System in an Edge Vision Environment
by Hyejin Lee, Haechan Cho, Byeongjoon Noh and Hwasoo Yeo
Electronics 2023, 12(20), 4311; https://doi.org/10.3390/electronics12204311 - 18 Oct 2023
Cited by 1 | Viewed by 896
Abstract
This study introduces a novel system, termed NAVIBox, designed to proactively identify vehicle–pedestrian risks using vision sensors deployed within edge computing devices in the field. NAVIBox consolidates all operational components into a single unit, resembling an intelligent CCTV system, and is built upon four core pipelines: motion-triggered video capture, object detection and tracking, trajectory refinement, and predictive risk recognition with warning decisions. Operation begins with the capture of motion-containing video through a frame-differencing approach. Road users are subsequently detected, and their trajectories are determined using a lightweight deep-learning-based object detection model in conjunction with a centroid tracker. In the trajectory refinement stage, the system converts the perspective of the original image into a top view and performs grid segmentation to capture road users' behaviors precisely. Lastly, vehicle–pedestrian risks are predicted by analyzing these extracted behaviors, and alert signals are promptly dispatched to drivers and pedestrians when risks are anticipated. The feasibility and practicality of the proposed system have been verified through implementation and testing at real-world test sites in Sejong City, South Korea. This systematic approach presents a comprehensive solution for proactively identifying and addressing vehicle–pedestrian risks, enhancing safety and efficiency in urban environments.
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
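The first and third pipeline stages of such a system are easy to sketch with OpenCV: frame differencing to pass through only frames that contain motion, and a perspective warp to a top view. The threshold, changed-pixel ratio, and calibration points below are illustrative placeholders, not parameters from the NAVIBox paper:

```python
import cv2

def motion_frames(video_path, threshold=25, min_changed_ratio=0.01):
    """Yield only frames whose pixel-wise difference from the previous
    frame exceeds a threshold on enough pixels (simple frame differencing)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev_gray)
        _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
        if cv2.countNonZero(mask) > min_changed_ratio * mask.size:
            yield frame
        prev_gray = gray
    cap.release()

def to_top_view(frame, src_pts, dst_pts, out_size):
    """Warp the camera view to a top view; src_pts/dst_pts are 4x2
    float32 point correspondences obtained from site calibration."""
    H = cv2.getPerspectiveTransform(src_pts, dst_pts)
    return cv2.warpPerspective(frame, H, out_size)  # out_size = (width, height)
```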

17 pages, 5166 KiB  
Article
Arbitrary-Oriented Object Detection in Aerial Images with Dynamic Deformable Convolution and Self-Normalizing Channel Attention
by Yutong Zhang, Chunjie Ma, Li Zhuo and Jiafeng Li
Electronics 2023, 12(9), 2132; https://doi.org/10.3390/electronics12092132 - 06 May 2023
Cited by 4 | Viewed by 1507
Abstract
Objects in aerial images often have arbitrary orientations and variable shapes and sizes. As a result, accurate and robust object detection in aerial images is a challenging problem. In this paper, an arbitrary-oriented object detection method for aerial images, based on Dynamic Deformable Convolution (DDC) and a Self-normalizing Channel Attention Mechanism (SCAM), is proposed; the method uses ReResNet-50 as the backbone network to extract rotation-equivariant features. First, DDC is proposed as a replacement for the conventional convolution operation in the Convolutional Neural Network (CNN) in order to cope with the various shapes, sizes, and arbitrary orientations of the objects. Second, SCAM is embedded into the high layers of ReResNet-50, allowing the network to enhance the important feature channels and suppress the irrelevant ones. Finally, Rotation Regions of Interest (RRoI) are generated based on a Region Proposal Network (RPN) and an RoI Transformer (RT), and RoI-wise classification and bounding box regression are realized by Rotation-invariant RoI Align (RiRoI Align). The proposed method is comprehensively evaluated on three publicly available benchmark datasets; the mean Average Precision (mAP) reaches 80.91%, 92.73%, and 94.10% on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets, respectively. The experimental results show that the proposed method achieves superior detection accuracy compared with state-of-the-art methods.
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
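The channel-reweighting behavior attributed to SCAM can be illustrated with a squeeze-and-excitation-style PyTorch module whose channel weights are normalized with a softmax, one plausible reading of "self-normalizing". This is a sketch of the general idea, not the authors' exact architecture; the reduction ratio is an assumed hyperparameter:

```python
import torch
import torch.nn as nn

class SoftmaxChannelAttention(nn.Module):
    """Enhances important feature channels and suppresses irrelevant
    ones: pooled per-channel statistics pass through a bottleneck MLP,
    and a softmax over channels yields weights that sum to one."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                # squeeze: (B, C) channel statistics
        w = torch.softmax(self.fc(s), dim=1)  # normalize across channels
        # Scale so the average weight is 1, then reweight the feature maps
        return x * (w * c).view(b, c, 1, 1)

# e.g. on a high-level feature map: SoftmaxChannelAttention(256)(torch.randn(2, 256, 32, 32))
```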
