
Machine Learning in Image/Video Processing and Sensing

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 31 December 2025 | Viewed by 3551

Special Issue Editor


Prof. Dr. Yibo Fan
Guest Editor
College of Microelectronics, Fudan University, Shanghai 201203, China
Interests: image processing; video coding; machine learning; associated VLSI architecture

Special Issue Information

Dear Colleagues,

In recent years, machine learning methods have been increasingly applied to image and video processing tasks such as video compression, image denoising, super-resolution, and image generation. At the algorithmic level, a growing number of image and video processing algorithms are based on machine learning and achieve better visual quality and higher compression rates, but they also face challenges in computational complexity and real-time processing. At the hardware level, machine learning methods are gradually being applied to processor design, for example in AI-ISP and AI-Codec processors, which face the challenge of integrating traditional hardware modules with machine learning acceleration modules.

This Special Issue is focused on machine learning in image/video processing and sensing technologies, addressing (but not limited to) the following topics:

  • Machine learning in image and video compression;
  • Machine learning in image processing and enhancement;
  • Hardware design of accelerator for machine learning;
  • Hardware design of AI-Codec;
  • Hardware design of AI-ISP.

Prof. Dr. Yibo Fan
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image/video processing
  • video compression
  • machine learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

20 pages, 1569 KiB  
Article
IESSP: Information Extraction-Based Sparse Stripe Pruning Method for Deep Neural Networks
by Jingjing Liu, Lingjin Huang, Manlong Feng, Aiying Guo, Luqiao Yin and Jianhua Zhang
Sensors 2025, 25(7), 2261; https://doi.org/10.3390/s25072261 - 3 Apr 2025
Viewed by 266
Abstract
Network pruning is a deep learning model compression technique aimed at reducing model storage requirements and decreasing computational resource consumption. However, mainstream pruning techniques often encounter challenges such as limited precision in feature selection and a diminished feature extraction capability. To address these issues, we propose an information extraction-based sparse stripe pruning (IESSP) method. This method introduces an information extraction module (IEM), which enhances stripe selection through a mask-based mechanism, promoting inter-layer interactions and directing the network’s focus toward key features. In addition, we design a novel loss function that links output loss to stripe selection, enabling an effective balance between accuracy and efficiency. This loss function also supports the adaptive optimization of stripe sparsity during training. Experimental results on benchmark datasets demonstrate that the proposed method outperforms existing techniques. Specifically, when applied to prune the VGG-16 model on the CIFAR-10 dataset, the proposed method achieves a 0.29% improvement in accuracy while reducing FLOPs by 75.88% compared to the baseline. Full article
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
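
To make the stripe-selection idea above concrete, the following minimal PyTorch sketch (written for this listing; StripePrunedConv, pruning_loss, and the weight lam are hypothetical names, not the paper's code) attaches a learnable mask to every kernel position of a convolution and couples an L1 sparsity penalty on those masks to the task loss, so that uninformative stripes can be driven toward zero and pruned.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StripePrunedConv(nn.Module):
    # Illustrative sketch only: one learnable mask entry per (filter, kernel-row, kernel-col) stripe.
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.stripe_mask = nn.Parameter(torch.ones(out_ch, 1, k, k))

    def forward(self, x):
        # Stripes whose mask reaches zero no longer contribute and can be removed after training.
        return F.conv2d(x, self.conv.weight * self.stripe_mask, padding=self.conv.kernel_size[0] // 2)

def pruning_loss(logits, targets, model, lam=1e-4):
    # Task loss plus an L1 penalty that links stripe selection to the output loss.
    task = F.cross_entropy(logits, targets)
    sparsity = sum(m.stripe_mask.abs().sum()
                   for m in model.modules() if isinstance(m, StripePrunedConv))
    return task + lam * sparsity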

17 pages, 19409 KiB  
Article
Wavelet-Based Topological Loss for Low-Light Image Denoising
by Alexandra Malyugina, Nantheera Anantrasirichai and David Bull
Sensors 2025, 25(7), 2047; https://doi.org/10.3390/s25072047 - 25 Mar 2025
Viewed by 247
Abstract
Despite significant advances in image denoising, most algorithms rely on supervised learning, with their performance largely dependent on the quality and diversity of training data. It is widely assumed that digital image distortions are caused by spatially invariant Additive White Gaussian Noise (AWGN). However, the analysis of real-world data suggests that this assumption is invalid. Therefore, this paper tackles image corruption by real noise, providing a framework to capture and utilise the underlying structural information of an image along with the spatial information conventionally used for deep learning tasks. We propose a novel denoising loss function that incorporates topological invariants and is informed by textural information extracted from the image wavelet domain. The effectiveness of this proposed method was evaluated by training state-of-the-art denoising models on the BVI-Lowlight dataset, which features a wide range of real noise distortions. Adding a topological term to common loss functions leads to a significant increase in the LPIPS (Learned Perceptual Image Patch Similarity) metric, with the improvement reaching up to 25%. The results indicate that the proposed loss function enables neural networks to learn noise characteristics better. We demonstrate that they can consequently extract the topological features of noise-free images, resulting in enhanced contrast and preserved textural information. Full article
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
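
The topological term in the loss above relies on persistent-homology machinery that is beyond a short example, but the wavelet-domain, texture-aware part of the idea can be sketched with fixed Haar filters (a minimal illustration written for this listing, not the authors' implementation; haar_subbands, denoising_loss, and alpha are hypothetical):

import torch
import torch.nn.functional as F

def haar_subbands(img):
    # One-level Haar decomposition per channel with fixed 2x2 filters (stride 2).
    # Returns the row-difference, column-difference, and diagonal subbands that carry texture detail.
    c = img.shape[1]
    k = torch.tensor([[[0.5, 0.5], [-0.5, -0.5]],    # row difference
                      [[0.5, -0.5], [0.5, -0.5]],    # column difference
                      [[0.5, -0.5], [-0.5, 0.5]]],   # diagonal
                     device=img.device, dtype=img.dtype)
    k = k.unsqueeze(1).repeat(c, 1, 1, 1)            # (3*c, 1, 2, 2) for a grouped convolution
    return F.conv2d(img, k, stride=2, groups=c)

def denoising_loss(pred, clean, alpha=0.1):
    # Pixel-domain fidelity plus a wavelet-domain term that emphasises textural structure.
    pixel = F.l1_loss(pred, clean)
    wavelet = F.l1_loss(haar_subbands(pred), haar_subbands(clean))
    return pixel + alpha * wavelet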

20 pages, 2654 KiB  
Article
DCAN: Dynamic Channel Attention Network for Multi-Scale Distortion Correction
by Jianhua Zhang, Saijie Peng, Jingjing Liu and Aiying Guo
Sensors 2025, 25(5), 1482; https://doi.org/10.3390/s25051482 - 28 Feb 2025
Viewed by 455
Abstract
Image distortion correction is a fundamental yet challenging task in image restoration, especially in scenarios with complex distortions and fine details. Existing methods often rely on fixed-scale feature extraction, which struggles to capture multi-scale distortions. This limitation results in difficulties in achieving a balance between global structural consistency and local detail preservation on distorted images with varying levels of complexity, resulting in suboptimal restoration quality for highly complex distortions. To address these challenges, this paper proposes a dynamic channel attention network (DCAN) for multi-scale distortion correction. Firstly, DCAN employs a multi-scale design and utilizes the optical flow network for distortion feature extraction, effectively balancing global structural consistency and local detail preservation under varying levels of distortion. Secondly, we present the channel attention and fusion selective module (CAFSM), which dynamically recalibrates feature importance across multi-scale distortions. By embedding CAFSM into the upsampling stage, the network enhances its ability to refine local features while preserving global structural integrity. Moreover, to further improve detail preservation and structural consistency, a comprehensive loss function is designed, incorporating structural similarity loss (SSIM Loss) to balance local and global optimization. Experimental results on the widely used Places2 dataset demonstrate that DCAN achieves state-of-the-art performance, with an average improvement of 1.55 dB in PSNR and 0.06 in SSIM compared with existing methods. Full article
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
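
The page does not give the internals of CAFSM; as a rough starting point, dynamic channel re-weighting of the kind described can be illustrated with a generic squeeze-and-excitation style block (a hedged sketch, not the paper's module; ChannelAttention and reduction are hypothetical):

import torch.nn as nn

class ChannelAttention(nn.Module):
    # Generic channel attention: global pooling -> bottleneck MLP -> per-channel gate.
    # The paper's CAFSM additionally fuses and selects features across scales.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w   # re-weight channels before multi-scale fusion in the upsampling stage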

28 pages, 10234 KiB  
Article
Estimating QoE from Encrypted Video Conferencing Traffic
by Michael Sidorov, Raz Birman, Ofer Hadar and Amit Dvir
Sensors 2025, 25(4), 1009; https://doi.org/10.3390/s25041009 - 8 Feb 2025
Viewed by 678
Abstract
Traffic encryption is vital for internet security but complicates analytical applications like video delivery optimization or quality of experience (QoE) estimation, which often rely on clear text data. While many models address the problem of QoE prediction in video streaming, the video conferencing (VC) domain remains underexplored despite rising demand for these applications. Existing models often provide low-resolution predictions, categorizing QoE into broad classes such as “high” or “low”, rather than providing precise, continuous predictions. Moreover, most models focus on clear-text rather than encrypted traffic. This paper addresses these challenges by analyzing a large dataset of Zoom sessions and training five classical machine learning (ML) models and two custom deep neural networks (DNNs) to predict three QoE indicators: frames per second (FPS), resolution (R), and the naturalness image quality evaluator (NIQE). The models achieve mean error rates of 8.27%, 7.56%, and 2.08% for FPS, R, and NIQE, respectively, using a 10-fold cross-validation technique. This approach advances QoE assessment for encrypted traffic in VC applications. Full article
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
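
The exact features and model settings are not listed on this page; the cross-validated regression setup described above can be sketched roughly as follows (scikit-learn; the file names and the per-flow feature set are hypothetical placeholders, not the paper's data):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

# Hypothetical per-interval features extracted from encrypted Zoom flows
# (packet counts, byte volumes, inter-arrival statistics) and FPS labels.
X = np.load("flow_features.npy")   # shape: (n_samples, n_features)
y = np.load("fps_labels.npy")      # ground-truth frames per second

model = RandomForestRegressor(n_estimators=200, random_state=0)
pred = cross_val_predict(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))

# Mean relative error over the 10 folds, comparable in spirit to the FPS error reported above.
rel_err = np.mean(np.abs(pred - y) / np.maximum(np.abs(y), 1e-6))
print(f"10-fold mean relative error: {rel_err:.2%}")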

18 pages, 13728 KiB  
Article
BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection
by Ruicheng Cao, Ruiteng Zhang, Xinyue Yan and Jian Zhang
Sensors 2024, 24(22), 7411; https://doi.org/10.3390/s24227411 - 20 Nov 2024
Cited by 1 | Viewed by 1515
Abstract
Degraded underwater images decrease the accuracy of underwater object detection. Existing research uses image enhancement methods to improve the visual quality of images, which may not be beneficial in underwater image detection and lead to serious degradation in detector performance. To alleviate this problem, we proposed a bidirectional guided method for underwater object detection, referred to as BG-YOLO. In the proposed method, a network is organized by constructing an image enhancement branch and an object detection branch in a parallel manner. The image enhancement branch consists of a cascade of an image enhancement subnet and object detection subnet. The object detection branch only consists of a detection subnet. A feature-guided module connects the shallow convolution layers of the two branches. When training the image enhancement branch, the object detection subnet in the enhancement branch guides the image enhancement subnet to be optimized towards the direction that is most conducive to the detection task. The shallow feature map of the trained image enhancement branch is output to the feature-guided module, constraining the optimization of the object detection branch through consistency loss and prompting the object detection branch to learn more detailed information about the objects. This enhances the detection performance. During the detection tasks, only the object detection branch is reserved so that no additional computational cost is introduced. Extensive experiments demonstrate that the proposed method significantly improves the detection performance of the YOLOv5s object detection network (the mAP is increased by up to 2.9%) and maintains the same inference speed as YOLOv5s (132 fps). Full article
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
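
The guidance mechanism described above amounts to a training-time consistency constraint between the shallow features of the two branches; a minimal sketch of such a term follows (illustrative only; consistency_loss, training_loss, and gamma are hypothetical names and weights, not the paper's code):

import torch.nn.functional as F

def consistency_loss(det_shallow_feat, enh_shallow_feat):
    # Pull the detection branch's shallow features toward those of the trained
    # (frozen) enhancement branch; used during training only, so inference cost is unchanged.
    return F.mse_loss(det_shallow_feat, enh_shallow_feat.detach())

def training_loss(det_loss, det_shallow_feat, enh_shallow_feat, gamma=0.5):
    # Detection objective (e.g., the standard YOLOv5 loss) plus the guidance term.
    return det_loss + gamma * consistency_loss(det_shallow_feat, enh_shallow_feat)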
