Deep Learning Technologies and Their Applications in Image Processing, Computer Vision, and Computational Intelligence

A special issue of Sensors (ISSN 1424-8220).

Deadline for manuscript submissions: closed (15 May 2026) | Viewed by 8735

Special Issue Editors

College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Interests: image/video restoration; image/video coding; machine learning; image segmentation

Special Issue Information

Dear Colleagues,

Deep learning has emerged as a pivotal technology across diverse domains, including image processing, computer vision, natural language processing, speech recognition, and beyond. With rapid advancements in artificial intelligence, deep learning, and high-performance computing, image, vision, and computing technologies have been widely implemented in autonomous driving, medical imaging, smart cities, augmented reality, and other cutting-edge fields.

These technological breakthroughs and expanded applications not only offer novel tools and methodologies for scientific research but also enhance industrial innovation. In the era of intelligence and digitization, deep learning technologies and their applications in image, vision, and computing are accelerating societal progress while providing critical support for future talent cultivation and disciplinary development.

This Special Issue will showcase the latest advances in deep learning, encompassing fundamental technologies and interdisciplinary applications in image processing, computer vision, intelligent computing, and related domains. We invite original research papers and comprehensive literature reviews addressing these topics. In particular, extended versions of papers accepted at ICIVC 2025 (https://icivc.org/) and ICDLT 2025 (https://www.icdlt.org/) are highly encouraged.

You may choose our Joint Special Issue in AI.

Dr. Honggang Chen
Dr. Chao Ren
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning models and algorithms
  • machine learning theory and technology
  • image processing theory and applications
  • computer graphics and computational photography
  • computer vision techniques and applications
  • multimedia technology

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

17 pages, 13632 KB  
Article
Research on Multi-Agent Semantic Communication Framework Based on Comparative Learning Joint Optimization
by Hong Yang, Hongyan Li, Honggang Chen, Lijuan Wang, Ji Li, Linbo Qing and Xiaohai He
Sensors 2026, 26(10), 2963; https://doi.org/10.3390/s26102963 - 8 May 2026
Viewed by 583
Abstract
With the rapid development of intelligent services, communication objectives are shifting from humans to multi-agent (MA) systems. This transition necessitates new communication paradigms capable of supporting real-time perception, decision-making, and collaboration among agents. Semantic communication (SeC) focuses on the efficient transmission and accurate understanding of information “meaning” and is well-suited to meet the needs of MA systems, such as collaborative perception, reasoning, and decision-making. However, the transmission of semantic information is still constrained by dynamic environments and the diversity of MA tasks. To address these challenges, this work proposes a COmparative learning Joint Optimal (COJO) SeC framework. This work makes three main contributions: first, it jointly optimizes the image reconstruction and classification functions designed for multi-task semantic objectives under different channel conditions, thereby improving the overall task performance of the system; second, based on input image features, compression ratio, task requirements, and channel conditions, an enhanced further compressor is designed, which obtains a training-based mask to significantly reduce the volume of transmitted data; finally, to prevent the loss of key semantic information in multi-task scenarios under channel constraints, it designs a task-driven end-to-end semantic communication training scheme. Full article

22 pages, 3108 KB  
Article
Self-Information-Driven Gated Graph Convolutional Network for Occluded Person Re-Identification
by Wanran Guo, Jiake Meng, Yuan Xue, Yaxian Fan and Zhenyu Fang
Sensors 2026, 26(9), 2901; https://doi.org/10.3390/s26092901 - 6 May 2026
Viewed by 764
Abstract
Occluded person re-identification (Re-ID) aims to accurately match occluded pedestrian images against complete gallery images captured across multiple cameras, a task that is critical to public security and intelligent surveillance systems. Existing graph neural network (GNN)-based methods typically assign uniform aggregation weights to all nodes, failing to reflect the inherent reliability difference between visible and occluded body regions, which allows noise from low-confidence nodes to propagate freely and corrupt the final pedestrian representation. To address this, we propose the Self-Information-Driven Gated Graph Convolutional Network (SI-GCN). Keypoint detection confidence scores are transformed into logarithmic self-information measures as uncertainty priors for a learnable gating mechanism. The proposed SIG module enables visible nodes to dominate information diffusion while occluded nodes absorb more from neighbors, achieving efficient feature updating. A dynamic confidence calibration (DCC) strategy further synchronizes node reliability estimates with feature evolution across successive GCN layers. Extensive experiments on six public benchmarks covering occluded, partial, and holistic Re-ID scenarios demonstrate that SI-GCN achieves state-of-the-art performance, with Rank-1 accuracy and mAP improvements of 1.2% and 0.9%, respectively, over the strongest baseline on the Occluded-REID dataset, demonstrating its strong potential for deployment in real-world public security and urban surveillance applications where occlusion is pervasive. Full article
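The abstract above describes turning keypoint detection confidence scores into logarithmic self-information values that act as uncertainty priors for a gate. The following is only a minimal sketch of that idea, not the authors' SI-GCN implementation: the function names, the clamping constant, and the fixed `weight` and `bias` values (which would be learned in a real model) are all assumptions made for illustration.

```python
import math


def self_information(confidence, eps=1e-6):
    """Self-information -log(p) of a keypoint confidence p in (0, 1]."""
    p = min(max(confidence, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p)


def gate(confidence, weight=-1.0, bias=1.0):
    """Hypothetical gate: sigmoid of an affine function of self-information.

    weight and bias are fixed here for illustration; in a trained model
    they would be learnable parameters.
    """
    z = weight * self_information(confidence) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Assumed confidences for a visible, a partly occluded, and an occluded keypoint.
confidences = [0.95, 0.60, 0.10]
gates = [gate(c) for c in confidences]
```

With a negative weight, a low-confidence (high self-information) node receives a smaller gate value, which matches the stated intent that visible nodes dominate information diffusion.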

24 pages, 11176 KB  
Article
JMSC: Joint Spatial–Temporal Modeling with Semantic Completion for Audio–Visual Learning
by Xinfu Xu, Fan Yang and Zhibin Yu
Sensors 2026, 26(4), 1288; https://doi.org/10.3390/s26041288 - 16 Feb 2026
Viewed by 628
Abstract
Audio–visual learning seeks to achieve holistic scene understanding by integrating auditory and visual cues. Early research focused on fully fine-tuning pre-trained models, incurring high computational costs. Consequently, recent studies have adopted parameter-efficient tuning methods to adapt large-scale vision models to the audio–visual domain. Despite the competitive performance of existing methods, several challenges persist. Firstly, effectively leveraging the complementary semantics between the audio and visual modalities remains difficult, as these two modalities capture fundamentally different aspects of a video. Secondly, comprehending dynamic video context is challenging because both spatial attributes (such as scale) and temporal characteristics (such as motion) of objects co-evolve over time, making semantic comprehension more complex. To address these challenges, we propose a novel framework, named Joint Spatial–Temporal Modeling with Semantic Completion (JMSC). JMSC introduces cross-modal latent reconstruction, which moves beyond shallow correlation by encouraging the model to reconstruct one modality’s complete semantic summary from a masked version of its counterpart. Furthermore, JMSC learns a unified representation of video spatial attributes and temporal changes by jointly modeling them under audio guidance, enabling accurate localization and consistent tracking in dynamic video scenes. Experimental results demonstrate that JMSC achieves state-of-the-art performance across multiple downstream tasks while maintaining high computational efficiency. Full article

25 pages, 3611 KB  
Article
Automatic Estimation of Football Possession via Improved YOLOv8 Detection and DBSCAN-Based Team Classification
by Rong Guo, Yucheng Zeng, Rong Deng, Yawen Lei, Yonglin Che, Lin Yu, Jianpeng Zhang, Xiaobin Xu, Zhaoxiang Ma, Jiajin Zhang and Jianke Yang
Sensors 2026, 26(4), 1252; https://doi.org/10.3390/s26041252 - 14 Feb 2026
Viewed by 1241
Abstract
Recent developments in computer vision have significantly enhanced the automation and objectivity of sports analytics. This paper proposes a novel deep learning-based framework for estimating football possession directly from broadcast video, eliminating the reliance on manual annotations or event-based data that are often labor-intensive, subjective, and temporally coarse. The framework incorporates two structurally improved object detection models: YOLOv8-P2S3A for football detection and YOLOv8-HWD3A for player detection. These models demonstrate superior accuracy compared to baseline detectors, achieving 79.4% and 71.1% validation average precision, respectively, while maintaining low computational latency. Team identification is accomplished through unsupervised DBSCAN clustering on jersey color features, enabling robust and label-free team assignment across diverse match scenarios. Object trajectories are maintained via the Norfair multi-object tracking algorithm, and a temporally aware refinement module ensures accurate estimation of ball possession durations. Extensive experiments were conducted on a dataset comprising 20 full-match video clips. The proposed system achieved a root mean square error (RMSE) of 4.87 in possession estimation, outperforming all evaluated baselines, including YOLOv10n (RMSE: 5.12) and YOLOv11 (RMSE: 5.17), with a substantial improvement over YOLOv6n (RMSE: 12.73). These results substantiate the effectiveness of the proposed framework in enhancing the precision, efficiency, and automation of football analytics, offering practical value for coaches, analysts, and sports scientists in professional settings. Full article
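The label-free team assignment mentioned in this abstract relies on DBSCAN clustering of jersey color features. As a rough illustration of how such clustering behaves, here is a minimal pure-Python DBSCAN applied to made-up mean RGB jersey colors. The color values, `eps`, and `min_samples` are assumptions chosen for the example, and the paper's actual color features and parameters are not given in the abstract.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels


# Hypothetical mean jersey colors (R, G, B) for detected people in one frame.
jersey_colors = [
    (200, 30, 40), (210, 25, 35), (195, 40, 45),   # reddish kit
    (30, 40, 200), (25, 50, 210), (35, 45, 190),   # bluish kit
    (20, 20, 20),                                  # referee in black
]
labels = dbscan(jersey_colors, eps=30.0, min_samples=2)
```

In this toy input the two kits form clusters 0 and 1, while the isolated referee color is marked as noise (-1), which is one reason a density-based method suits this task: the number of clusters does not have to be fixed in advance and outliers are not forced into a team.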

22 pages, 4204 KB  
Article
SAM2-Dehaze: Fusing High-Quality Semantic Priors with Convolutions for Single-Image Dehazing
by Sen Li, Jianchao Wang and Zhanqiang Huo
Sensors 2025, 25(22), 7097; https://doi.org/10.3390/s25227097 - 20 Nov 2025
Viewed by 957
Abstract
Single-image dehazing suffers from severe information loss and the under-constraint problem. The lack of high-quality robust priors leads to limited generalization ability of existing dehazing methods in real-world scenarios. To tackle this challenge, we propose a simple but effective single-image dehazing network by fusing high-quality semantic priors extracted from Segment Anything Model 2 (SAM2) with different types of advanced convolutions, abbreviated SAM2-Dehaze, which follows the U-Net architecture and consists of five stages. Specifically, we first employ the superior semantic perception and cross-domain generalization capabilities of SAM2 to generate accurate structural semantic masks. Then, a dual-branch Semantic Prior Fusion Block is designed to enable deep collaboration between the structural semantic masks and hazy image features at each stage of the U-Net. Furthermore, to avoid the drawbacks of feature redundancy and neglect of high-frequency information in traditional convolution, we have designed a novel parallel detail-enhanced and compression convolution that combines the advantages of standard convolution, difference convolution, and reconstruction convolution to replace the traditional convolution at each stage of the U-Net. Finally, a Semantic Alignment Block is incorporated into the post-processing phase to ensure semantic consistency and visual naturalness in the final dehazed result. Extensive quantitative and qualitative experiments demonstrate that SAM2-Dehaze outperforms existing dehazing methods on several synthetic and real-world foggy-image benchmarks, and exhibits excellent generalization ability. Full article

21 pages, 9744 KB  
Article
MsGf: A Lightweight Self-Supervised Monocular Depth Estimation Framework with Multi-Scale Feature Extraction
by Xinxing Tian, Zhilin He, Yawei Zhang, Fengkai Liu and Tianhao Gu
Sensors 2025, 25(20), 6380; https://doi.org/10.3390/s25206380 - 16 Oct 2025
Cited by 1 | Viewed by 1753
Abstract
Monocular depth estimation is an essential component of computer vision that enables 3D scene understanding, with critical applications in autonomous driving and augmented reality. This paper proposes MsGf, a lightweight self-supervised framework for monocular depth estimation from single RGB images that combines multi-scale feature extraction with artifact elimination. The framework first introduces a Cross-Dimensional Multi-scale Feature Extraction (CDMs) module, which combines parallel multi-scale convolution with sequential feature convolutions to achieve multi-scale feature extraction with minimal parameters. Additionally, a Sobel Edge Perception-Guided Filtering (SEGF) module is proposed. The SEGF module uses the Sobel operator to decompose the features into horizontal and vertical components, and then generates the filter kernel through two filtering steps to effectively suppress artifacts and better capture structural and edge features. Extensive ablation and comparative experiments on the KITTI and Make3D datasets demonstrate that MsGf, with only 0.8 M parameters, outperforms current state-of-the-art methods. Full article
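The Sobel decomposition into horizontal and vertical components mentioned in this abstract can be illustrated with the standard 3x3 Sobel kernels. The sketch below applies them to a tiny toy image with plain Python lists. It shows only the directional decomposition and is not the SEGF module itself: the helper name `correlate3x3` and the test image are assumptions for illustration.

```python
# Standard 3x3 Sobel kernels: SOBEL_X responds to intensity changes along x
# (vertical edges), SOBEL_Y to intensity changes along y (horizontal edges).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(image, kernel):
    """Valid-mode cross-correlation of a 2D list with a 3x3 kernel."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Toy 4x4 image with a single vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
grad_x = correlate3x3(image, SOBEL_X)  # strong response across the edge
grad_y = correlate3x3(image, SOBEL_Y)  # no response: rows are identical
```

Because the toy image changes only along x, all of the response lands in `grad_x` and `grad_y` is zero everywhere, which is the separation of directional structure that the decomposition provides.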

27 pages, 28041 KB  
Article
A Unified GAN-Based Framework for Unsupervised Video Anomaly Detection Using Optical Flow and RGB Cues
by Seung-Hun Kang and Hyun-Soo Kang
Sensors 2025, 25(18), 5869; https://doi.org/10.3390/s25185869 - 19 Sep 2025
Cited by 1 | Viewed by 1844
Abstract
Video anomaly detection in unconstrained environments remains a fundamental challenge due to the scarcity of labeled anomalous data and the diversity of real-world scenarios. To address this, we propose a novel unsupervised framework that integrates RGB appearance and optical flow motion via a unified GAN-based architecture. The generator features a dual encoder and a GRU–attention temporal bottleneck, while the discriminator employs ConvLSTM layers and residual-enhanced MLPs to evaluate temporal coherence. To improve training stability and reconstruction quality, we introduce DASLoss—a composite loss that incorporates pixel, perceptual, temporal, and feature consistency terms. Experiments were conducted on three benchmark datasets. On XD-Violence, our model achieves an Average Precision (AP) of 80.5%, outperforming other unsupervised methods such as MGAFlow and Flashback. On Hockey Fight, it achieves an AUC of 0.92 and an F1-score of 0.85, demonstrating strong performance in detecting short-duration violent events. On UCSD Ped2, our model attains an AUC of 0.96, matching several state-of-the-art models despite using no supervision. These results confirm the effectiveness and generalizability of our approach in diverse anomaly detection settings. Full article