AI-Based Computer Vision Sensors & Systems—2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 30 June 2026 | Viewed by 6530

Special Issue Editors


Prof. Dr. Xuefeng Liang
Guest Editor
School of Artificial Intelligence, Xidian University, Xi'an, China
Interests: visual cognitive computing; computer vision; visual big data mining; intelligent algorithms

Dr. Guangyu Chen
Guest Editor Assistant
Research Institute of Electrical Communication, Tohoku University, Sendai, Miyagi, Japan
Interests: spatial mechanisms of human visual attention; size tuning; cognitive science; LLM for psychology; explainable human–AI interaction systems

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) in computer vision sensors and systems is a specialized field that spans past and current AI advancements, along with their impact and future prospects within sensor technology and its applications. This Special Issue explores the innovative landscape of AI-based computer vision sensors and systems, emphasizing their transformative potential across a variety of applications. These technologies harness advanced imaging techniques to facilitate real-time analysis and intelligent decision-making. We invite researchers to submit original articles investigating the use of RGB cameras, depth cameras (e.g., LiDAR), and thermal cameras in conjunction with image processing units (GPUs, TPUs, FPGAs) and object detection frameworks (e.g., YOLO, SSD, Faster R-CNN) in areas such as environmental monitoring, healthcare imaging, autonomous navigation, and security systems. This Special Issue aims to highlight innovative methodologies that enhance object detection, gesture recognition, and real-time analytics, ultimately advancing the capabilities of computer vision.

Prof. Dr. Xuefeng Liang
Guest Editor

Dr. Guangyu Chen
Guest Editor Assistant

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • RGB cameras
  • depth cameras (LiDAR)
  • thermal cameras
  • image processing units (GPUs, TPUs, FPGAs)
  • YOLO (You Only Look Once)
  • gesture recognition systems
  • autonomous navigation systems
  • augmented reality (AR)
  • industrial automation
  • smart surveillance systems

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

27 pages, 20805 KB  
Article
A Lightweight Radar–Camera Fusion Deep Learning Model for Human Activity Recognition
by Minkyung Jeon and Sungmin Woo
Sensors 2026, 26(3), 894; https://doi.org/10.3390/s26030894 - 29 Jan 2026
Viewed by 35
Abstract
Human activity recognition in privacy-sensitive indoor environments requires sensing modalities that remain robust under illumination variation and background clutter while preserving user anonymity. To this end, this study proposes a lightweight radar–camera fusion deep learning model that integrates motion signatures from FMCW radar with coarse spatial cues from ultra-low-resolution camera frames. The radar stream is processed as a Range–Doppler–Time cube, where each frame is flattened and sequentially encoded using a Transformer-based temporal model to capture fine-grained micro-Doppler patterns. The visual stream employs a privacy-preserving 4×5-pixel camera input, from which a temporal sequence of difference frames is extracted and modeled with a dedicated camera Transformer encoder. The two modality-specific feature vectors—each representing the temporal dynamics of motion—are concatenated and passed through a lightweight fully connected classifier to predict human activity categories. A multimodal dataset of synchronized radar cubes and ultra-low-resolution camera sequences across 15 activity classes was constructed for evaluation. Experimental results show that the proposed fusion model achieves 98.74% classification accuracy, significantly outperforming single-modality baselines (single-radar and single-camera). Despite its performance, the entire model requires only 11 million floating-point operations (11 MFLOPs), making it highly efficient for deployment on embedded or edge devices.
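The fusion scheme summarized above (two modality-specific Transformer encoders whose pooled outputs are concatenated and fed to a small fully connected classifier) can be sketched in a few lines of PyTorch. This is a minimal illustration rather than the authors' implementation: the 15-class output and the 4×5-pixel camera input follow the abstract, while the embedding width, layer counts, and sequence lengths are assumptions.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Flatten each frame, project to an embedding, encode the sequence with a Transformer."""
    def __init__(self, frame_dim, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(frame_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):            # x: (batch, time, frame_dim)
        h = self.encoder(self.proj(x))
        return h.mean(dim=1)         # temporal average pooling -> (batch, d_model)

class RadarCameraFusion(nn.Module):
    """Late fusion: concatenate radar and camera embeddings, classify with a small MLP."""
    def __init__(self, radar_dim, cam_dim=4 * 5, n_classes=15):
        super().__init__()
        self.radar = ModalityEncoder(radar_dim)
        self.camera = ModalityEncoder(cam_dim)
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, radar_seq, cam_seq):
        z = torch.cat([self.radar(radar_seq), self.camera(cam_seq)], dim=-1)
        return self.head(z)

# Example with assumed shapes: 32 radar frames of 64x64 Range-Doppler bins,
# 32 camera difference frames of 4x5 pixels.
model = RadarCameraFusion(radar_dim=64 * 64)
logits = model(torch.randn(2, 32, 64 * 64), torch.randn(2, 32, 20))
print(logits.shape)  # torch.Size([2, 15])
```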
28 pages, 3652 KB  
Article
A Ground-Based Visual System for UAV Detection and Altitude Measurement: Deployment and Evaluation of Ghost-YOLOv11n on Edge Devices
by Hongyu Wang, Yifeng Qu, Zheng Dang, Duosheng Wu, Mingzhu Cui, Hanqi Shi and Jintao Zhao
Sensors 2026, 26(1), 205; https://doi.org/10.3390/s26010205 - 28 Dec 2025
Viewed by 516
Abstract
The growing threat of unauthorized drones to ground-based critical infrastructure necessitates efficient ground-to-air surveillance systems. This paper proposes a lightweight framework for UAV detection and altitude measurement from a fixed ground perspective. We introduce Ghost-YOLOv11n, an optimized detector that integrates GhostConv modules into YOLOv11n, reducing computational complexity by 12.7% while achieving 98.8% mAP0.5 on a comprehensive dataset of 8795 images. Deployed on a LuBanCat4 edge device with Rockchip RK3588S NPU acceleration, the model achieves 20 FPS. For stable altitude estimation, we employ an Extended Kalman Filter to refine measurements from a monocular ranging method based on similar-triangle geometry. Experimental results under ground monitoring scenarios show height measurement errors remain within 10% up to 30 m. This work provides a cost-effective, edge-deployable solution specifically for ground-based anti-drone applications.
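The monocular ranging idea mentioned in the abstract, similar-triangle geometry refined by a Kalman-type filter, can be illustrated with a short sketch. All numbers below (focal length, drone size, elevation angle) are hypothetical, and a scalar Kalman update stands in for the paper's Extended Kalman Filter.

```python
import numpy as np

# Similar-triangle range: a target of physical height H metres that spans h_px pixels
# under focal length f_px lies at distance Z ~= f_px * H / h_px from the camera.
def range_from_bbox(f_px, target_height_m, bbox_height_px):
    return f_px * target_height_m / bbox_height_px

# Scalar Kalman filter smoothing noisy per-frame height estimates
# (a stand-in for the paper's Extended Kalman Filter, for illustration only).
def kalman_smooth(measurements, q=0.01, r=4.0):
    x, p = measurements[0], 1.0
    out = []
    for z in measurements:
        p += q                      # predict (constant-height model)
        k = p / (p + r)             # Kalman gain
        x += k * (z - x)            # update with the new measurement
        p *= (1 - k)
        out.append(x)
    return out

f_px, drone_h = 1400.0, 0.35        # hypothetical focal length (px) and drone height (m)
raw = [range_from_bbox(f_px, drone_h, h) * np.sin(np.radians(25)) for h in (18, 17, 19, 16, 18)]
print(kalman_smooth(raw))           # smoothed altitude estimates in metres
```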

19 pages, 1656 KB  
Article
YOLOv11-GLIDE: An Improved YOLOv11n Student Behavior Detection Algorithm Based on Scale-Based Dynamic Loss and Channel Prior Convolutional Attention
by Haiyan Wang, Guiyuan Gao, Wei Zhang, Kejing Li, Na Che, Caihua Yan and Liu Wang
Sensors 2025, 25(22), 6972; https://doi.org/10.3390/s25226972 - 14 Nov 2025
Viewed by 952
Abstract
Student classroom behavior recognition is a core research direction in intelligent education systems. Real-time analysis of students’ learning states and behavioral features through classroom monitoring provides quantitative support for teaching evaluation, classroom management, and personalized instruction, offering significant value for data-driven educational decision-making. To address the issues of low detection accuracy and severe occlusion in classroom behavior detection, this article proposes an improved YOLOv11n-based algorithm named YOLOv11-GLIDE. The model introduces a Channel Prior Convolutional Attention (CPCA) mechanism to integrate global and local feature information, enhancing feature extraction and detection performance. A scale-based dynamic loss (SD Loss) is designed to adaptively adjust the loss weights according to object scale, improving regression stability and detection accuracy. In addition, Sparse Depthwise Convolution (SPD-Conv) replaces traditional down-sampling to reduce fine-grained feature loss and computational cost. Experimental results on the SCB-Dataset3 demonstrate that YOLOv11-GLIDE achieves an excellent balance between accuracy and lightweight design. Compared with the baseline YOLOv11n, mAP@0.5 and mAP@0.5-0.95 increase by 2.5% and 7.6%, while Parameters and GFLOPS are reduced by 9.4% and 11.1%, respectively. The detection speed reaches 127.9 FPS, meeting the practical requirements of embedded classroom monitoring systems for accurate and efficient student behavior recognition.
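The abstract does not give the exact form of the scale-based dynamic loss, but the general idea of weighting a per-box regression term by object scale can be sketched as follows. The weighting function and hyperparameters here are illustrative assumptions, not the SD Loss itself.

```python
import torch

def scale_weighted_box_loss(pred_boxes, target_boxes, img_area, alpha=1.0):
    """Illustrative scale-aware weighting: smaller targets receive larger weights.
    Boxes are (x1, y1, x2, y2); the exact SD Loss formulation in the paper may differ."""
    wh = (target_boxes[:, 2:] - target_boxes[:, :2]).clamp(min=1e-6)
    rel_area = (wh[:, 0] * wh[:, 1]) / img_area              # normalized box area in (0, 1]
    weights = (1.0 - rel_area).pow(alpha) + 0.5              # emphasize small objects
    l1 = (pred_boxes - target_boxes).abs().mean(dim=1)       # per-box L1 regression error
    return (weights * l1).mean()

pred = torch.tensor([[10., 10., 50., 60.], [100., 100., 400., 380.]])
tgt  = torch.tensor([[12., 11., 52., 58.], [ 98., 102., 398., 382.]])
print(scale_weighted_box_loss(pred, tgt, img_area=640 * 640))
```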

27 pages, 5331 KB  
Article
Real-Time Robust 2.5D Stereo Multi-Object Tracking with Lightweight Stereo Matching Algorithm
by Jinhyeong Lee, Junyoung Shin, Eunwoo Park and Daekeun Kim
Sensors 2025, 25(21), 6773; https://doi.org/10.3390/s25216773 - 5 Nov 2025
Viewed by 1667
Abstract
Multi-object tracking faces persistent challenges from occlusions and truncations in monocular vision systems. While stereo vision provides depth information, existing approaches require computationally expensive dense matching or 3D reconstruction. This paper presents a real-time 2.5D stereo multi-object tracking framework combining lightweight stereo matching with resilient tracker management. The stereo matching module employs Direct Linear Transform-based triangulation using only bounding box coordinates, eliminating costly feature extraction while maintaining robust correspondence through geometric constraints. A dual-tracker architecture maintains independent trackers in both views, enabling re-identification when objects become occluded in one view but remain visible in the other. Experimental validation on a refrigerator monitoring dataset demonstrates that StereoSORT achieves a multiple object tracking accuracy (MOTA) of 0.932 and an identification F1 score (IDF1) of 0.823, substantially outperforming monocular trackers, including OC-SORT (IDF1: 0.765) and ByteTrack (IDF1: 0.609). The system achieves a 50.1 mm median depth error, comparable to commercial sensors, while maintaining 70 FPS on standard hardware. These results validate that geometric constraints alone enable robust stereo tracking without appearance features, offering a practical solution for resource-constrained environments where computational efficiency and tracking reliability are equally critical.
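Direct Linear Transform triangulation from matched bounding-box coordinates, the core of the stereo matching module described above, follows the standard two-view DLT formulation. The sketch below uses hypothetical rectified-camera parameters; the paper's calibration, matching constraints, and tracker logic are not reproduced.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Classic DLT triangulation: given 3x4 projection matrices P1, P2 and a matched
    pixel coordinate in each view, solve A X = 0 for the homogeneous 3D point."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                       # right singular vector of the smallest singular value
    return X[:3] / X[3]

# Hypothetical rectified stereo pair: identical intrinsics, 60 mm baseline along x.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.06], [0], [0]])])

# Bounding-box centers of the same object in the left and right images.
left_center, right_center = (352.0, 250.0), (340.0, 250.0)
print(triangulate_dlt(P1, P2, left_center, right_center))  # approx [0.16, 0.05, 4.0] metres
```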

18 pages, 5377 KB  
Article
M3ENet: A Multi-Modal Fusion Network for Efficient Micro-Expression Recognition
by Ke Zhao, Xuanyu Liu and Guangqian Yang
Sensors 2025, 25(20), 6276; https://doi.org/10.3390/s25206276 - 10 Oct 2025
Cited by 1 | Viewed by 1036
Abstract
Micro-expression recognition (MER) aims to detect brief and subtle facial movements that reveal suppressed emotions, discerning authentic emotional responses in scenarios such as visitor experience analysis in museum settings. However, it remains a highly challenging task due to the fleeting duration, low intensity, and limited availability of annotated data. Most existing approaches rely solely on either appearance or motion cues, thereby restricting their ability to capture expressive information fully. To overcome these limitations, we propose a lightweight multi-modal fusion network, termed M3ENet, which integrates both motion and appearance cues through early-stage feature fusion. Specifically, our model extracts horizontal, vertical, and strain-based optical flow between the onset and apex frames, alongside RGB images from the onset, apex, and offset frames. These inputs are processed by two modality-specific subnetworks, whose features are fused to exploit complementary information for robust classification. To improve generalization in low data regimes, we employ targeted data augmentation and adopt focal loss to mitigate class imbalance. Extensive experiments on five benchmark datasets, including CASME I, CASME II, CAS(ME)2, SAMM, and MMEW, demonstrate that M3ENet achieves state-of-the-art performance with high efficiency. Ablation studies and Grad-CAM visualizations further confirm the effectiveness and interpretability of the proposed architecture.
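Of the techniques named in the abstract, the focal loss used to mitigate class imbalance has a standard formulation that is easy to sketch. The gamma and alpha values below are common defaults, not necessarily those used in M3ENet, and the class count is a placeholder.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Multi-class focal loss: down-weights easy examples so rare expression
    classes contribute more to the gradient (hyperparameters are illustrative)."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, targets, reduction="none")   # per-sample cross-entropy
    p_t = torch.exp(-ce)                                 # probability of the true class
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(8, 5)                  # 8 samples, 5 emotion classes (hypothetical)
targets = torch.randint(0, 5, (8,))
print(focal_loss(logits, targets))
```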

26 pages, 5861 KB  
Article
Robust Industrial Surface Defect Detection Using Statistical Feature Extraction and Capsule Network Architectures
by Azeddine Mjahad and Alfredo Rosado-Muñoz
Sensors 2025, 25(19), 6063; https://doi.org/10.3390/s25196063 - 2 Oct 2025
Cited by 1 | Viewed by 913
Abstract
Automated quality control is critical in modern manufacturing, especially for metallic cast components, where fast and accurate surface defect detection is required. This study evaluates classical Machine Learning (ML) algorithms using extracted statistical parameters and deep learning (DL) architectures including ResNet50, Capsule Networks, and a 3D Convolutional Neural Network (CNN3D) using 3D image inputs. Using the Dataset Original, ML models with the selected parameters achieved high performance: RF reached 99.4 ± 0.2% precision and 99.4 ± 0.2% sensitivity, GB 96.0 ± 0.2% precision and 96.0 ± 0.2% sensitivity. ResNet50 trained with extracted parameters reached 98.0 ± 1.5% accuracy and 98.2 ± 1.7% F1-score. Capsule-based architectures achieved the best results, with ConvCapsuleLayer reaching 98.7 ± 0.2% accuracy and 100.0 ± 0.0% precision for the normal class, and 98.9 ± 0.2% F1-score for the affected class. CNN3D applied on 3D image inputs reached 88.61 ± 1.01% accuracy and 90.14 ± 0.95% F1-score. Using the Dataset Expanded with ML and PCA-selected features, Random Forest achieved 99.4 ± 0.2% precision and 99.4 ± 0.2% sensitivity, K-Nearest Neighbors 99.2 ± 0.0% precision and 99.2 ± 0.0% sensitivity, and SVM 99.2 ± 0.0% precision and 99.2 ± 0.0% sensitivity, demonstrating consistent high performance. All models were evaluated using repeated train-test splits to calculate averages of standard metrics (accuracy, precision, recall, F1-score), and processing times were measured, showing very low per-image execution times (as low as 3.69×10⁻⁴ s/image), supporting potential real-time industrial application. These results indicate that combining statistical descriptors with ML and DL architectures provides a robust and scalable solution for automated, non-destructive surface defect detection, with high accuracy and reliability across both the original and expanded datasets.
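A minimal sketch of the statistical-feature-plus-classifier pipeline described above, using scikit-learn's RandomForestClassifier on synthetic data. The feature set, dataset, and preprocessing are not specified in the abstract and are assumed here for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def statistical_features(image):
    """Simple per-image statistical descriptors (mean, std, skewness, excess kurtosis,
    min, max); the paper's exact feature set may differ."""
    x = image.ravel().astype(np.float64)
    mu, sigma = x.mean(), x.std() + 1e-12
    z = (x - mu) / sigma
    return np.array([mu, sigma, (z ** 3).mean(), (z ** 4).mean() - 3.0, x.min(), x.max()])

rng = np.random.default_rng(0)
images = rng.random((40, 64, 64))                       # synthetic stand-in images
labels = rng.integers(0, 2, size=40)                    # 0 = normal, 1 = defective
X = np.stack([statistical_features(im) for im in images])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.score(X, labels))                             # training accuracy on toy data
```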

19 pages, 3920 KB  
Article
HCDFI-YOLOv8: A Transmission Line Ice Cover Detection Model Based on Improved YOLOv8 in Complex Environmental Contexts
by Lipeng Kang, Feng Xing, Tao Zhong and Caiyan Qin
Sensors 2025, 25(17), 5421; https://doi.org/10.3390/s25175421 - 2 Sep 2025
Cited by 1 | Viewed by 896
Abstract
When unmanned aerial vehicles (UAVs) perform transmission line ice cover detection, variable shooting angles and complex background environments often lead to poor ice-cover recognition accuracy and difficulty in accurately identifying targets. To address these issues, this study proposes an improved icing detection model, HCDFI–You Only Look Once version 8 (HCDFI-YOLOv8). First, a cross-dense hybrid (CDH) parallel heterogeneous convolutional module is proposed, which not only improves detection accuracy but also curbs the surge in floating-point operations that usually accompanies such modifications. Second, deep and shallow feature weighted fusion using an improved CSPDarknet53 to 2-Stage FPN_Dynamic Feature Fusion (C2f_DFF) module is proposed to reduce feature loss in the neck network. Third, the detection head is optimized with the feature adaptive spatial feature fusion (FASFF) detection head module to enhance the model’s ability to extract features at different scales. Finally, a new inner-complete intersection over union (Inner_CIoU) loss function is introduced to address the contradiction in the CIoU loss function used in the original YOLOv8. Experimental results demonstrate that the proposed HCDFI-YOLOv8 model achieves a 2.7% improvement in mAP@0.5 and a 2.5% improvement in mAP@0.5:0.95 compared with standard YOLOv8. Among twelve models evaluated for icing detection, the proposed model delivers the highest overall detection accuracy, verifying its accuracy in complex transmission line environments and providing effective technical support for transmission line ice cover detection.
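The Inner-IoU idea behind the Inner_CIoU loss, computing the overlap on auxiliary boxes scaled about each box center, can be sketched as below. The ratio value is an assumption, and the complete CIoU penalty terms (center distance and aspect ratio) are omitted for brevity.

```python
import torch

def inner_iou(box1, box2, ratio=0.75):
    """Inner-IoU sketch: compute IoU on auxiliary boxes scaled about each box center
    by `ratio` (values < 1 shrink them), which sharpens gradients for high-overlap pairs.
    The full Inner-CIoU adds CIoU's center-distance and aspect-ratio penalties."""
    def scaled(b):
        cx, cy = (b[..., 0] + b[..., 2]) / 2, (b[..., 1] + b[..., 3]) / 2
        w, h = (b[..., 2] - b[..., 0]) * ratio, (b[..., 3] - b[..., 1]) * ratio
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    x11, y11, x12, y12 = scaled(box1)
    x21, y21, x22, y22 = scaled(box2)
    iw = (torch.min(x12, x22) - torch.max(x11, x21)).clamp(min=0)
    ih = (torch.min(y12, y22) - torch.max(y11, y21)).clamp(min=0)
    inter = iw * ih
    union = (x12 - x11) * (y12 - y11) + (x22 - x21) * (y22 - y21) - inter
    return inter / union.clamp(min=1e-9)

a = torch.tensor([[10., 10., 50., 60.]])
b = torch.tensor([[12., 14., 48., 58.]])
print(1.0 - inner_iou(a, b))   # used as a bounding-box regression loss term
```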
