
Computer Vision for Object Detection and Tracking with Sensor-Based Applications

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 31 October 2025 | Viewed by 13682

Special Issue Editors


Dr. Li-Wei Kang
Guest Editor
Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
Interests: image analysis; visual computing; multimedia signal processing

Dr. Chih-Chung Hsu
Guest Editor
Institute of Data Science, National Cheng Kung University, Tainan 701, Taiwan
Interests: deep learning; image processing; computer vision; image compression; video editing

Dr. Chia-Chi Tsai
Guest Editor
Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan
Interests: AI model architecture and compression; AI-based vision/audio application; machine learning; embedded systems; computer vision; digital signal processing

Dr. Chao-Yang Lee
Guest Editor
Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliou 640301, Taiwan
Interests: artificial intelligence; autonomous vehicle; computer vision

Special Issue Information

Dear Colleagues,

This Special Issue explores the intersection of computer vision techniques and sensor technologies for object detection and tracking. With rapid advancements in both fields, there is a growing need to understand their synergistic relationship and to explore their potential applications. Moreover, with the rapid development of artificial intelligence theories and techniques, AI-guided computer vision methods (e.g., deep learning-based object detection) have demonstrated state-of-the-art performance in several related fields. This Special Issue seeks to bring together cutting-edge research and applications that demonstrate the integration of computer vision algorithms with various sensor technologies for robust and efficient object detection and tracking.

We invite researchers to submit original research papers, review articles, and case studies on topics including, but not limited to, the following:

  • Development and optimization of computer vision algorithms for object detection and tracking based on sensor data;
  • Fusion of multiple sensor modalities (such as visual, thermal, LiDAR, radar, etc.) for enhanced object detection and tracking;
  • Sensor selection, calibration, and synchronization techniques for accurate and reliable object detection and tracking;
  • Real-time implementation and hardware acceleration of computer vision algorithms integrated with sensors;
  • Applications of computer vision and sensor fusion in autonomous vehicles, surveillance systems, robotics, and smart environments;
  • Novel sensor technologies and their impact on object detection and tracking performance;
  • Deep learning approaches for object detection and tracking using sensor data.

Dr. Li-Wei Kang
Dr. Chih-Chung Hsu
Dr. Chia-Chi Tsai
Dr. Chao-Yang Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, authors can access the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • object tracking
  • object detection
  • sensor fusion
  • sensor technologies
  • autonomous vehicles
  • surveillance systems
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (9 papers)


Research

22 pages, 2410 KiB  
Article
DAHD-YOLO: A New High Robustness and Real-Time Method for Smoking Detection
by Jianfei Zhang and Chengwei Jiang
Sensors 2025, 25(5), 1433; https://doi.org/10.3390/s25051433 - 26 Feb 2025
Viewed by 498
Abstract
Recent advancements in AI technologies have driven the extensive adoption of deep learning architectures for recognizing human behavioral patterns. However, the existing smoking behavior detection models based on object detection still have problems, including poor accuracy and insufficient real-time performance. Especially in complex environments, the existing models often struggle with erroneous detections and missed detections. In this paper, we introduce DAHD-YOLO, a model built upon the foundation of YOLOv8. We first designed the DBCA module to replace the bottleneck component in the backbone. The architecture integrates a diverse branch block and a contextual anchor mechanism, effectively improving the backbone network’s ability to extract features. Subsequently, at the end of the backbone, we introduce adaptive fine-grained channel attention (AFGCA) to effectively facilitate the fusion of both overarching patterns and localized details. We introduce the ECA-FPN, an improved version of the feature pyramid network, designed to refine the extraction of hierarchical information and enhance cross-scale feature interactions. The decoupled detection head is also updated via the reparameterization approach. The wise–powerful intersection over union (Wise-PIoU) is adopted as the new bounding box regression loss function, resulting in quicker convergence speed and improved detection outcomes. Our system achieves superior results compared to existing models using a self-constructed smoking detection dataset, reducing computational complexity by 23.20% while trimming the model parameters by 33.95%. Moreover, the mAP50 of our model has increased by 5.1% compared to the benchmark model, reaching 86.0%. Finally, we deploy the improved model on the RK3588. After optimizations such as quantization and multi-threading, the system achieves a detection rate of 50.2 fps, addressing practical application demands and facilitating the precise and instantaneous identification of smoking activities.
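Wise-PIoU's exact formulation is given in the article; as background on IoU-based box regression losses of this kind, the following is a minimal DIoU-style sketch in PyTorch (the diou_loss function, tensor layout, and box format are illustrative assumptions, not the authors' code). Adding a normalized center-distance penalty to the IoU term is what speeds up convergence relative to a plain IoU loss.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """Distance-IoU loss for (x1, y1, x2, y2) boxes; an illustrative
    stand-in for IoU-based regression losses such as Wise-PIoU."""
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between box centers
    cx_p = (pred[:, 0] + pred[:, 2]) / 2
    cy_p = (pred[:, 1] + pred[:, 3]) / 2
    cx_t = (target[:, 0] + target[:, 2]) / 2
    cy_t = (target[:, 1] + target[:, 3]) / 2
    center_dist = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Squared diagonal of the smallest enclosing box, to normalize the penalty
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return (1 - iou + center_dist / diag).mean()
```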

18 pages, 7785 KiB  
Article
Research on Defect Detection in Lightweight Photovoltaic Cells Using YOLOv8-FSD
by Chao Chen, Zhuo Chen, Hao Li, Yawen Wang, Guangzhou Lei and Lingling Wu
Sensors 2025, 25(3), 843; https://doi.org/10.3390/s25030843 - 30 Jan 2025
Viewed by 1215
Abstract
Given the high computational complexity and poor real-time performance of current photovoltaic cell surface defect detection methods, this study proposes a lightweight model, YOLOv8-FSD, based on YOLOv8. By introducing the FasterNet network to replace the original backbone network, computational complexity and memory access are reduced. A thin neck structure designed based on hybrid convolution technology is adopted to reduce model parameters and computational load further. A lightweight dynamic feature upsampling operator improves the feature map quality. Additionally, the regularized Gaussian distribution distance loss function is used to enhance the detection ability for small target defects. Experimental results show that the YOLOv8-FSD lightweight algorithm improves detection accuracy while significantly reducing the number of parameters and computational requirements compared to the original algorithm. This improvement provides an efficient, accurate, and lightweight solution for PV cell defect detection.
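The "regularized Gaussian distribution distance" loss is detailed in the article; it is closely related to the normalized Wasserstein distance used for tiny-object detection, and the sketch below illustrates that idea under this assumption (the (cx, cy, w, h) box format and the constant c are illustrative choices, not the authors' code). Unlike IoU, this distance stays informative even when small boxes do not overlap at all.

```python
import torch

def nwd(pred, target, c=12.8, eps=1e-7):
    """Normalized Wasserstein distance between boxes modelled as 2-D
    Gaussians N((cx, cy), diag(w^2/4, h^2/4)); boxes are (cx, cy, w, h).
    A sketch of the Gaussian-distance idea; the paper's loss may differ."""
    # The 2nd-order Wasserstein distance between the two Gaussians reduces
    # to the L2 distance between the (cx, cy, w/2, h/2) vectors.
    d2 = ((pred[:, 0] - target[:, 0]) ** 2 +
          (pred[:, 1] - target[:, 1]) ** 2 +
          ((pred[:, 2] - target[:, 2]) / 2) ** 2 +
          ((pred[:, 3] - target[:, 3]) / 2) ** 2)
    # Map the distance to a similarity in (0, 1]; c is dataset-dependent.
    return torch.exp(-torch.sqrt(d2 + eps) / c)

def nwd_loss(pred, target):
    return (1 - nwd(pred, target)).mean()
```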

13 pages, 839 KiB  
Article
An Unbiased Feature Estimation Network for Few-Shot Fine-Grained Image Classification
by Jiale Wang, Jin Lu, Junpo Yang, Meijia Wang and Weichuan Zhang
Sensors 2024, 24(23), 7737; https://doi.org/10.3390/s24237737 - 3 Dec 2024
Cited by 1 | Viewed by 884
Abstract
Few-shot fine-grained image classification (FSFGIC) aims to classify subspecies with similar appearances under conditions of very limited data. In this paper, we observe an interesting phenomenon: different types of image data augmentation techniques have varying effects on the performance of FSFGIC methods. This indicates that there may be biases in the features extracted from the input images. The bias of the acquired feature may cause deviation in the calculation of similarity, which is particularly detrimental to FSFGIC tasks characterized by low inter-class variation and high intra-class variation, thus affecting the classification accuracy. To address the problems mentioned, we propose an unbiased feature estimation network. The designed network has the capability to significantly optimize the quality of the obtained feature representations and effectively reduce the feature bias from input images. Furthermore, our proposed architecture can be easily integrated into any contextual training mechanism. Extensive experiments on the FSFGIC tasks demonstrate the effectiveness of the proposed algorithm, showing a notable improvement in classification accuracy.
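As context, such a network plugs into a standard metric-based few-shot pipeline: class prototypes are averaged from support embeddings and queries are scored by cosine similarity, which is exactly where feature bias distorts the similarity computation. A minimal sketch follows (the prototype_logits helper, names, and shapes are illustrative assumptions; the paper's unbiased estimation module is not reproduced here).

```python
import torch
import torch.nn.functional as F

def prototype_logits(support, support_labels, query, n_way):
    """support: (N*K, D) embeddings; query: (Q, D).
    Returns cosine-similarity logits of shape (Q, n_way)."""
    # Average the support embeddings of each class into one prototype.
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_way)])          # (n_way, D)
    query = F.normalize(query, dim=-1)
    protos = F.normalize(protos, dim=-1)
    return query @ protos.T                                # cosine scores

# Toy 5-way 1-shot episode with random embeddings
s = torch.randn(5, 64)
labels = torch.arange(5)
q = torch.randn(10, 64)
print(prototype_logits(s, labels, q, n_way=5).shape)  # torch.Size([10, 5])
```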

12 pages, 34384 KiB  
Article
Improved Small Object Detection Algorithm CRL-YOLOv5
by Zhiyuan Wang, Shujun Men, Yuntian Bai, Yutong Yuan, Jiamin Wang, Kanglei Wang and Lei Zhang
Sensors 2024, 24(19), 6437; https://doi.org/10.3390/s24196437 - 4 Oct 2024
Cited by 3 | Viewed by 2334
Abstract
Detecting small objects in images poses significant challenges due to their limited pixel representation and the difficulty in extracting sufficient features, often leading to missed or false detections. To address these challenges and enhance detection accuracy, this paper presents an improved small object detection algorithm, CRL-YOLOv5. The proposed approach integrates the Convolutional Block Attention Module (CBAM) attention mechanism into the C3 module of the backbone network, which enhances the localization accuracy of small objects. Additionally, the Receptive Field Block (RFB) module is introduced to expand the model’s receptive field, thereby fully leveraging contextual information. Furthermore, the network architecture is restructured to include an additional detection layer specifically for small objects, allowing for deeper feature extraction from shallow layers. When tested on the VisDrone2019 small object dataset, CRL-YOLOv5 achieved an mAP50 of 39.2%, representing a 5.4% improvement over the original YOLOv5, effectively boosting the detection precision for small objects in images.
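CBAM itself is a published module (Woo et al., ECCV 2018): channel attention computed from globally pooled descriptors, followed by spatial attention computed from channel-wise statistics. A compact PyTorch rendering is shown below; how it is wired into the C3 block here is an assumption, not the authors' code.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel-then-spatial attention (Woo et al., ECCV 2018); minimal sketch."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Shared MLP (as 1x1 convs) for channel attention
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        # 7x7 conv over stacked avg/max maps for spatial attention
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention from global average- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # Spatial attention from channel-wise average and max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

print(CBAM(64)(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```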

16 pages, 483 KiB  
Article
Query-Based Object Visual Tracking with Parallel Sequence Generation
by Chang Liu, Bin Zhang, Chunjuan Bo and Dong Wang
Sensors 2024, 24(15), 4802; https://doi.org/10.3390/s24154802 - 24 Jul 2024
Viewed by 1038
Abstract
Query decoders have been shown to achieve good performance in object detection. However, they suffer from insufficient object tracking performance. Sequence-to-sequence learning in this context has recently been explored, with the idea of describing a target as a sequence of discrete tokens. In this study, we experimentally determine that, with appropriate representation, a parallel approach for predicting a target coordinate sequence with a query decoder can achieve good performance and speed. We propose a concise query-based tracking framework for predicting a target coordinate sequence in a parallel manner, named QPSTrack. A set of queries are designed to be responsible for different coordinates of the tracked target. All the queries jointly represent a target rather than a traditional one-to-one matching pattern between the query and target. Moreover, we adopt an adaptive decoding scheme including a one-layer adaptive decoder and learnable adaptive inputs for the decoder. This decoding scheme assists the queries in decoding the template-guided search features better. Furthermore, we explore the use of the plain ViT-Base, ViT-Large, and lightweight hierarchical LeViT architectures as the encoder backbone, providing a family of three variants in total. All the trackers are found to obtain a good trade-off between speed and performance; for instance, our tracker QPSTrack-B256 with the ViT-Base encoder achieves a 69.1% AUC on the LaSOT benchmark at 104.8 FPS.
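A hedged sketch of the parallel decoding idea: a small set of learnable coordinate queries cross-attends to the template-guided search features in a single decoder layer, and each query regresses one normalized coordinate, so the whole sequence is produced in one pass rather than autoregressively. Module names, dimensions, and the use of nn.TransformerDecoderLayer are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ParallelCoordHead(nn.Module):
    """Four coordinate queries decoded in parallel against search features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(4, dim))   # x, y, w, h queries
        self.decoder = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, search_feats):
        # search_feats: (B, HW, dim) template-guided search features
        b = search_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)    # (B, 4, dim)
        out = self.decoder(q, search_feats)                # one parallel pass
        return self.head(out).squeeze(-1).sigmoid()        # (B, 4) in [0, 1]

box = ParallelCoordHead()(torch.randn(2, 196, 256))
print(box.shape)  # torch.Size([2, 4])
```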

29 pages, 4407 KiB  
Article
Few-Shot Object Detection in Remote Sensing Images via Data Clearing and Stationary Meta-Learning
by Zijiu Yang, Wenbin Guan, Luyang Xiao and Honggang Chen
Sensors 2024, 24(12), 3882; https://doi.org/10.3390/s24123882 - 15 Jun 2024
Viewed by 2344
Abstract
Nowadays, the focus on few-shot object detection (FSOD) is fueled by limited remote sensing data availability. In view of various challenges posed by remote sensing images (RSIs) and FSOD, we propose a meta-learning-based Balanced Few-Shot Object Detector (B-FSDet), built upon YOLOv9 (GELAN-C version). Firstly, addressing the problem of incompletely annotated objects that potentially breaks the balance of the few-shot principle, we propose a straightforward yet efficient data clearing strategy, which ensures balanced input of each category. Additionally, considering the significant variance fluctuations in output feature vectors from the support set that lead to reduced effectiveness in accurately representing object information for each class, we propose a stationary feature extraction module and corresponding stationary and fast prediction method, forming a stationary meta-learning mode. In the end, in consideration of the issue of minimal inter-class differences in RSIs, we propose inter-class discrimination support loss based on the stationary meta-learning mode to ensure the information provided for each class from the support set is balanced and easier to distinguish. Our proposed detector’s performance is evaluated on the DIOR and NWPU VHR-10.v2 datasets, and comparative analysis with state-of-the-art detectors reveals promising performance.
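The data clearing strategy is described only at a high level in the abstract; the sketch below shows one plausible reading, filtering out incompletely annotated instances and then sampling exactly K shots per class so each category contributes equally to the support set (the annotation schema and function names are assumptions, not the authors' code).

```python
import random
from collections import defaultdict

def build_balanced_support(annotations, k_shot, seed=0):
    """annotations: list of dicts like {"image": str, "class": str, "complete": bool}.
    Keeps only completely annotated instances, then samples K per class."""
    per_class = defaultdict(list)
    for ann in annotations:
        if ann["complete"]:              # data clearing: drop incomplete labels
            per_class[ann["class"]].append(ann)

    rng = random.Random(seed)
    support = []
    for cls, instances in per_class.items():
        if len(instances) < k_shot:
            raise ValueError(f"class {cls} has fewer than {k_shot} clean instances")
        support.extend(rng.sample(instances, k_shot))  # exactly K shots per class
    return support
```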

15 pages, 6519 KiB  
Article
FF-HPINet: A Flipped Feature and Hierarchical Position Information Extraction Network for Lane Detection
by Xiaofeng Zhou and Peng Zhang
Sensors 2024, 24(11), 3502; https://doi.org/10.3390/s24113502 - 29 May 2024
Viewed by 959
Abstract
Effective lane detection technology plays an important role in the current autonomous driving system. Although deep learning models, with their intricate network designs, have proven highly capable of detecting lanes, there persist key areas requiring attention. Firstly, the symmetry inherent in visuals captured by forward-facing automotive cameras is an underexploited resource. Secondly, the vast potential of position information remains untapped, which can undermine detection precision. In response to these challenges, we propose FF-HPINet, a novel approach for lane detection. We introduce the Flipped Feature Extraction module, which models pixel pairwise relationships between the flipped feature and the original feature. This module allows us to capture symmetrical features and obtain high-level semantic feature maps from different receptive fields. Additionally, we design the Hierarchical Position Information Extraction module to meticulously mine the position information of the lanes, vastly improving target identification accuracy. Furthermore, the Deformable Context Extraction module is proposed to distill vital foreground elements and contextual nuances from the surrounding environment, yielding focused and contextually apt feature representations. Our approach achieves excellent performance with the F1 score of 97.00% on the TuSimple dataset and 76.84% on the CULane dataset.
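A hedged sketch of the flipped-feature idea: the feature map is mirrored along the width axis and pairwise affinities between original and flipped positions inject symmetric context back into the original features. This minimal attention-style rendering is our assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlippedFeatureMixer(nn.Module):
    """Relate each position to its mirror-side context via cross-affinity."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        flipped = torch.flip(x, dims=[-1])          # mirror along width
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (B, HW, C)
        k = self.k(flipped).flatten(2)              # (B, C, HW)
        affinity = F.softmax(q @ k / c ** 0.5, -1)  # pairwise relations (B, HW, HW)
        v = flipped.flatten(2).transpose(1, 2)      # (B, HW, C)
        mixed = (affinity @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + mixed                            # fuse symmetric evidence

print(FlippedFeatureMixer(32)(torch.randn(1, 32, 20, 50)).shape)
```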

18 pages, 14743 KiB  
Article
Large Span Sizes and Irregular Shapes Target Detection Methods Using Variable Convolution-Improved YOLOv8
by Yan Gao, Wei Liu, Hsiang-Chen Chui and Xiaoming Chen
Sensors 2024, 24(8), 2560; https://doi.org/10.3390/s24082560 - 17 Apr 2024
Cited by 4 | Viewed by 1993
Abstract
In this work, an object detection method using variable convolution-improved YOLOv8 is proposed to solve the problem of low accuracy and low efficiency in detecting spanning and irregularly shaped samples. Aiming at the problems of the irregular shape of a target, the low resolution of labeling frames, dense distribution, and the ease of overlap, a deformable convolution module is added to the original backbone network. This allows the model to deal flexibly with the problem of the insufficient perceptual field of the target corresponding to the detection point, and the situations of leakage and misdetection can be effectively improved. In order to solve the issue that small target detection is susceptible to image background and noise interference, the Sim-AM (simple parameter-free attention mechanism) module is added to the backbone network of YOLOv8, which enhances the attention to the underlying features and, thus, improves the detection accuracy of the model. More importantly, the Sim-AM module does not need to add parameters to the original network, which reduces the computation of the model. To address the problem of complex model structures that can lead to slower detection, the spatial pyramid pooling of the backbone network is replaced with focal modulation networks, which greatly simplifies the computation process. The experimental validation was carried out on the scrap steel dataset containing a large number of targets of multiple shapes and sizes. The results showed that the improved YOLOv8 network model improves the AP (average precision) by 2.1%, the mAP (mean average precision value) by 0.8%, and reduces the FPS (frames per second) by 5.4, which meets the performance requirements of real-time industrial inspection.
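SimAM (Yang et al., ICML 2021) derives a closed-form attention weight per activation from an energy function, which is why it adds no learnable parameters. A minimal PyTorch sketch of that published formulation follows; its placement in the YOLOv8 backbone here is the paper's design, not shown below.

```python
import torch

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention: weight each activation by the
    closed-form inverse energy of its deviation from the channel mean."""
    b, c, h, w = x.shape
    n = h * w - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2        # squared deviation
    v = d.sum(dim=(2, 3), keepdim=True) / n                # channel variance
    e_inv = d / (4 * (v + lam)) + 0.5                      # inverse energy
    return x * torch.sigmoid(e_inv)

out = simam(torch.randn(1, 64, 40, 40))
print(out.shape)  # torch.Size([1, 64, 40, 40])
```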

20 pages, 7586 KiB  
Article
CenterADNet: Infrared Video Target Detection Based on Central Point Regression
by Jiaqi Sun, Ming Wei, Jiarong Wang, Ming Zhu, Huilan Lin, Haitao Nie and Xiaotong Deng
Sensors 2024, 24(6), 1778; https://doi.org/10.3390/s24061778 - 9 Mar 2024
Cited by 1 | Viewed by 1568
Abstract
Infrared video target detection is a fundamental technology within infrared warning and tracking systems. In long-distance infrared remote sensing images, targets often manifest as circular spots or even single points. Due to the weak and similar characteristics of the target to the background noise, the intelligent detection of these targets is extremely complex. Existing deep learning-based methods are affected by the downsampling of image features by convolutional neural networks, causing the features of small targets to almost disappear. So, we propose a new infrared video weak-target detection network based on central point regression. We focus on suppressing the image background by fusing the different features between consecutive frames with the original image features to eliminate the background’s influence. We also employ high-resolution feature preservation and incorporate a spatial–temporal attention module into the network to capture as many target features as possible and improve detection accuracy. Our method achieves superior results on the infrared image weak aircraft target detection dataset proposed by the National University of Defense Technology, as well as on the simulated dataset generated based on real-world observation. This demonstrates the efficiency of our approach for detecting weak point targets in infrared continuous images.
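A hedged sketch of the two ingredients named in the abstract: fusing an inter-frame difference with current-frame features, so that static background largely cancels, and a CenterNet-style Gaussian heatmap target for central point regression (tensor layout and function names are illustrative assumptions, not the authors' code).

```python
import torch

def fuse_frame_difference(frames):
    """frames: (B, T, C, H, W) consecutive infrared frames.
    Concatenate the last frame with its difference from the previous one;
    static background largely cancels in the difference channels."""
    diff = frames[:, -1] - frames[:, -2]
    return torch.cat([frames[:, -1], diff], dim=1)  # (B, 2C, H, W)

def center_heatmap(h, w, cx, cy, sigma=2.0):
    """Gaussian heatmap target centred on the (possibly sub-pixel) point,
    as used for central point regression."""
    ys = torch.arange(h).view(-1, 1).float()
    xs = torch.arange(w).view(1, -1).float()
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

fused = fuse_frame_difference(torch.randn(2, 5, 1, 128, 128))
print(fused.shape, center_heatmap(128, 128, 64.0, 32.0).shape)
```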
