Object Detection and Recognition Based on Deep Learning

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (10 December 2025) | Viewed by 51370

Special Issue Editors


Guest Editor
Department of Information, Electrical Engineering and Applied Mathematics (DIEM), University of Salerno, 84084 Salerno, Italy
Interests: artificial intelligence for smart infrastructures

Guest Editor
DIEM, University of Salerno, 84084 Salerno, Italy
Interests: computer vision; cognitive robotics

Special Issue Information

Dear Colleagues,

In recent years, there has been a rapid and successful expansion of computer vision research in several application fields, ranging from intelligent video surveillance and cognitive robotics to automatic inspection and autonomous vehicle driving. Within this context, object detection and recognition are among the areas that have seen the greatest progress. The intended use of object detection and recognition is to determine the location of an object of interest at each time instant and the class to which the object belongs.

Deep neural networks (DNNs) have recently emerged as a powerful class of machine-learning models, able to learn rich object representations/models without the need to manually design features. In practice, algorithms for object detection are strictly dependent on acquisition devices (such as RGB cameras, thermal devices, infrared devices, point clouds from lidar, and multi/hyper-spectral devices), as well as on the availability of data acquired with that specific sensor type.

The aim of this Special Issue of Sensors is to provide some perspective on object detection and recognition research. It will be dedicated to highlighting both theoretical and practical aspects of object detection; applications requiring object detection and recognition algorithms, such as crowd counting, flame and smoke detection, or obstacle detection in both autonomous vehicle driving and smart transportation domains; and zero-shot algorithms for object detection and recognition, e.g., based on pre-trained visual questions.

Dr. Alessia Saggese
Dr. Paolo Spagnolo
Dr. Vincenzo Carletti
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object recognition
  • thermal image analysis
  • multispectral object analysis
  • applications
  • crowd counting
  • zero-shot detection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)

Editorial

4 pages, 140 KB  
Editorial
Object Detection and Recognition Based on Deep Learning
by Vincenzo Carletti, Alessia Saggese and Paolo Spagnolo
Sensors 2026, 26(7), 2200; https://doi.org/10.3390/s26072200 - 2 Apr 2026
Viewed by 451
Abstract
Object detection and recognition have undergone a profound transformation over the past decade, driven by the rapid evolution of deep learning architectures, the availability of large annotated datasets, and the increasing computational power of modern hardware [...] Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

Research

23 pages, 3940 KB  
Article
Research on Enhancing Fire Detection Performance in Ancient Architecture Under Occlusion Scenarios Based on YOLO-AR
by Chen Li, Minghan Wang, Lei Lei, Honghui Liu, Kaiyin Gao and Zuoyi Wang
Sensors 2026, 26(4), 1357; https://doi.org/10.3390/s26041357 - 20 Feb 2026
Viewed by 615
Abstract
Fire detection in ancient architecture presents significant challenges due to complex scenes and unique structural characteristics. Traditional detection methods often demonstrate limitations when addressing the specific structural idiosyncrasies of individual ancient buildings and the overlapping occlusion prevalent in architectural complexes. This paper proposes YOLO-AR, a novel fire detection algorithm based on an improved YOLOv8 framework. By embedding the Convolutional Block Attention Module (CBAM) at the end of the backbone network, the algorithm enhances its capability to capture key features of flames and smoke. Furthermore, the Repulsion Loss function is introduced to explicitly optimize bounding box localization accuracy in occluded and dense scenarios. Experiments conducted on a self-constructed ancient architecture dataset comprising 15,847 images demonstrate that YOLO-AR outperforms mainstream comparative algorithms in terms of Precision, Recall, and mean Average Precision (mAP). Specifically, the detection precision reached 90.7%, and the recall rate improved to 89.7%. This study provides an efficient and reliable visual detection solution for early warning systems in ancient architecture, offering significant value for cultural heritage preservation. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

30 pages, 4364 KB  
Article
Research on an Automatic Solution Method for Plane Frames Based on Computer Vision
by Dejiang Wang and Shuzhe Fan
Sensors 2026, 26(4), 1299; https://doi.org/10.3390/s26041299 - 17 Feb 2026
Viewed by 511
Abstract
In the internal force analysis of plane frames, traditional mechanics solutions require the cumbersome derivation of equations and complex numerical calculations, a process that is both time-consuming and error-prone. While general-purpose Finite Element Analysis (FEA) software offers rapid and precise calculations, it is limited by tedious modeling pre-processing and a steep learning curve, making it difficult to meet the demand for rapid and intelligent solutions. To address these challenges, this paper proposes a deep learning-based automatic solution method for plane frames, enabling the extraction of structural information from printed plane structural schematics and automatically completing the internal force analysis and visualization. First, images of printed plane frame schematics are captured using a smartphone, followed by image pre-processing steps such as rectification and enhancement. Second, the YOLOv8 algorithm is utilized to detect and recognize the plane frame, obtaining structural information including node coordinates, load parameters, and boundary constraints. Finally, the extracted data is input into a static analysis program based on the Matrix Displacement Method to calculate the internal forces of nodes and elements, and to generate the internal force diagrams of the frame. This workflow was validated using structural mechanics problem sets and the analysis of a double-span portal frame structure. Experimental results demonstrate that the detection accuracy of structural primitives reached 99.1%, and the overall solution accuracy of mechanical problems in the final test set exceeded 90%, providing a more convenient and efficient computational method for the analysis of plane frames. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

26 pages, 5409 KB  
Article
Geometric Monitoring of Steel Structures Using Terrestrial Laser Scanning and Deep Learning
by João Ventura, Jorge Magalhães, Tomás Jorge, Pedro Oliveira, Ricardo Santos, Rafael Cabral, Liliana Araújo, Rodrigo Falcão Moreira, Rosário Oliveira and Diogo Ribeiro
Sensors 2026, 26(3), 831; https://doi.org/10.3390/s26030831 - 27 Jan 2026
Viewed by 769
Abstract
Ensuring the quality and structural stability of industrial steel buildings requires precise geometric control during the execution stage, in accordance with assembly standards defined by EN 1090-2:2020. In this context, this work proposes a methodology that enables the automatic detection of geometric deviations by comparing the intended design with the actual as-built structure using a Terrestrial Laser Scanner. The integrated pipeline processes the 3D point cloud of the asset by projecting it into 2D images, on which a YOLOv8 segmentation model is trained to detect, classify and segment commercial steel cross-sections. Its application demonstrated improved identification and geometric representation of cross-sections, even in cases of incomplete or partially occluded geometries. To enhance generalisation, synthetic 3D data augmentation was applied, yielding promising results with segmentation metrics measured by mAP@50-95 reaching 70.20%. The methodology includes a systematic segmentation-based filtering step, followed by the computation of Oriented Bounding Boxes to quantify both positional and angular displacements. The effectiveness of the methodology was demonstrated in two field applications during the assembly of industrial steel structures. The results confirm the method’s effectiveness, achieving up to 94% of structural elements assessed in real assemblies, with 97% valid segmentations enabling reliable geometric verification under the standards. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

20 pages, 7462 KB  
Article
One-Dimensional Convolutional Neural Network for Object Recognition Through Electromagnetic Backscattering in the Frequency Domain
by Mohammad Hossein zadeh, Marina Barbiroli, Simone Del Prete and Franco Fuschini
Sensors 2025, 25(22), 6809; https://doi.org/10.3390/s25226809 - 7 Nov 2025
Viewed by 1193
Abstract
Over the last few decades, the item recognition problem has been mostly addressed through radar techniques or computer vision algorithms. While signal/image processing has mainly fueled the recognition process in the past, machine/deep learning methods have recently stepped in, to the extent that they nowadays represent the state-of-the-art methodology. In particular, Convolutional Neural Networks are spreading worldwide as effective tools for image-based object recognition. Nevertheless, the images used to feed vision-based algorithms may not be available in some cases, and/or may have poor quality. Furthermore, they can also pose privacy issues. For these reasons, this paper investigates a novel machine learning object recognition approach based on electromagnetic backscattering in the frequency domain. In particular, a 1D Convolutional Neural Network is employed to map the collected, backscattered signals onto two classes of objects. The experimental framework is aimed at data collection through backscattering measurements in the mmWave band with signal generators and spectrum analyzers in controlled environments to ensure data reliability. Results show that the proposed method achieves 100% accuracy in object detection and 84% accuracy in object recognition. This performance makes electromagnetic-based object recognition systems a possible solution to complement vision-based techniques, or even to replace them when they turn out impractical. The findings also reveal a trade-off between accuracy and processing speed when varying signal bandwidths and frequency steps, making this approach flexible and possibly suitable for real-time applications. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

23 pages, 2255 KB  
Article
Design and Implementation of a YOLOv2 Accelerator on a Zynq-7000 FPGA
by Huimin Kim and Tae-Kyoung Kim
Sensors 2025, 25(20), 6359; https://doi.org/10.3390/s25206359 - 14 Oct 2025
Cited by 3 | Viewed by 2551
Abstract
You Only Look Once (YOLO) is a convolutional neural network-based object detection algorithm widely used in real-time vision applications. However, its high computational demand leads to significant power consumption and cost when deployed in graphics processing units. Field-programmable gate arrays offer a low-power alternative. However, their efficient implementation requires architecture-level optimization tailored to limited device resources. This study presents an optimized YOLOv2 accelerator for the Zynq-7000 system-on-chip (SoC). The design employs 16-bit integer quantization, a filter reuse structure, an input feature map reuse scheme using a line buffer, and tiling parameter optimization for the convolution and max pooling layers to maximize resource efficiency. In addition, a stall-based control mechanism is introduced to prevent structural hazards in the pipeline. The proposed accelerator was implemented on the Zynq-7000 SoC board, and a system-level evaluation confirmed a negligible accuracy drop of only 0.2% compared with the 32-bit floating-point baseline. Compared with previous YOLO accelerators on the same SoC, the design achieved up to 26% and 15% reductions in flip-flop and digital signal processor usage, respectively. This result demonstrates feasible deployment on XC7Z020 with DSP 57.27% and FF 16.55% utilization. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

13 pages, 1697 KB  
Article
A Real-Time Vision-Based Adaptive Follow Treadmill for Animal Gait Analysis
by Guanghui Li, Salif Komi, Jakob Fleng Sorensen and Rune W. Berg
Sensors 2025, 25(14), 4289; https://doi.org/10.3390/s25144289 - 9 Jul 2025
Cited by 1 | Viewed by 2221
Abstract
Treadmills are a convenient tool to study animal gait and behavior. Traditional animal treadmill designs often entail preset speeds and therefore have reduced adaptability to animals’ dynamic behavior, thus restricting the experimental scope. Fortunately, advancements in computer vision and automation allow circumvention of these limitations. Here, we introduce a series of real-time adaptive treadmill systems utilizing both marker-based visual fiducial systems (colored blocks or AprilTags) and marker-free (pre-trained models) tracking methods powered by advanced computer vision to track experimental animals. We demonstrate their real-time object recognition capabilities in specific tasks by conducting practical tests and highlight the performance of the marker-free method using an object detection machine learning algorithm (FOMO MobileNetV2 network), which shows high robustness and accuracy in detecting a moving rat compared to the marker-based method. The combination of this computer vision system together with treadmill control overcomes the issues of traditional treadmills by enabling the adjustment of belt speed and direction based on animal movement. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

17 pages, 6780 KB  
Article
A Metric Learning-Based Improved Oriented R-CNN for Wildfire Detection in Power Transmission Corridors
by Xiaole Wang, Bo Wang, Peng Luo, Leixiong Wang and Yurou Wu
Sensors 2025, 25(13), 3882; https://doi.org/10.3390/s25133882 - 22 Jun 2025
Cited by 2 | Viewed by 1146
Abstract
Wildfire detection in power transmission corridors is essential for providing timely warnings and ensuring the safe and stable operation of power lines. However, this task faces significant challenges due to the large number of smoke-like samples in the background, the complex and diverse target morphologies, and the difficulty of detecting small-scale smoke and flame objects. To address these issues, this paper proposed an improved Oriented R-CNN model enhanced with metric learning for wildfire detection in power transmission corridors. Specifically, a multi-center metric loss (MCM-Loss) module based on metric learning was introduced to enhance the model’s ability to differentiate features of similar targets, thereby improving the recognition accuracy in the presence of interference. Experimental results showed that the introduction of the MCM-Loss module increased the average precision (AP) for smoke targets by 2.7%. In addition, the group convolution-based network ResNeXt was adopted to replace the original backbone network ResNet, broadening the channel dimensions of the feature extraction network and enhancing the model’s capability to detect flame and smoke targets with diverse morphologies. This substitution led to a 0.6% improvement in mean average precision (mAP). Furthermore, an FPN-CARAFE module was designed by incorporating the content-aware up-sampling operator CARAFE, which improved multi-scale feature representation and significantly boosted performance in detecting small targets. In particular, the proposed FPN-CARAFE module improved the AP for fire targets by 8.1%. Experimental results demonstrated that the proposed model achieved superior performance in wildfire detection within power transmission corridors, achieving a mAP of 90.4% on the test dataset—an improvement of 6.4% over the baseline model. Compared with other commonly used object detection algorithms, the model developed in this study exhibited improved detection performance on the test dataset, offering research support for wildfire monitoring in power transmission corridors. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

20 pages, 5439 KB  
Article
LarGAN: A Label Auto-Rescaling Generation Adversarial Network for Rare Surface Defects
by Guan Qin, Hanxin Zhang, Ke Xu, Liaoting Pan, Lei Huang, Xuezhong Huang and Yi Wei
Sensors 2025, 25(10), 2958; https://doi.org/10.3390/s25102958 - 8 May 2025
Cited by 1 | Viewed by 1048
Abstract
Insufficient defect data significantly limits detection accuracy in continuous casting slab production. This limitation arises from the data collection in fast-paced production environments. To address this issue, we propose LarGAN, a data augmentation approach that synthesizes similar and high-quality defect data from a single image. We utilize a progressive GAN framework to ensure a smooth and stable generation process, starting from low-resolution image synthesis and gradually increasing the network depth. We designed a Label Auto-Rescaling strategy to better adapt to defect data with annotation, enhancing both the quality and morphological diversity of the synthesized defects. To validate the generation results, we evaluate not only standard metrics, such as FID, SSIM, and LPIPS, but also performance, through the downstream detection model YOLOv8. Our experimental results demonstrate that the LarGAN model surpasses other single-image generation models in terms of image quality and diversity. Furthermore, the experiments reveal that the data generated by LarGAN effectively enhances the feature space of the original dataset, thereby improving the accuracy and generalization performance of the detection model. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

23 pages, 18399 KB  
Article
Channel Attention for Fire and Smoke Detection: Impact of Augmentation, Color Spaces, and Adversarial Attacks
by Usama Ejaz, Muhammad Ali Hamza and Hyun-chul Kim
Sensors 2025, 25(4), 1140; https://doi.org/10.3390/s25041140 - 13 Feb 2025
Cited by 7 | Viewed by 3167
Abstract
The prevalence of wildfires presents significant challenges for fire detection systems, particularly in differentiating fire from complex backgrounds and maintaining detection reliability under diverse environmental conditions. It is crucial to address these challenges for developing sustainable and effective fire detection systems. In this paper: (i) we introduce a channel-wise attention-based architecture, achieving 95% accuracy and demonstrating an improved focus on flame-specific features critical for distinguishing fire in complex backgrounds. Through ablation studies, we demonstrate that our channel-wise attention mechanism provides a significant 3–5% improvement in accuracy over the baseline and state-of-the-art fire detection models; (ii) evaluate the impact of augmentation on fire detection, demonstrating improved performance across varied environmental conditions; (iii) comprehensive evaluation across color spaces including RGB, Grayscale, HSV, and YCbCr to analyze detection reliability; and (iv) assessment of model vulnerabilities where Fast Gradient Sign Method (FGSM) perturbations significantly impact performance, reducing accuracy to 41%. Using Local Interpretable Model-Agnostic Explanations (LIME) visualization techniques, we provide insights into model decision-making processes across both standard and adversarial conditions, highlighting important considerations for fire detection applications. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

17 pages, 2953 KB  
Article
A Smart Visual Sensor for Smoke Detection Based on Deep Neural Networks
by Vincenzo Carletti, Antonio Greco, Alessia Saggese and Bruno Vento
Sensors 2024, 24(14), 4519; https://doi.org/10.3390/s24144519 - 12 Jul 2024
Cited by 12 | Viewed by 3869
Abstract
The automatic detection of smoke by analyzing the video stream acquired by traditional surveillance cameras is becoming a more and more interesting problem for the scientific community owing to the need to prevent fires at the very early stages. The adoption of a smart visual sensor, namely a computer vision algorithm running in real time, allows one to overcome the limitations of standard physical sensors. Nevertheless, this is a very challenging problem, due to the strong similarity of the smoke with other environmental elements like clouds, fog and dust. In addition to this challenge, data available for training deep neural networks is limited and not fully representative of real environments. Within this context, in this paper we propose a new method for smoke detection based on the combination of motion and appearance analysis with a modern convolutional neural network (CNN). Moreover, we propose a new dataset, called the MIVIA Smoke Detection Dataset (MIVIA-SDD), publicly available for research purposes; it consists of 129 videos covering about 28 h of recordings. The proposed hybrid method, trained and evaluated on the proposed dataset, proved to be very effective by achieving a 94% smoke recognition rate and, at the same time, a substantially lower false positive rate if compared with fully deep learning-based approaches (14% vs. 100%). Therefore, the proposed combination of motion and appearance analysis with deep learning CNNs can be further investigated to improve the precision of fire detection approaches. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

Review

51 pages, 7618 KB  
Review
A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
by Liang Yu, Lin Tang and Lisha Mu
Sensors 2025, 25(13), 3952; https://doi.org/10.3390/s25133952 - 25 Jun 2025
Cited by 9 | Viewed by 9493
Abstract
DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergence, poor small object detection, and low efficiency, prompting extensive research. This paper systematically reviews DETR’s technical evolution from a “problem-driven” perspective, focusing on advancements in attention mechanisms, query design, training strategies, and architectural efficiency. We also outline DETR’s applications in autonomous driving, medical imaging, and remote sensing, and its expansion to fine-grained classification and video understanding. Finally, we summarize current challenges and future directions. This “problem-driven” analysis offers researchers a comprehensive and insightful overview, aiming to fill gaps in the existing literature on DETR’s evolution and logic. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

32 pages, 451 KB  
Review
A Comprehensive Survey of Machine Learning Techniques and Models for Object Detection
by Maria Trigka and Elias Dritsas
Sensors 2025, 25(1), 214; https://doi.org/10.3390/s25010214 - 2 Jan 2025
Cited by 61 | Viewed by 22362
Abstract
Object detection is a pivotal research domain within computer vision, with applications spanning from autonomous vehicles to medical diagnostics. This comprehensive survey presents an in-depth analysis of the evolution and significant advancements in object detection, emphasizing the critical role of machine learning (ML) and deep learning (DL) techniques. We explore a wide spectrum of methodologies, ranging from traditional approaches to the latest DL models, thoroughly evaluating their performance, strengths, and limitations. Additionally, the survey delves into various metrics for assessing model effectiveness, including precision, recall, and intersection over union (IoU), while addressing ongoing challenges in the field, such as managing occlusions, varying object scales, and improving real-time processing capabilities. Furthermore, we critically examine recent breakthroughs, including advanced architectures like Transformers, and discuss challenges and future research directions aimed at overcoming existing barriers. By synthesizing current advancements, this survey provides valuable insights for enhancing the robustness, accuracy, and efficiency of object detection systems across diverse and challenging applications. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)