Advanced Methods and Applications with Deep Learning in Object Recognition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: 31 October 2025 | Viewed by 34336

Special Issue Editors

Prof. Dr. Jesús García-Herrero

Guest Editor

Computer Science Department, Universidad Carlos III de Madrid, Colmenarejo, Spain
Interests: information fusion; artificial intelligence; machine vision; autonomous vehicles

Prof. Dr. Johan Debayle *

Guest Editor

Mines Saint-Etienne, CNRS, UMR 5307 LGF, Centre SPIN, Saint-Etienne F-42023, France
Interests: adaptive image processing; pattern analysis; stochastic geometry
* We dedicate this Special Issue to the memory of the editor, Prof. Dr. Johan Debayle, who passed away during the Special Issue period.

Special Issue Information

Dear Colleagues,

Object detection and recognition are central tasks in computer vision, which include the detection of object boundaries and their classification. They have become essential in many applications, such as search and rescue, warehouse logistics, video surveillance or monitoring using UAVs, where images are often low-resolution or blurred because of camera motion. Additionally, conditions vary from one scenario to another, which makes general solutions difficult to achieve; thus, fine-tuning is essential in new scenarios.

The computer vision community has adopted deep-learning models over the last decade because they outperform classical methods. These models, which typically employ convolutional neural networks (CNNs), require substantial processing power (GPUs) for training on large datasets and can provide inference in real time. They fall into two types: two-stage detectors, which aim for maximum accuracy at a potential cost in inference time, and one-stage detectors, which aim for minimum inference time in real-time applications. Two-stage detectors are dominated by the R-CNN family (region-proposal CNNs), such as Fast R-CNN, Faster R-CNN or Cascade R-CNN, while the YOLO family dominates one-stage detectors, with SSD and RetinaNet being other popular algorithms in this category.

Additionally, in recent years, Vision Transformers (ViTs) have also been applied to object detection and recognition tasks. Transformer-based algorithms, such as DETR or YOLOS, rely on a self-attention mechanism that learns the relationships between elements of a sequence, applying the transformer architecture to image grids. Some of them, such as DETR, use a CNN backbone for feature extraction, given the ability of CNNs to automatically extract relevant features.

In addition, object detection is closely related to other open challenges in machine vision such as Multi-Object Tracking (MOT), which involves both the detection and the tracking of objects of interest appearing in a video sequence. The goal in this case is not only to identify and locate the objects contained in each frame, but also to associate them across frames in order to maintain track continuity and follow their dynamics over time. This task is usually solved by combining object detection and data association algorithms; relevant examples from the SORT family (Simple Online and Realtime Tracking) include DeepSORT, StrongSORT and OC-SORT.
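As a rough illustration of the data-association step mentioned above, the following Python sketch matches existing tracks to new detections by bounding-box overlap (intersection over union, IoU). It is a deliberately simplified stand-in and not the method of any tracker named in this letter: SORT-style trackers typically combine a motion model with an optimal assignment, whereas this sketch uses a greedy match on raw boxes. The function names `iou` and `associate`, the `(x1, y1, x2, y2)` box format and the `iou_threshold` value are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    tracks: dict mapping an integer track id to its last known box.
    detections: list of boxes found in the current frame.
    Returns (matches, unmatched_track_ids, unmatched_detection_indices),
    where matches maps track id -> detection index.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(t_box, d_box), t_id, d_idx)
         for t_id, t_box in tracks.items()
         for d_idx, d_box in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, t_id, d_idx in pairs:
        if score < iou_threshold:
            break  # all remaining pairs overlap even less
        if t_id in used_tracks or d_idx in used_dets:
            continue  # each track and each detection is matched at most once
        matches[t_id] = d_idx
        used_tracks.add(t_id)
        used_dets.add(d_idx)
    unmatched_tracks = [t for t in tracks if t not in used_tracks]
    unmatched_dets = [d for d in range(len(detections)) if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Hypothetical frame: two known tracks, three new detections.
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches, lost_tracks, new_detections = associate(tracks, detections)
```

In a full tracker, unmatched detections would typically start new tracks and tracks that stay unmatched for several frames would be dropped; both policies are outside this sketch.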

Regarding evaluation, making fair comparisons among different solutions is complex, given the balance between accuracy and speed, the resolution of the input images, the configuration of the evaluation parameters, etc. Analyses rely on the available benchmarks and datasets, which are necessary to evaluate the performance of different architectures and configurations. Many authors have identified class imbalance as an additional obstacle to achieving high accuracy. In this regard, other deep-learning architectures, such as GANs or autoencoders, can be combined with detectors to enhance the training phase by increasing the size and variety of the datasets, for instance, to improve the detection of very small objects. Additionally, learning in imbalanced situations can be improved by adapting the loss function to focus on hard examples and avoid a bias towards the numerous negative examples.
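One well-known loss of this kind is the focal loss introduced with RetinaNet; the letter does not name a specific loss, so it is used here only as an illustration. For a predicted foreground probability p and a binary label y, with p_t = p if y = 1 and p_t = 1 - p otherwise, it is defined as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The minimal Python sketch below evaluates it for a single prediction; the function name `focal_loss` and the clamping constant `eps` are assumptions, and the defaults gamma = 2 and alpha = 0.25 are the values commonly quoted for this loss.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive (object) class, in [0, 1].
    y: ground-truth label, 1 for positive and 0 for negative.
    gamma: focusing parameter; gamma = 0 gives alpha-weighted cross-entropy.
    alpha: weight of the positive class (negatives get 1 - alpha).
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (confidently rejected background) versus a hard positive
# (an object the model almost misses).
easy_negative = focal_loss(0.01, 0)
hard_positive = focal_loss(0.10, 1)
```

With these settings the easy negative contributes far less to the total loss than the hard positive, so the many background examples no longer dominate training.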

This Special Issue invites contributions on these topics that show how novel mathematical algorithms, architectures and methods can improve object detection and recognition, possibly including multi-object tracking, with an emphasis on new solutions and on the analysis of their performance under challenging conditions in relevant applications.

Prof. Dr. Jesús García-Herrero
Prof. Dr. Johan Debayle
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

object detection and classification
multi-object tracking
deep-learning architectures
model training and evaluation
loss functions in learning
evaluation metrics and datasets
class imbalance
applications of object detection and object tracking
aerial object identification
transformers for object detection
edge AI for object detection
multi-modal object detection

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

24 pages, 16730 KiB

Open Access Feature Paper Article

LV-FeatEx: Large Viewpoint-Image Feature Extraction

by Yukai Wang, Yinghui Wang, Wenzhuo Li, Yanxing Liang, Liangyi Huang and Xiaojuan Ning

Mathematics 2025, 13(7), 1111; https://doi.org/10.3390/math13071111 - 27 Mar 2025

Viewed by 587

Abstract

Maintaining stable image feature extraction under viewpoint changes is challenging, particularly when the angle between the camera’s reverse direction and the object’s surface normal exceeds 40 degrees. Such conditions can result in unreliable feature detection. Consequently, this hinders the performance of vision-based systems. To address this, we propose a feature point extraction method named Large Viewpoint Feature Extraction (LV-FeatEx). Firstly, the method uses a dual-threshold approach based on image grayscale histograms and Kapur’s maximum entropy to constrain the AGAST (Adaptive and Generic Accelerated Segment Test) feature detector. Combined with the FREAK (Fast Retina Keypoint) descriptor, the method enables more effective estimation of camera motion parameters. Next, we design a longitude sampling strategy to create a sparser affine simulation model. Meanwhile, images undergo perspective transformation based on the camera motion parameters. This improves operational efficiency and aligns perspective distortions between two images, enhancing feature point extraction accuracy under large viewpoints. Finally, we verify the stability of the extracted feature points through feature point matching. Comprehensive experimental results show that, under large viewpoint changes, our method outperforms popular classical and deep learning feature extraction methods. The correct rate of feature point matching improves by an average of 40.1 percent, and speed increases by an average of 6.67 times simultaneously. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

26 pages, 19578 KiB

Open Access Article

Utilizing Cross-Ratios for the Detection and Correction of Missing Digits in Instrument Digit Recognition

by Jui-Hua Huang, Yong-Han Chen and Yen-Lung Tsai

Mathematics 2024, 12(11), 1669; https://doi.org/10.3390/math12111669 - 27 May 2024

Viewed by 1107

Abstract

This paper aims to enhance the existing Automatic Meter Reading (AMR) technologies for utilities in the public services sector, such as water, electricity, and gas, by allowing users to regularly upload images of their meters, which are then automatically processed by machines for digit recognition. We propose an end-to-end AMR approach designed explicitly for unconstrained environments, offering practical solutions to common failures encountered during the automatic recognition process, such as image blur, perspective distortion, partial reflection, poor lighting, missing digits, and intermediate digit states, to reduce the failure rate of automatic meter readings. The system’s first stage involves checking the quality of the user-uploaded images through the SVM method and requesting re-uploads for images unsuitable for digit extraction and recognition. The second stage employs deep learning models for digit localization and recognition, automatically detecting and correcting issues such as missing and intermediate digits to enhance the accuracy of automatic meter readings. Our research established a gas meter training dataset comprising 52,000 images, extensively annotated across various degrees, to train the deep learning models for high-precision digit recognition. Experimental results demonstrate that, with the simple SVM model, an accuracy of 87.03% is achieved for the classification of blurry image types. In addition, meter digit recognition (including intermediate digit states) can reach 97.6% (mAP), and the detection and correction of missing digits can be as high as 63.64%, showcasing the practical application value of the system developed in this study. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

17 pages, 6257 KiB

Open Access Article

Real-Time Motorbike Detection: AI on the Edge Perspective

by Awais Akhtar, Rehan Ahmed, Muhammad Haroon Yousaf and Sergio A. Velastin

Mathematics 2024, 12(7), 1103; https://doi.org/10.3390/math12071103 - 7 Apr 2024

Cited by 4 | Viewed by 3451

Abstract

Motorbikes are an integral part of transportation in emerging countries, but unfortunately, motorbike users are also among the most vulnerable road users (VRUs) and are involved in a large number of accidents every year. Motorbike detection is therefore very important for proper traffic surveillance, road safety, and security. Most of the work related to bike detection has been carried out to improve accuracy. If this task is not performed in real time, it loses practical significance, yet little to nothing has been reported on its real-time implementation. In this work, we have looked at multiple real-time deployable, cost-efficient solutions for motorbike detection using various state-of-the-art embedded edge devices. This paper discusses an investigation of a proposed methodology on five different embedded devices: Jetson Nano, Jetson TX2, Jetson Xavier, Intel Compute Stick, and Coral Dev Board. Running the highly compute-intensive object detection model on edge devices (in real time) is made possible by optimization. As a result, we have achieved inference rates on different devices that are twice as high as those on GPUs, with only a marginal drop in accuracy. Secondly, the baseline accuracy of motorbike detection has been improved by developing a custom network based on YOLOv5, introducing sparsity and depth reduction. Dataset augmentation has been applied at both image and object levels to enhance the robustness of detection. We have achieved 99% accuracy, compared to the previously reported 97% accuracy, with better FPS. Additionally, we have provided a performance comparison of motorbike detection on the different embedded edge devices, for practical implementation. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

28 pages, 5769 KiB

Open Access Article

Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis

by Andrzej D. Dobrzycki, Ana M. Bernardos, Luca Bergesio, Andrzej Pomirski and Daniel Sáez-Trigueros

Mathematics 2024, 12(1), 76; https://doi.org/10.3390/math12010076 - 25 Dec 2023

Cited by 5 | Viewed by 3191

Abstract

Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, or daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pretraining (CLIP), have advanced significantly in jointly understanding images and text. This study aims to assess the effectiveness of CLIP in classifying human postures, focusing on its application in yoga. Despite the initial limitations of the zero-shot approach, applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The article describes the full procedure for fine-tuning, including the choice for image description syntax, models and hyperparameters adjustment. The fine-tuned CLIP model, tested on 3826 images, achieves an accuracy of over 85%, surpassing the current state-of-the-art of previous works on the same dataset by approximately 6%, its training time being 3.5 times lower than what is needed to fine-tune a YOLOv8-based model. For more application-oriented scenarios, with smaller datasets of six postures each, containing 1301 and 401 training images, the fine-tuned models attain an accuracy of 98.8% and 99.1%, respectively. Furthermore, our experiments indicate that training with as few as 20 images per pose can yield around 90% accuracy in a six-class dataset. This study demonstrates that this multimodal technique can be effectively used for yoga pose classification, and possibly for human posture classification, in general. Additionally, CLIP inference time (around 7 ms) supports that the model can be integrated into automated systems for posture evaluation, e.g., for developing a real-time personal yoga assistant for performance assessment. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

17 pages, 6799 KiB

Open Access Article

Automatic Recognition of Indoor Fire and Combustible Material with Material-Auxiliary Fire Dataset

by Feifei Hou, Wenqing Zhao and Xinyu Fan

Mathematics 2024, 12(1), 54; https://doi.org/10.3390/math12010054 - 23 Dec 2023

Cited by 1 | Viewed by 1879

Abstract

Early and timely fire detection within enclosed spaces notably diminishes the response time for emergency aid. Previous methods have mostly focused on singularly detecting either fire or combustible materials, rarely integrating both aspects, leading to a lack of a comprehensive understanding of indoor fire scenarios. Moreover, traditional fire load assessment methods such as empirical formula-based assessment are time-consuming and face challenges in diverse scenarios. In this paper, we collected a novel dataset of fire and materials, the Material-Auxiliary Fire Dataset (MAFD), and combined this dataset with deep learning to achieve both fire and material recognition and segmentation in the indoor scene. A sophisticated deep learning model, Dual Attention Network (DANet), was specifically designed for image semantic segmentation to recognize fire and combustible material. The experimental analysis of our MAFD database demonstrated that our approach achieved an accuracy of 84.26% and outperformed the prevalent methods (e.g., PSPNet, CCNet, FCN, ISANet, OCRNet), making a significant contribution to fire safety technology and enhancing the capacity to identify potential hazards indoors. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

16 pages, 2929 KiB

Open Access Article

Automatic Evaluation of Functional Movement Screening Based on Attention Mechanism and Score Distribution Prediction

by Xiuchun Lin, Tao Huang, Zhiqiang Ruan, Xuechao Yang, Zhide Chen, Guolong Zheng and Chen Feng

Mathematics 2023, 11(24), 4936; https://doi.org/10.3390/math11244936 - 12 Dec 2023

Cited by 4 | Viewed by 1793

Abstract

Functional movement screening (FMS) is a crucial testing method that evaluates fundamental movement patterns in the human body and identifies functional limitations. However, due to the inherent complexity of human movements, the automated assessment of FMS poses significant challenges. Prior methodologies have struggled to effectively capture and model critical human features in video data. To address this challenge, this paper introduces an automatic assessment approach for FMS by leveraging deep learning techniques. The proposed method harnesses an I3D network to extract spatiotemporal video features across various scales and levels. Additionally, an attention mechanism (AM) module is incorporated to enable the network to focus more on human movement characteristics, enhancing its sensitivity to diverse location features. Furthermore, the multilayer perceptron (MLP) module is employed to effectively discern intricate patterns and features within the input data, facilitating its classification into multiple categories. Experimental evaluations conducted on publicly available datasets demonstrate that the proposed approach achieves state-of-the-art performance levels. Notably, in comparison to existing state-of-the-art (SOTA) methods, this approach exhibits a marked improvement in accuracy. These results corroborate the efficacy of the I3D-AM-MLP framework, indicating its significance in extracting advanced human movement feature expressions and automating the assessment of functional movement screening. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

Review

31 pages, 1117 KiB

Open Access Review

Traffic Sign Detection and Recognition Using YOLO Object Detection Algorithm: A Systematic Review

by Marco Flores-Calero, César A. Astudillo, Diego Guevara, Jessica Maza, Bryan S. Lita, Bryan Defaz, Juan S. Ante, David Zabala-Blanco and José María Armingol Moreno

Mathematics 2024, 12(2), 297; https://doi.org/10.3390/math12020297 - 17 Jan 2024

Cited by 61 | Viewed by 20695

Abstract

Context: YOLO (You Only Look Once) is an algorithm based on deep neural networks with real-time object detection capabilities. This state-of-the-art technology is widely available, mainly due to its speed and precision. Since its conception, YOLO has been applied to detect and recognize traffic signs, pedestrians, traffic lights, vehicles, and so on. Objective: The goal of this research is to systematically analyze the YOLO object detection algorithm, applied to traffic sign detection and recognition systems, from five relevant aspects of this technology: applications, datasets, metrics, hardware, and challenges. Method: This study performs a systematic literature review (SLR) of studies on traffic sign detection and recognition using YOLO published in the years 2016–2022. Results: The search found 115 primary studies relevant to the goal of this research. After analyzing these investigations, the following relevant results were obtained. The most common applications of YOLO in this field are vehicular security and intelligent and autonomous vehicles. The majority of the sign datasets used to train, test, and validate YOLO-based systems are publicly available, with an emphasis on datasets from Germany and China. It has also been discovered that most works present sophisticated detection, classification, and processing speed metrics for traffic sign detection and recognition systems by using the different versions of YOLO. In addition, the most popular desktop data-processing hardware platforms are the Nvidia RTX 2080 and Titan Tesla V100 and, in the case of embedded or mobile GPU platforms, the Jetson Xavier NX. Finally, seven relevant challenges that these systems face when operating in real road conditions have been identified. With this in mind, research has been reclassified to address these challenges in each case. Conclusions: This SLR is the most relevant and current work in the field of technology development applied to the detection and recognition of traffic signs using YOLO. In addition, insights are provided about future work that could be conducted to improve the field. Full article

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)
