Search Results (505)

Search Parameters:
Keywords = vehicle detector

29 pages, 3400 KiB  
Article
Synthetic Data Generation for Machine Learning-Based Hazard Prediction in Area-Based Speed Control Systems
by Mariusz Rychlicki and Zbigniew Kasprzyk
Appl. Sci. 2025, 15(15), 8531; https://doi.org/10.3390/app15158531 - 31 Jul 2025
Abstract
This work focuses on generating synthetic data for machine learning-based hazard prediction in area-based speed monitoring systems. The purpose of the research was to develop a methodology for generating realistic synthetic data to support the design of a continuous vehicle speed monitoring system that minimizes the risk of traffic accidents caused by speeding. The SUMO traffic simulator was used to model driver behavior in the analyzed area and road network. Data from OpenStreetMap and field measurements from over a dozen speed detectors were integrated, and preliminary tests were carried out to record vehicle speeds. Based on these data, several simulation scenarios were run and compared to real-world observations using average speed, the percentage of speed limit violations, root mean square error (RMSE), and percentage compliance. A new metric, the Combined Speed Accuracy Score (CSAS), was introduced to assess the consistency of simulation results with real-world data. A basic hazard prediction model was also developed using LoRaWAN sensor network data and environmental contextual variables, including time, weather, location, and accident history. The research yields a method for evaluating and selecting the simulation scenario that best represents reality and drivers' propensity to exceed speed limits. The results demonstrate that synthetic data can be produced with a level of agreement with real data exceeding 90%, showing that traffic simulators can generate synthetic data for machine learning in hazard prediction for area-based speed control systems.
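As a rough illustration of the comparison metrics named above, the sketch below computes RMSE and a simple percentage-compliance score between simulated and observed speed samples. The tolerance threshold and the function name are assumptions for illustration, not the authors' definitions, and the CSAS metric is not reproduced here.

```python
import numpy as np

def speed_agreement(sim, obs, tolerance_kmh=5.0):
    """Compare simulated vs. observed speed samples (km/h).

    Returns RMSE and the share of simulated samples falling within
    `tolerance_kmh` of the observation -- a simple stand-in for the
    'percentage compliance' metric named in the abstract.
    """
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    compliance = np.mean(np.abs(sim - obs) <= tolerance_kmh) * 100.0
    return rmse, compliance

rmse, comp = speed_agreement([52, 61, 48], [50, 63, 47])
print(f"RMSE={rmse:.2f} km/h, compliance={comp:.0f}%")
```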

17 pages, 91001 KiB  
Article
PONet: A Compact RGB-IR Fusion Network for Vehicle Detection on OrangePi AIpro
by Junyu Huang, Jialing Lian, Fangyu Cao, Jiawei Chen, Renbo Luo, Jinxin Yang and Qian Shi
Remote Sens. 2025, 17(15), 2650; https://doi.org/10.3390/rs17152650 - 30 Jul 2025
Abstract
Multi-modal object detection that fuses RGB (Red-Green-Blue) and infrared (IR) data has emerged as an effective approach for addressing challenging visual conditions such as low illumination, occlusion, and adverse weather. However, most existing multi-modal detectors prioritize accuracy while neglecting computational efficiency, making them unsuitable for deployment on resource-constrained edge devices. To address this limitation, we propose PONet, a lightweight and efficient multi-modal vehicle detection network tailored for real-time edge inference. PONet incorporates Polarized Self-Attention to improve feature adaptability and representation with minimal computational overhead. In addition, a novel fusion module is introduced to effectively integrate RGB and IR modalities while preserving efficiency. Experimental results on the VEDAI dataset demonstrate that PONet achieves a competitive detection accuracy of 82.2% mAP@0.5 while sustaining a throughput of 34 FPS on the OrangePi AIpro 20T device. With only 3.76 M parameters and 10.2 GFLOPs (Giga Floating Point Operations), PONet offers a practical solution for edge-oriented remote sensing applications requiring a balance between detection precision and computational cost.
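For orientation, here is a minimal, generic sketch of RGB-IR feature fusion (concatenation, a 1x1 mixing convolution, and a learned per-pixel gate). It is not PONet's fusion module or its Polarized Self-Attention, which the abstract only names; every component below is an assumption for illustration.

```python
import torch
import torch.nn as nn

class SimpleRGBIRFusion(nn.Module):
    """Illustrative RGB-IR feature fusion: concatenate the two feature
    maps and mix them with a 1x1 convolution, gated by a learned
    per-pixel sigmoid weight. A generic baseline, NOT PONet's module."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 1, 1), nn.Sigmoid())

    def forward(self, rgb_feat, ir_feat):
        x = torch.cat([rgb_feat, ir_feat], dim=1)
        g = self.gate(x)                      # per-pixel modality weight
        return g * self.mix(x) + (1 - g) * rgb_feat

fused = SimpleRGBIRFusion(64)(torch.randn(1, 64, 32, 32),
                              torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```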

22 pages, 9071 KiB  
Article
Integrating UAV-Based RGB Imagery with Semi-Supervised Learning for Tree Species Identification in Heterogeneous Forests
by Bingru Hou, Chenfeng Lin, Mengyuan Chen, Mostafa M. Gouda, Yunpeng Zhao, Yuefeng Chen, Fei Liu and Xuping Feng
Remote Sens. 2025, 17(15), 2541; https://doi.org/10.3390/rs17152541 - 22 Jul 2025
Abstract
The integration of unmanned aerial vehicle (UAV) remote sensing and deep learning has emerged as a highly effective strategy for inventorying forest resources. However, the spatiotemporal variability of forest environments and the scarcity of annotated data hinder the performance of conventional supervised deep-learning models. To overcome these challenges, this study developed efficient tree (ET), a semi-supervised tree detector designed for forest scenes. ET employed an enhanced YOLO model (YOLO-Tree) as a base detector and incorporated a teacher–student semi-supervised learning (SSL) framework based on pseudo-labeling, effectively leveraging abundant unlabeled data to bolster model robustness. The results revealed that SSL significantly improved outcomes in scenarios with sparse labeled data, specifically when the annotation proportion was below 50%. Additionally, employing overlapping cropping as a data augmentation strategy mitigated instability during semi-supervised training under conditions of limited sample size. Notably, introducing unlabeled data from external sites enhanced the accuracy and cross-site generalization of models trained on diverse datasets, achieving impressive results with F1, mAP50, and mAP50-95 scores of 0.979, 0.992, and 0.871, respectively. In conclusion, this study highlights the potential of combining UAV-based RGB imagery with SSL to advance tree species identification in heterogeneous forests.
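The pseudo-labeling step of a teacher-student SSL scheme can be sketched as below. The confidence threshold, the teacher interface, and all names are assumptions for illustration, not the ET pipeline.

```python
import numpy as np

def pseudo_label_round(teacher_predict, unlabeled_images, conf_thresh=0.9):
    """One pseudo-labeling round of a teacher-student SSL scheme
    (illustrative sketch, not the paper's ET pipeline).

    `teacher_predict(img)` is assumed to return a list of
    (box, score) detections; boxes scoring at least `conf_thresh`
    become pseudo-labels for student training.
    """
    pseudo_labeled = []
    for img in unlabeled_images:
        dets = [(box, s) for box, s in teacher_predict(img) if s >= conf_thresh]
        if dets:
            pseudo_labeled.append((img, [box for box, _ in dets]))
    return pseudo_labeled

# Dummy teacher for demonstration: one detection with random confidence.
rng = np.random.default_rng(0)
toy_teacher = lambda img: [((10, 10, 50, 50), float(rng.random()))]
batch = pseudo_label_round(toy_teacher, ["img_%d.jpg" % i for i in range(8)])
print(len(batch), "images received pseudo-labels")
```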
(This article belongs to the Special Issue Remote Sensing-Assisted Forest Inventory Planning)

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions, and although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, at the structural level, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and recall. For instance, for zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 recall, 1.1 and 25.6 points higher than YOLO-World, respectively. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery.
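A region-text contrastive loss of the kind the abstract describes is commonly implemented as an InfoNCE-style objective over normalized region and text embeddings. The sketch below shows that common pattern under assumed shapes and a temperature of 0.07; the actual UAV-OVD loss may differ.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_emb, text_emb, labels, tau=0.07):
    """Illustrative region-text contrastive loss (InfoNCE-style), in the
    spirit of the alignment objective described in the abstract.

    region_emb: (N, D) embeddings of detected regions
    text_emb:   (C, D) embeddings of category phrases
    labels:     (N,) index of the matching phrase for each region
    """
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / tau   # (N, C) similarity matrix
    return F.cross_entropy(logits, labels)

loss = region_text_contrastive_loss(torch.randn(4, 256),
                                    torch.randn(10, 256),
                                    torch.tensor([0, 3, 3, 7]))
print(loss.item())
```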
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)

15 pages, 6454 KiB  
Article
xLSTM-Based Urban Traffic Flow Prediction for Intelligent Transportation Governance
by Chung-I Huang, Jih-Sheng Chang, Jun-Wei Hsieh, Jyh-Horng Wu and Wen-Yi Chang
Appl. Sci. 2025, 15(14), 7859; https://doi.org/10.3390/app15147859 - 14 Jul 2025
Abstract
Urban traffic congestion poses persistent challenges to mobility, public safety, and governance efficiency in metropolitan areas. This study proposes an intelligent traffic flow forecasting framework based on an extended Long Short-Term Memory (xLSTM) model, specifically designed for real-time congestion prediction and proactive police dispatch support. Utilizing a real-world dataset collected from over 300 vehicle detector (VD) sensors, the proposed model integrates vehicle volume, speed, and lane occupancy data at five-minute intervals. Methodologically, the xLSTM model incorporates matrix-based memory cells and exponential gating mechanisms to enhance spatio-temporal learning capabilities. Model performance is evaluated using multiple metrics, including congestion classification accuracy, F1-score, MAE, RMSE, and inference latency. The xLSTM model achieves a congestion prediction accuracy of 87.3%, an F1-score of 0.882, and an average inference latency of 41.2 milliseconds, outperforming baseline LSTM, GRU, and Transformer-based models in both accuracy and speed. These results validate the system's suitability for real-time deployment in police control centers, where timely prediction of traffic congestion enables anticipatory patrol allocation and dynamic signal adjustment. By bridging AI-driven forecasting with public safety operations, this research contributes a validated and scalable approach to intelligent transportation governance, enhancing the responsiveness of urban mobility systems and advancing smart city initiatives.
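As context for the input format described above (volume, speed, and occupancy at five-minute intervals), the sketch below shows one plausible way to window such detector readings into supervised sequences for a recurrent model. The window length and array layout are assumptions, not the authors' preprocessing.

```python
import numpy as np

def make_sequences(readings, window=12, horizon=1):
    """Build supervised sequences from per-interval detector readings.

    `readings` is assumed to have shape (T, 3), holding
    [volume, speed, occupancy] at 5-minute intervals; `window` past
    steps predict the reading `horizon` steps ahead. Sketch only.
    """
    X, y = [], []
    for t in range(len(readings) - window - horizon + 1):
        X.append(readings[t:t + window])
        y.append(readings[t + window + horizon - 1])
    return np.stack(X), np.stack(y)

demo = np.random.rand(288, 3)          # one day of 5-minute readings
X, y = make_sequences(demo)
print(X.shape, y.shape)                # (276, 12, 3) (276, 3)
```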

30 pages, 4582 KiB  
Review
Review on Rail Damage Detection Technologies for High-Speed Trains
by Yu Wang, Bingrong Miao, Ying Zhang, Zhong Huang and Songyuan Xu
Appl. Sci. 2025, 15(14), 7725; https://doi.org/10.3390/app15147725 - 10 Jul 2025
Abstract
From the point of view of the intelligent operation and maintenance of high-speed train tracks, this paper examines the research status of rail damage detection technology for high-speed trains in recent years, summarizes the damage detection methods, and compares and analyzes different detection technologies and application research results. The analysis shows that detection methods for high-speed train rail damage focus mainly on non-destructive testing technologies and methods, as well as testing platforms and equipment. Detection platforms and equipment include a new type of vortex meter, integrated track recording vehicles, laser rangefinders, thermal sensors, laser vision systems, LiDAR, new ultrasonic detectors, rail detection vehicles, rail detection robots, laser on-board rail detection systems, track recorders, and self-moving trolleys. The main research and application methods include electromagnetic detection, optical detection, ultrasonic guided wave detection, acoustic emission detection, ray detection, eddy current detection, and vibration detection. In recent years, the most widely studied and applied approaches have been rail detection based on LiDAR, ultrasonic detection, eddy current detection, and optical detection, with machine vision being the most important optical method. Ultrasonic detection can reveal internal damage of the rail; LiDAR can detect dirt around the rail and on its surface, but both the equipment and its application are very costly. Future rail damage detection for high-speed railways must first comply with damage standards. For rail geometric parameters, the domestic standard (TB 10754-2018) requires a gauge deviation of ±1 mm, a track direction deviation of 0.3 mm/10 m, and a height deviation of 0.5 mm/10 m, with some indicators stricter than European standard EN 13848. In damage detection, domestic flaw detection vehicles have achieved millimeter-level accuracy for cracks in rail heads, rail waists, and other parts, with a damage detection rate of over 85%; drone-based detection systems identify track components with 93.6% accuracy and potential safety hazards with an 81.8% identification rate. A gap with international standards remains: standards such as EN 13848 impose stricter requirements on testing cycles and data storage, and quantified damage detection requirements, real-time damage data, and safety will be key research and development directions in the future.

31 pages, 28041 KiB  
Article
Cyberattack Resilience of Autonomous Vehicle Sensor Systems: Evaluating RGB vs. Dynamic Vision Sensors in CARLA
by Mustafa Sakhai, Kaung Sithu, Min Khant Soe Oke and Maciej Wielgosz
Appl. Sci. 2025, 15(13), 7493; https://doi.org/10.3390/app15137493 - 3 Jul 2025
Abstract
Autonomous vehicles (AVs) rely on a heterogeneous sensor suite of RGB cameras, LiDAR, GPS/IMU, and emerging event-based dynamic vision sensors (DVS) to perceive and navigate complex environments. However, these sensors can be deceived by realistic cyberattacks, undermining safety. In this work, we systematically implement seven attack vectors in the CARLA simulator (salt and pepper noise, event flooding, depth map tampering, LiDAR phantom injection, GPS spoofing, denial of service, and steering bias control) and measure their impact on a state-of-the-art end-to-end driving agent. We then equip each sensor with tailored defenses (e.g., adaptive median filtering for RGB and spatial clustering for DVS) and integrate an unsupervised anomaly detector (EfficientAD from anomalib) trained exclusively on benign data. Our detector achieves clear separation between normal and attacked conditions (mean RGB anomaly scores of 0.00 vs. 0.38; DVS: 0.61 vs. 0.76), yielding over 95% detection accuracy with fewer than 5% false positives. Defense evaluations reveal that GPS spoofing is fully mitigated, whereas RGB- and depth-based attacks still induce 30–45% trajectory drift despite filtering. Notably, our evaluation of DVS sensors suggests intrinsic resilience advantages in high-dynamic-range scenarios, though their asynchronous output necessitates carefully tuned thresholds. These findings underscore the critical role of multi-modal anomaly detection and suggest that DVS sensors, with their greater intrinsic resilience in high-dynamic-range scenarios, could enhance AV cybersecurity when integrated with conventional sensors.
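To make the first attack-defense pair concrete, the sketch below injects salt-and-pepper noise and applies a plain median filter. The paper names an adaptive median filter; the non-adaptive filter, noise amount, and image here are simplified assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import median_filter

def salt_and_pepper(img, amount=0.05, rng=None):
    """Inject salt-and-pepper noise -- one of the attack vectors named
    in the abstract (illustrative, not the CARLA implementation)."""
    rng = rng or np.random.default_rng(0)
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = 0.0            # pepper
    noisy[mask > 1 - amount / 2] = 1.0        # salt
    return noisy

img = np.full((64, 64), 0.5)                  # toy grayscale frame
attacked = salt_and_pepper(img)
defended = median_filter(attacked, size=3)    # simple median-filter defense
print(f"residual error after filtering: {np.abs(defended - img).mean():.4f}")
```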
(This article belongs to the Special Issue Intelligent Autonomous Vehicles: Development and Challenges)

13 pages, 353 KiB  
Article
Lightweight Object Detector Based on Images Captured Using Unmanned Aerial Vehicle
by Dike Chen, Jiacheng Sui, Ji Zhang and Hongyuan Wang
Appl. Sci. 2025, 15(13), 7482; https://doi.org/10.3390/app15137482 - 3 Jul 2025
Abstract
This study investigates the flight endurance problems that unmanned aerial vehicles (UAVs) face when carrying out filming tasks, the relatively limited computational resources of xmini platforms carried by UAVs, and the need for fast decision making and responses when processing image data in real time. An improved Yolov8s-CFS model based on Yolov8s is proposed to address the need for a lightweight solution when UAVs perform filming tasks. First, the Bottleneck in C2f is replaced by the FasterNet Block to achieve an overall lightweighting effect; second, to mitigate the accuracy degradation caused by excessive lightweighting, this study introduces self-weight coordinate attention (SWCA) into the C2f-Faster module connected to each detect head, yielding the C2f-Faster-SWCA module. The experimental results show that the number of parameters in the Yolov8s-CFS model is 17.4% lower than the baseline on the Visdrone2019 dataset, while its average accuracy remains at 40.1%. In summary, Yolov8s-CFS reduces the number of parameters and model complexity while preserving accuracy, facilitating its application in mobile deployment scenarios.

22 pages, 9809 KiB  
Article
Real-Time Multi-Camera Tracking for Vehicles in Congested, Low-Velocity Environments: A Case Study on Drive-Thru Scenarios
by Carlos Gellida-Coutiño, Reyes Rios-Cabrera, Alan Maldonado-Ramirez and Anand Sanchez-Orta
Electronics 2025, 14(13), 2671; https://doi.org/10.3390/electronics14132671 - 1 Jul 2025
Abstract
In this paper we propose a novel set of techniques for real-time Multi-Target Multi-Camera (MTMC) tracking of vehicles in congested, low-speed environments, such as drive-thru scenarios, where metrics such as the number of vehicles, time of stay, and interactions between vehicles and staff are needed and must be highly accurate. Traditional tracking methods based on Intersection over Union (IoU) and basic appearance features produce fragmented trajectories and misidentifications under these conditions. Furthermore, detectors such as YOLO (You Only Look Once) architectures exhibit different types of errors due to vehicle proximity, lane changes, and occlusions. Our methodology introduces a new tracker, the Multi-Object Tracker based on Corner Displacement (MTCD), that improves robustness against bounding box deformations by analysing corner displacement patterns and several other factors. The proposed solution was validated on real-world drive-thru footage, outperforming standard IoU-based trackers such as the Nvidia Discriminative Correlation Filter (NvDCF) tracker. By maintaining accurate cross-camera trajectories, our framework enables the extraction of critical operational metrics, including vehicle dwell times and person–vehicle interaction patterns, which are essential for optimizing service efficiency. This study tackles persistent tracking challenges in constrained environments, showcasing practical applications for real-world surveillance and logistics systems where precision is critical. The findings underscore the benefits of incorporating geometric resilience and delayed decision-making into MTMC architectures. Furthermore, our approach integrates seamlessly with existing camera infrastructure, eliminating the need for new deployments.
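For intuition, the sketch below contrasts plain IoU with per-corner displacement between consecutive bounding boxes, the kind of cue a corner-displacement tracker could use. MTCD's actual algorithm is not given in this listing, so everything here is an assumed simplification.

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def corner_displacements(a, b):
    """Per-corner displacement between consecutive boxes -- corners that
    move inconsistently hint at box deformation rather than motion."""
    ca = np.array([(a[0], a[1]), (a[2], a[1]), (a[0], a[3]), (a[2], a[3])])
    cb = np.array([(b[0], b[1]), (b[2], b[1]), (b[0], b[3]), (b[2], b[3])])
    return np.linalg.norm(cb - ca, axis=1)

prev, curr = (100, 100, 200, 180), (104, 101, 205, 182)
print(iou(prev, curr), corner_displacements(prev, curr))
```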
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

36 pages, 4653 KiB  
Article
A Novel Method for Traffic Parameter Extraction and Analysis Based on Vehicle Trajectory Data for Signal Control Optimization
by Yizhe Wang, Yangdong Liu and Xiaoguang Yang
Appl. Sci. 2025, 15(13), 7155; https://doi.org/10.3390/app15137155 - 25 Jun 2025
Abstract
As urban traffic systems become increasingly complex, traditional traffic data collection methods based on fixed detectors face challenges such as poor data quality and acquisition difficulties, and they cannot capture the complete vehicle path information essential for signal optimization. While vehicle trajectory data can provide rich spatiotemporal information, its sampling characteristics present new technical challenges for traffic parameter extraction. This study addresses the key issue of extracting traffic parameters suitable for signal timing optimization from sampled trajectory data by proposing a comprehensive method for traffic parameter extraction and analysis based on vehicle trajectory data. The method comprises five modules: data preprocessing, basic feature processing, exploratory data analysis, key feature extraction, and data visualization. An innovative algorithm is proposed to identify which intersections vehicles pass through, effectively solving the challenge of mapping GPS points to road network nodes. A dual calculation method based on instantaneous speed and time difference improves parameter estimation accuracy through multi-source data fusion, and a highly automated processing toolchain based on Python and MATLAB was developed. The method advances the state of the art through a novel polygon-based trajectory mapping algorithm and a systematic multi-source parameter extraction framework specifically designed for signal control optimization. Validation using actual trajectory data containing 2.48 million records eliminated 30.80% of the data as redundant and accurately identified complete paths for 7252 vehicles. The extracted multi-dimensional parameters, including link flow, average speed, travel time, and OD matrices, accurately reflect network operational status, identifying congestion hotspots, tidal traffic characteristics, and unstable road segments. The research outcomes provide a feasible technical solution for areas lacking traditional detection equipment. The extracted parameters can directly support signal optimization applications such as traffic signal coordination, timing optimization, and congestion management, providing crucial support for implementing data-driven intelligent traffic control. This research presents a theoretical framework validated with real-world data, providing a foundation for future implementation in operational signal control systems.
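The polygon-based mapping of GPS points to intersections can be illustrated with a point-in-polygon test. In the sketch below, the use of the shapely library, the node names, and the coordinates are all assumptions for demonstration, not the paper's algorithm.

```python
from shapely.geometry import Point, Polygon

def map_points_to_intersections(gps_points, intersections):
    """Map GPS fixes to intersection polygons -- a minimal sketch of the
    polygon-based trajectory mapping idea named in the abstract."""
    hits = []
    for lon, lat in gps_points:
        p = Point(lon, lat)
        node = next((name for name, poly in intersections.items()
                     if poly.contains(p)), None)
        hits.append(node)                      # None = between intersections
    return hits

# Invented intersection polygon and track, purely for illustration.
intersections = {
    "node_A": Polygon([(121.40, 31.20), (121.41, 31.20),
                       (121.41, 31.21), (121.40, 31.21)]),
}
track = [(121.405, 31.205), (121.50, 31.30)]
print(map_points_to_intersections(track, intersections))  # ['node_A', None]
```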
(This article belongs to the Special Issue Research and Estimation of Traffic Flow Characteristics)

23 pages, 2630 KiB  
Article
Machine Learning Traffic Flow Prediction Models for Smart and Sustainable Traffic Management
by Rusul Abduljabbar, Hussein Dia and Sohani Liyanage
Infrastructures 2025, 10(7), 155; https://doi.org/10.3390/infrastructures10070155 - 24 Jun 2025
Abstract
Sustainable traffic management relies on accurate traffic flow prediction to reduce congestion, fuel consumption, and emissions and minimise the external environmental impacts of traffic operations. This study contributes to this objective by developing and evaluating advanced machine learning models that leverage multisource data to predict traffic patterns more effectively, allowing for the deployment of proactive measures to prevent or reduce traffic congestion and idling times, leading to enhanced eco-friendly mobility. Specifically, this paper evaluates the impact of multisource sensor inputs and spatial detector interactions on machine learning-based traffic flow prediction. Using a dataset of 839,377 observations from 14 detector stations along Melbourne's Eastern Freeway, Bidirectional Long Short-Term Memory (BiLSTM) models were developed to assess predictive accuracy under different input configurations. The results demonstrated that incorporating speed and occupancy inputs alongside traffic flow improves prediction accuracy by up to 16% across all detector stations. This study also investigated the role of spatial flow input interactions from upstream and downstream detectors in enhancing prediction performance. The findings confirm that including neighbouring detectors improves prediction accuracy, increasing performance from 96% to 98% for eastbound and westbound directions. These findings highlight the benefits of optimised sensor deployment, data integration, and advanced machine-learning techniques for smart and eco-friendly traffic systems. Additionally, this study provides a foundation for data-driven, adaptive traffic management strategies that contribute to sustainable road network planning, reducing vehicle idling, fuel consumption, and emissions while enhancing urban mobility and supporting sustainability goals. Furthermore, the proposed framework aligns with key United Nations Sustainable Development Goals (SDGs), particularly those promoting sustainable cities, resilient infrastructure, and climate-responsive planning.
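A minimal BiLSTM regressor over multisource detector inputs might look like the sketch below. The hidden size, the single-output head, and the use of the last time step are assumptions, not the paper's architecture; inputs are assumed to be per-interval [flow, speed, occupancy] channels, optionally concatenated with the same channels from neighbouring detectors.

```python
import torch
import torch.nn as nn

class BiLSTMFlow(nn.Module):
    """Minimal BiLSTM regressor for detector sequences (sketch, not the
    authors' model): predicts the next-interval flow value."""
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # next-interval flow

    def forward(self, x):                      # x: (batch, steps, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # summary from last time step

pred = BiLSTMFlow()(torch.randn(8, 12, 3))
print(pred.shape)                              # torch.Size([8, 1])
```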
(This article belongs to the Special Issue Sustainable Road Design and Traffic Management)

32 pages, 8925 KiB  
Article
HSF-DETR: Hyper Scale Fusion Detection Transformer for Multi-Perspective UAV Object Detection
by Yi Mao, Haowei Zhang, Rui Li, Feng Zhu, Rui Sun and Pingping Ji
Remote Sens. 2025, 17(12), 1997; https://doi.org/10.3390/rs17121997 - 9 Jun 2025
Abstract
Unmanned aerial vehicle (UAV) imagery detection faces challenges in preserving small object features during multi-level downsampling, handling angle- and altitude-dependent variations in aerial scenes, achieving accurate localization in dense environments, and performing real-time detection. To address these limitations, we propose HSF-DETR, a lightweight transformer-based detector specifically designed for UAV imagery. First, we design a hybrid progressive fusion network (HPFNet) as the backbone, which adaptively modulates receptive fields to capture multi-scale information while preserving fine-grained details critical for small object detection. Second, building upon features extracted by HPFNet, we develop MultiScaleNet, which enhances feature representation through dual-layer optimization and cross-domain feature learning, significantly improving the model's capability to handle complex aerial scenarios with diverse object orientations. Finally, to address spatial–semantic alignment challenges, we devise a position-aware align context and spatial tuning (PACST) module that ensures effective feature calibration through precise alignment and adaptive fusion across scales. This hierarchical architecture is complemented by our novel AdaptDist-IoU loss with dynamic weight allocation, which enhances localization accuracy, particularly in dense environments. Extensive experiments using standard detection metrics (mAP50 and mAP50:95) on the VisDrone2019 test dataset demonstrate that HSF-DETR achieves superior performance with 0.428 mAP50 (+5.4%) and 0.253 mAP50:95 (+4%) compared with RT-DETR, while maintaining real-time inference (69.3 FPS) on an NVIDIA RTX 4090D GPU with only 15.24M parameters and 63.6 GFLOPs. Further validation across multiple public remote sensing datasets confirms the robust generalization capability of HSF-DETR in diverse aerial scenarios, offering a practical solution for resource-constrained UAV applications where both detection quality and processing speed are crucial.
(This article belongs to the Special Issue Deep Learning-Based Small-Target Detection in Remote Sensing)

22 pages, 12020 KiB  
Article
TFF-Net: A Feature Fusion Graph Neural Network-Based Vehicle Type Recognition Approach for Low-Light Conditions
by Huizhi Xu, Wenting Tan, Yamei Li and Yue Tian
Sensors 2025, 25(12), 3613; https://doi.org/10.3390/s25123613 - 9 Jun 2025
Abstract
Accurate vehicle type recognition in low-light environments remains a critical challenge for intelligent transportation systems (ITSs). To address the performance degradation caused by insufficient lighting, complex backgrounds, and light interference, this paper proposes a Twin-Stream Feature Fusion Graph Neural Network (TFF-Net) model. The model employs multi-scale convolutional operations combined with an Efficient Channel Attention (ECA) module to extract discriminative local features, while independent convolutional layers capture hierarchical global representations. These features are mapped as nodes to construct fully connected graph structures, which hybrid graph neural networks (GNNs) process to model spatial dependencies and semantic associations. TFF-Net enhances feature representation by fusing local details and global context from the GNN outputs. To further improve robustness, we propose an Adaptive Weighted Fusion-Bagging (AWF-Bagging) algorithm, which dynamically assigns weights to base classifiers based on their F1 scores. TFF-Net also includes dynamic feature weighting and label smoothing to address class imbalance. Finally, the proposed TFF-Net is integrated into YOLOv11n (a lightweight real-time object detector) with an improved adaptive loss function. For experimental validation in low-light scenarios, we constructed the low-light vehicle dataset VDD-Light based on the public dataset UA-DETRAC. Experimental results demonstrate that our model achieves 2.6% and 2.2% improvements in the mAP50 and mAP50-95 metrics over the baseline model. Compared to mainstream models and methods, the proposed model shows excellent performance and practical deployment potential.
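The AWF-Bagging idea of weighting base classifiers by their F1 scores can be sketched as a weighted soft vote. The normalization and the function below are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def f1_weighted_vote(probas, f1_scores):
    """Adaptive weighted fusion of base classifiers: fusion weights
    proportional to each classifier's F1 score (sketch of the
    AWF-Bagging idea named in the abstract).

    probas:    list of (N, C) class-probability arrays, one per model
    f1_scores: per-model F1 scores used as fusion weights
    """
    w = np.asarray(f1_scores, float)
    w = w / w.sum()                            # normalize weights to sum to 1
    fused = sum(wi * p for wi, p in zip(w, probas))
    return fused.argmax(axis=1)                # fused class decisions

p1 = np.array([[0.8, 0.2], [0.4, 0.6]])
p2 = np.array([[0.2, 0.8], [0.9, 0.1]])
print(f1_weighted_vote([p1, p2], f1_scores=[0.9, 0.6]))
```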
(This article belongs to the Section Vehicular Sensing)

24 pages, 822 KiB  
Article
Survey on Image-Based Vehicle Detection Methods
by Mortda A. A. Adam and Jules R. Tapamo
World Electr. Veh. J. 2025, 16(6), 303; https://doi.org/10.3390/wevj16060303 - 29 May 2025
Abstract
Vehicle detection is essential for real-world applications such as road surveillance, intelligent transportation systems, and autonomous driving, where high accuracy and real-time performance are critical. However, achieving robust detection remains challenging due to scene complexity, occlusion, scale variation, and varying lighting conditions. Over the past two decades, numerous methods have been proposed to address these issues. This study presents a comprehensive and structured survey of image-based vehicle detection methods, systematically comparing classical machine learning techniques based on handcrafted features with modern deep learning approaches. Deep learning methods are categorized into one-stage detectors (e.g., YOLO, SSD, FCOS, CenterNet), two-stage detectors (e.g., Faster R-CNN, Mask R-CNN), transformer-based detectors (e.g., DETR, Swin Transformer), and GAN-based methods, highlighting architectural trade-offs concerning speed, accuracy, and practical deployment. We analyze widely adopted performance metrics from recent studies, evaluate the characteristics and limitations of popular vehicle detection datasets, and explicitly discuss technical challenges, including domain generalization, environmental variability, computational constraints, and annotation quality. The survey concludes by identifying open research challenges and promising future directions, such as efficient edge deployment strategies, multimodal data fusion, transformer-based enhancements, and integration with Vehicle-to-Everything (V2X) communication systems.
(This article belongs to the Special Issue Vehicle Safe Motion in Mixed Vehicle Technologies Environment)

22 pages, 8270 KiB  
Article
DFE-YOLO: A Multi-Scale-Enhanced Detection Network for Dense Object Detection in Traffic Monitoring
by Qingyi Li, Yi Li and Yanfeng Lu
Electronics 2025, 14(11), 2108; https://doi.org/10.3390/electronics14112108 - 22 May 2025
Abstract
The accuracy of object detection is crucial for the safety and efficiency of traffic management in monitoring systems. Existing detectors, however, struggle significantly in complex urban scenarios with high-density occlusions among targets and extreme scale variations resulting from differences in vehicle size and distance to the camera. To remedy these issues, we introduce DFE-YOLO, an enhanced multi-scale detection framework built upon YOLOv8 that fuses features from various layers through our 'four adaptive spatial feature fusion' module, which performs adaptive spatial fusion via learnable weights normalized by softmax, allowing effective feature aggregation across scales. The second contribution is DySample, a lightweight, content-aware, point-based upsampling method that improves multi-scale feature representation and reduces imbalance across object scales. Experiments conducted on the VisDrone-2019 and BDD100K benchmarks show significantly superior performance over state-of-the-art detectors; specifically, DFE-YOLO achieves a +4% and +5.1% boost over YOLOv10 in AP and APsmall. This study offers a practical solution for intelligent transportation systems.
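Softmax-normalized learnable fusion weights, as described for the 'four adaptive spatial feature fusion' module, can be sketched as below. This per-level scalar version is a deliberate simplification: the actual module likely learns spatially varying weights and handles resampling across scales, neither of which is shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveScaleFusion(nn.Module):
    """Fuse same-resolution feature maps from several pyramid levels
    with learnable weights normalized by softmax -- a minimal sketch of
    the adaptive spatial fusion idea described in the abstract."""
    def __init__(self, n_levels=4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_levels))

    def forward(self, feats):                  # list of (B, C, H, W) maps
        w = F.softmax(self.logits, dim=0)      # fusion weights sum to 1
        return sum(wi * f for wi, f in zip(w, feats))

maps = [torch.randn(1, 64, 40, 40) for _ in range(4)]
print(AdaptiveScaleFusion(4)(maps).shape)      # torch.Size([1, 64, 40, 40])
```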
(This article belongs to the Special Issue Object Detection in Autonomous Driving)
