
Search Results (212)

Search Parameters:
Keywords = YOLOx

17 pages, 3667 KiB  
Article
Improving the Recognition of Bamboo Color and Spots Using a Novel YOLO Model
by Yunlong Zhang, Tangjie Nie, Qingping Zeng, Lijie Chen, Wei Liu, Wei Zhang and Long Tong
Plants 2025, 14(15), 2287; https://doi.org/10.3390/plants14152287 - 24 Jul 2025
Viewed by 225
Abstract
The sheaths of bamboo shoots, characterized by distinct colors and spotting patterns, are key phenotypic markers influencing species classification, market value, and genetic studies. This study introduces YOLOv8-BS, a deep learning model optimized for detecting these traits in Chimonobambusa utilis using a dataset from Jinfo Mountain, China. Enhanced by data augmentation techniques, including translation, flipping, and contrast adjustment, YOLOv8-BS outperformed benchmark models (YOLOv7, YOLOv5, YOLOX, and Faster R-CNN) in color and spot detection. For color detection, it achieved a precision of 85.9%, a recall of 83.4%, an F1-score of 84.6%, and an average precision (AP) of 86.8%. For spot detection, it recorded a precision of 90.1%, a recall of 92.5%, an F1-score of 91.1%, and an AP of 96.1%. These results demonstrate superior accuracy and robustness, enabling precise phenotypic analysis for bamboo germplasm evaluation and genetic diversity studies. YOLOv8-BS supports precision agriculture by providing a scalable tool for sustainable bamboo-based industries. Future improvements could enhance model adaptability for fine-grained varietal differences and real-time applications. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
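As a sanity check on the metrics reported above, the F1-score is the standard harmonic mean of precision and recall; a minimal sketch using the color detection figures from the abstract:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Color detection figures reported for YOLOv8-BS: P = 85.9%, R = 83.4%
color_f1 = f1_score(0.859, 0.834)
print(f"color F1 = {color_f1:.1%}")  # ~84.6%, matching the reported F1-score
```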

30 pages, 4239 KiB  
Article
Real-Time Object Detection for Edge Computing-Based Agricultural Automation: A Case Study Comparing the YOLOX and YOLOv12 Architectures and Their Performance in Potato Harvesting Systems
by Joonam Kim, Giryeon Kim, Rena Yoshitoshi and Kenichi Tokuda
Sensors 2025, 25(15), 4586; https://doi.org/10.3390/s25154586 - 24 Jul 2025
Viewed by 254
Abstract
In this paper, we present a case study covering implementation experience and a methodological framework, developed through a comprehensive comparative analysis of the YOLOX and YOLOv12 object detection models for agricultural automation systems deployed on the Jetson AGX Orin edge computing platform. We examined the architectural differences between the models and their impact on detection capabilities in data-imbalanced potato-harvesting environments. Both models were trained on identical datasets with images capturing potatoes, soil clods, and stones, and their performances were evaluated through 30 independent trials under controlled conditions. Statistical analysis confirmed that YOLOX achieved a significantly higher throughput (107 vs. 45 FPS, p < 0.01) and superior energy efficiency (0.58 vs. 0.75 J/frame) than YOLOv12, meeting real-time processing requirements for agricultural automation. Although both models achieved an equivalent overall detection accuracy (F1-score, 0.97), YOLOv12 demonstrated specialized capabilities for challenging classes, achieving 42% higher recall for underrepresented soil clod objects (0.725 vs. 0.512, p < 0.01) and superior precision for small objects (0–3000 pixels). Architectural analysis identified YOLOv12's residual efficient layer aggregation network backbone and area attention mechanism as key enablers of balanced precision–recall characteristics, which were particularly valuable for addressing agricultural data imbalance. However, NVIDIA Nsight profiling revealed implementation inefficiencies in the YOLOv12 multiprocess architecture, which prevented the theoretical advantages from being fully realized in edge computing environments. These findings provide empirically grounded guidelines for model selection in agricultural automation systems, highlighting the critical interplay between architectural design, implementation efficiency, and application-specific requirements. Full article
(This article belongs to the Section Smart Agriculture)
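The reported J/frame efficiency is simply average board power divided by throughput. The power draws below are back-computed assumptions chosen to reproduce the reported figures, not values taken from the paper:

```python
def joules_per_frame(avg_power_w: float, fps: float) -> float:
    """Energy per processed frame: average power (W = J/s) divided by throughput (frames/s)."""
    return avg_power_w / fps

# Hypothetical power draws: 0.58 J/frame at 107 FPS implies ~62 W;
# 0.75 J/frame at 45 FPS implies ~34 W on the Jetson AGX Orin.
print(joules_per_frame(62.0, 107))   # ≈ 0.58 J/frame (YOLOX)
print(joules_per_frame(33.75, 45))   # = 0.75 J/frame (YOLOv12)
```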

23 pages, 13739 KiB  
Article
Traffic Accident Rescue Action Recognition Method Based on Real-Time UAV Video
by Bo Yang, Jianan Lu, Tao Liu, Bixing Zhang, Chen Geng, Yan Tian and Siyu Zhang
Drones 2025, 9(8), 519; https://doi.org/10.3390/drones9080519 - 24 Jul 2025
Viewed by 379
Abstract
Low-altitude drones, which are unimpeded by traffic congestion or urban terrain, have become a critical asset in emergency rescue missions. To address the current lack of emergency rescue data, UAV aerial videos were collected to create an experimental dataset for action classification and localization annotation. A total of 5082 keyframes were labeled with 1–5 targets each, and 14,412 instances of data were prepared (including flight altitude and camera angles) for action classification and position annotation. To mitigate the challenges posed by high-resolution drone footage with excessive redundant information, we propose the SlowFast-Traffic (SF-T) framework, a spatio-temporal sequence-based algorithm for recognizing traffic accident rescue actions. For more efficient extraction of target–background correlation features, we introduce the Actor-Centric Relation Network (ACRN) module, which employs temporal max pooling to enhance the time-dimensional features of static backgrounds, significantly reducing redundancy-induced interference. Additionally, smaller ROI feature map outputs are adopted to boost computational speed. To tackle class imbalance in incident samples, we integrate a Class-Balanced Focal Loss (CB-Focal Loss) function, effectively resolving rare-action recognition in specific rescue scenarios. We replace the original Faster R-CNN with YOLOX-s to improve the target detection rate. On our proposed dataset, the SF-T model achieves a mean average precision (mAP) of 83.9%, which is 8.5% higher than that of the standard SlowFast architecture while maintaining a processing speed of 34.9 tasks/s. Both accuracy-related metrics and computational efficiency are substantially improved. The proposed method demonstrates strong robustness and real-time analysis capabilities for modern traffic rescue action recognition. Full article
(This article belongs to the Special Issue Cooperative Perception for Modern Transportation)
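The abstract does not define CB-Focal Loss precisely; assuming it follows the common class-balanced weighting of focal loss (weights based on the effective number of samples per class), a per-sample sketch looks like:

```python
import math

def cb_focal_loss(p_true: float, n_samples: int, beta: float = 0.999, gamma: float = 2.0) -> float:
    """Class-Balanced Focal Loss for one sample (sketch; beta and gamma are illustrative).
    Class weight (1 - beta) / (1 - beta**n) grows for rare classes (small n);
    the focal term (1 - p)**gamma down-weights already well-classified samples."""
    weight = (1.0 - beta) / (1.0 - beta ** n_samples)
    return -weight * (1.0 - p_true) ** gamma * math.log(p_true)

# A rare rescue action (50 training samples) contributes more loss than a common one (5000)
rare = cb_focal_loss(p_true=0.7, n_samples=50)
common = cb_focal_loss(p_true=0.7, n_samples=5000)
assert rare > common
```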

24 pages, 4442 KiB  
Article
Time-Series Correlation Optimization for Forest Fire Tracking
by Dongmei Yang, Guohao Nie, Xiaoyuan Xu, Debin Zhang and Xingmei Wang
Forests 2025, 16(7), 1101; https://doi.org/10.3390/f16071101 - 3 Jul 2025
Viewed by 301
Abstract
Accurate real-time tracking of forest fires using UAV platforms is crucial for timely early warning, reliable spread prediction, and effective autonomous suppression. Existing detection-based multi-object tracking methods face challenges in accurately associating targets and maintaining smooth tracking trajectories in complex forest environments. These difficulties stem from the highly nonlinear movement of flames relative to the observing UAV and the lack of robust fire-specific feature modeling. To address these challenges, we introduce AO-OCSORT, an association-optimized observation-centric tracking framework designed to enhance robustness in dynamic fire scenarios. AO-OCSORT builds on the YOLOX detector. To associate detection results across frames and form smooth trajectories, we propose a temporal–physical similarity metric that utilizes temporal information from the short-term motion of targets and incorporates physical flame characteristics derived from optical flow and contours. Subsequently, scene classification and low-score filtering are employed to develop a hierarchical association strategy, reducing the impact of false detections and interfering objects. Additionally, a virtual trajectory generation module is proposed, employing a kinematic model to maintain trajectory continuity during flame occlusion. Locally evaluated on the 1080P-resolution FireMOT UAV wildfire dataset, AO-OCSORT achieves a 5.4% improvement in MOTA over advanced baselines at 28.1 FPS, meeting real-time requirements. This improvement enhances the reliability of fire front localization, which is crucial for forest fire management. Furthermore, AO-OCSORT demonstrates strong generalization, achieving 41.4% MOTA on VisDrone, 80.9% on MOT17, and 92.2% MOTA on DanceTrack. Full article
(This article belongs to the Special Issue Advanced Technologies for Forest Fire Detection and Monitoring)
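The MOTA figures quoted above follow the standard definition: one minus the sum of misses, false positives, and identity switches over the total number of ground-truth objects. A sketch with hypothetical counts:

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / total ground-truth objects."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Hypothetical counts over a short fire sequence
print(mota(false_negatives=120, false_positives=60, id_switches=10, num_gt=1000))  # 0.81
```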

16 pages, 1934 KiB  
Article
Research on Obtaining Pepper Phenotypic Parameters Based on Improved YOLOX Algorithm
by Yukang Huo, Rui-Feng Wang, Chang-Tao Zhao, Pingfan Hu and Haihua Wang
AgriEngineering 2025, 7(7), 209; https://doi.org/10.3390/agriengineering7070209 - 2 Jul 2025
Cited by 2 | Viewed by 370
Abstract
Pepper is a vital crop with extensive agricultural and industrial applications. Accurate phenotypic measurement, including plant height and stem diameter, is essential for assessing yield and quality, yet manual measurement is time-consuming and labor-intensive. This study proposes a deep learning-based phenotypic measurement method for peppers. A Pepper-mini dataset was constructed using offline augmentation. To address challenges in multi-plant growth environments, an improved YOLOX-tiny detection model incorporating a CA attention mechanism was developed, achieving a mAP of 95.16%. A detection box filtering method based on Euclidean distance was introduced to identify target plants. Further processing using HSV threshold segmentation, morphological operations, and connected component denoising enabled accurate region selection. Measurement algorithms were then applied, yielding high correlations with true values: R2 = 0.973 for plant height and R2 = 0.842 for stem diameter, with average errors of 0.443 cm and 0.0765 mm, respectively. This approach demonstrates a robust and efficient solution for automated phenotypic analysis in pepper cultivation. Full article
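The R² agreement values above can be reproduced with the ordinary coefficient of determination; a sketch using hypothetical manual vs. estimated plant heights (not data from the paper):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical manually measured vs. algorithm-estimated plant heights (cm)
manual = [42.0, 55.5, 61.2, 48.3, 70.1]
estimated = [42.5, 55.0, 60.5, 49.0, 69.5]
print(round(r_squared(manual, estimated), 3))
```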

34 pages, 18851 KiB  
Article
Dual-Branch Multi-Dimensional Attention Mechanism for Joint Facial Expression Detection and Classification
by Cheng Peng, Bohao Li, Kun Zou, Bowen Zhang, Genan Dai and Ah Chung Tsoi
Sensors 2025, 25(12), 3815; https://doi.org/10.3390/s25123815 - 18 Jun 2025
Viewed by 367
Abstract
This paper addresses the central issue arising in the simultaneous detection and classification (SDAC) of facial expressions, namely balancing the competing demands of good global features for detection and fine features for good facial expression classification. We do so by replacing the feature extraction part of the “neck” network in the feature pyramid network of the You Only Look Once X (YOLOX) framework with a novel architecture involving three attention mechanisms—batch, channel, and neighborhood—which respectively explore the three input dimensions: batch, channel, and spatial. Correlations across a batch of images in each of the dual incoming paths are first extracted by a self-attention mechanism in the batch dimension; the two paths are fused together to consolidate their information and then split again into two separate paths; the information along the channel dimension is extracted using a generalized form of channel attention, an adaptive graph channel attention, which provides each element of the incoming signal with a weight adapted to that signal. The combination of these two paths, together with two skip connections from the input of the batch attention to the output of the adaptive channel attention, then passes into a residual network with neighborhood attention to extract fine features in the spatial dimension. This novel dual-path architecture has been shown experimentally to achieve a better balance between the competing demands in an SDAC problem than other competing approaches. Ablation studies enable the determination of the relative importance of these three attention mechanisms. Competitive results are obtained on two non-aligned facial expression recognition datasets, RAF-DB and SFEW, when compared with other state-of-the-art methods. Full article

22 pages, 4741 KiB  
Article
Research on Tunnel Crack Identification Localization and Segmentation Method Based on Improved YOLOX and UNETR++
by Wei Sun, Xiaohu Liu and Zhiyong Lei
Sensors 2025, 25(11), 3417; https://doi.org/10.3390/s25113417 - 29 May 2025
Viewed by 502
Abstract
To address the challenges in identifying and segmenting fine irregular cracks in tunnels, this paper proposes a new crack identification, localization and segmentation method based on improved YOLOX and UNETR++. The improved YOLOX recognition algorithm builds upon the original YOLOX network architecture. It replaces the original CSPDarknet backbone with EfficientNet to enhance multi-scale feature extraction while preserving fine texture characteristics of tunnel cracks. By integrating a lightweight ECA module, the proposed method significantly improves sensitivity to subtle crack features, enabling high-precision identification and localization of fine irregular cracks. The UNETR++ segmentation network is adopted to realize efficient and accurate segmentation of fine irregular cracks in tunnels through its global feature capture capability and multi-scale feature fusion mechanism. The experimental results demonstrate that the proposed method achieves integrated processing of crack identification, localization and segmentation, especially for the identification and segmentation of fine, irregular cracks. Full article
(This article belongs to the Section Intelligent Sensors)

15 pages, 2497 KiB  
Article
The Research on an Improved YOLOX-Based Algorithm for Small-Object Road Vehicle Detection
by Zhixun Liu and Zhenyou Zhang
Electronics 2025, 14(11), 2179; https://doi.org/10.3390/electronics14112179 - 27 May 2025
Cited by 1 | Viewed by 455
Abstract
To address the challenges of missed detections and false positives caused by dense vehicle distribution, occlusions, and small object sizes in complex traffic scenarios, this paper proposes an improved YOLOX-based vehicle detection algorithm with three key innovations. First, we design a novel Wavelet-Enhanced Convolution (WEC) module that expands the receptive field to enhance the model’s global perception capability. Building upon this foundation, we integrate the SimAM attention mechanism, which improves feature saturation by adaptively fusing semantic features across different channels and spatial locations, thereby strengthening the network’s multi-scale generalization ability. Furthermore, we develop a Varifocal Intersection over Union (VIoU) bounding-box regression loss function that optimizes convergence in multi-scale feature learning while enhancing global feature extraction capabilities. The experimental results on the VisDrone dataset demonstrate that our improved model achieves performance gains of 0.9% mAP and 1.8% mAP75 compared to the baseline version, effectively improving vehicle detection accuracy. Full article
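The VIoU loss is the paper's own contribution and is not specified in the abstract; for background, the plain IoU that such bounding-box regression losses extend can be sketched as:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.143
```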

29 pages, 11492 KiB  
Article
Sustainable Real-Time Driver Gaze Monitoring for Enhancing Autonomous Vehicle Safety
by Jong-Bae Kim
Sustainability 2025, 17(9), 4114; https://doi.org/10.3390/su17094114 - 1 May 2025
Viewed by 638
Abstract
Despite advances in autonomous driving technology, current systems still require drivers to remain alert at all times. These systems issue warnings regardless of whether the driver is actually gazing at the road, which can lead to driver fatigue and reduced responsiveness over time, ultimately compromising safety. This paper proposes a sustainable real-time driver gaze monitoring method to enhance the safety and reliability of autonomous vehicles. The method uses a YOLOX-based face detector to detect the driver’s face and facial features, analyzing their size, position, shape, and orientation to determine whether the driver is gazing forward. By accurately assessing the driver’s gaze direction, the method adjusts the intensity and frequency of alerts, helping to reduce unnecessary warnings and improve overall driving safety. Experimental results demonstrate that the proposed method achieves a gaze classification accuracy of 97.3% and operates robustly in real-time under diverse environmental conditions, including both day and night. These results suggest that the proposed method can be effectively integrated into Level 3 and higher autonomous driving systems, where monitoring driver attention remains critical for safe operation. Full article

35 pages, 6431 KiB  
Article
Delving into YOLO Object Detection Models: Insights into Adversarial Robustness
by Kyriakos D. Apostolidis and George A. Papakostas
Electronics 2025, 14(8), 1624; https://doi.org/10.3390/electronics14081624 - 17 Apr 2025
Viewed by 1970
Abstract
This paper provides a comprehensive study of the security of YOLO (You Only Look Once) model series for object detection, emphasizing their evolution, technical innovations, and performance across the COCO dataset. The robustness of YOLO models under adversarial attacks and image corruption, offering insights into their resilience and adaptability, is analyzed in depth. As real-time object detection plays an increasingly vital role in applications such as autonomous driving, security, and surveillance, this review aims to clarify the strengths and limitations of each YOLO iteration, serving as a valuable resource for researchers and practitioners aiming to optimize model selection and deployment in dynamic, real-world environments. The results reveal that YOLOX models, particularly their large variants, exhibit superior robustness compared to other YOLO versions, maintaining higher accuracy under challenging conditions. Our findings serve as a valuable resource for researchers and practitioners aiming to optimize YOLO models for dynamic and adversarial real-world environments while guiding future research toward developing more resilient object detection systems. Full article
(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)

17 pages, 5429 KiB  
Article
The Development of a Lightweight DE-YOLO Model for Detecting Impurities and Broken Rice Grains
by Zhenwei Liang, Xingyue Xu, Deyong Yang and Yanbin Liu
Agriculture 2025, 15(8), 848; https://doi.org/10.3390/agriculture15080848 - 14 Apr 2025
Cited by 2 | Viewed by 517
Abstract
A rice impurity detection algorithm model, DE-YOLO, based on an improved YOLOX-s, is proposed to address the issues of small crop target recognition and the similarity of impurities in rice impurity detection. This model achieves correct recognition, classification, and detection of rice target crops with similar colors in complex environments. Firstly, changing the CBS module to the DBS module in the entire network model and replacing the standard convolution with Depthwise Separable Convolution (DSConv) can effectively reduce the number of parameters and the computational complexity, making the model lightweight. The ECANet module is introduced into the backbone feature extraction network, utilizing the weighted selection feature to cluster the network in the region of interest, enhancing attention to rice impurities and broken grains, and compensating for the reduced accuracy caused by model lightweighting. The loss problem of class imbalance is optimized using the Focal Loss function. The experimental results demonstrate that the DE-YOLO model achieves a mean average precision (mAP) of 97.55% for detecting rice impurity/broken targets, which is 2.9% higher than that of the original YOLOX algorithm. The recall rate (R) is 94.46%, the F1 value is 0.96, the parameter count is reduced by 48.89%, and the GFLOPS is reduced by 46.33%. This lightweight model can effectively detect rice impurity/broken targets and provide technical support for monitoring the rice impurity/broken rate. Full article
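The parameter savings attributed to Depthwise Separable Convolution can be checked by counting weights; a sketch (the layer sizes are illustrative, not taken from the paper):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (k*k*c_in weights) followed by a 1x1 pointwise conv (c_in*c_out)."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 128)    # 73,728
ds = dsconv_params(3, 64, 128)   # 8,768 -> roughly 8x fewer parameters
print(std, ds, f"{1 - ds / std:.1%} fewer")
```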

26 pages, 7941 KiB  
Article
An Edge-Computing-Driven Approach for Augmented Detection of Construction Materials: An Example of Scaffold Component Counting
by Xianzhong Zhao, Bo Cheng, Yujie Lu and Zhaoqi Huang
Buildings 2025, 15(7), 1190; https://doi.org/10.3390/buildings15071190 - 5 Apr 2025
Viewed by 546
Abstract
Construction material management is crucial for project progression. Counting massive amounts of scaffold components is a key step for efficient material management. However, traditional counting methods are time-consuming and laborious. Utilizing a vision-based method with edge devices for counting these materials undoubtedly offers a promising solution. This study proposed an edge-computing-driven approach for detecting and counting scaffold components. Two algorithm refinements of YOLOX, including generalized intersection over union (GIoU) and soft non-maximum suppression (Soft-NMS), were introduced to enhance detection accuracy in conditions of occlusion. An automated pruning method was proposed to compress the model, achieving a 60.2% reduction in computation and a 9.1% increase in inference speed. Two practical case studies demonstrated that the method, when deployed on edge devices, achieved 98.9% accuracy and reduced time consumption for counting tasks by 87.9% compared to the conventional method. This research provides an edge-computing-driven framework for counting massive materials, establishing a comprehensive workflow for intelligent applications in construction management. The paper concludes with limitations of the current study and suggestions for future work. Full article
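GIoU, one of the two refinements named above, augments IoU with a penalty based on the smallest enclosing box, which keeps the signal informative even when boxes do not overlap (as with occluded scaffold components); a minimal sketch:

```python
def giou(box_a, box_b):
    """Generalized IoU for (x1, y1, x2, y2) boxes: IoU minus the fraction of the
    smallest enclosing box not covered by the union; negative for distant boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # smallest axis-aligned box enclosing both
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    enclose = cw * ch
    return inter / union - (enclose - union) / enclose

print(giou((0, 0, 1, 1), (0, 0, 1, 1)))  # 1.0: perfect overlap
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # < 0: disjoint boxes are still ranked by distance
```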

13 pages, 3659 KiB  
Article
A Non-Contact Privacy Protection Bed Angle Estimation Method Based on LiDAR
by Yezhao Ju, Yuanji Li, Haiyang Zhang, Le Xin, Changming Zhao and Ziyi Xu
Sensors 2025, 25(7), 2226; https://doi.org/10.3390/s25072226 - 2 Apr 2025
Viewed by 2528
Abstract
Accurate bed angle monitoring is crucial in healthcare settings, particularly in Intensive Care Units (ICUs), where improper bed positioning can lead to severe complications such as ventilator-associated pneumonia. Traditional camera-based solutions, while effective, often raise significant privacy concerns. This study proposes a non-intrusive bed angle detection system based on LiDAR technology, utilizing the Intel RealSense L515 sensor. By leveraging time-of-flight principles, the system enables real-time, privacy-preserving monitoring of head-of-bed elevation angles without direct visual surveillance. Our methodology integrates advanced techniques, including coordinate system transformation, plane fitting, and a deep learning framework combining YOLO-X with an enhanced A2J algorithm. Customized loss functions further improve angle estimation accuracy. Experimental results in ICU environments demonstrate the system’s effectiveness, with an average angle detection error of less than 3 degrees. Full article
(This article belongs to the Section Radar Sensors)
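The pipeline above fits a plane to the bed surface; the head-of-bed elevation then follows from the angle between the fitted normal and a gravity-aligned reference. A purely geometric sketch (not the paper's exact formulation):

```python
import math

def plane_angle_deg(normal, reference=(0.0, 0.0, 1.0)):
    """Angle in degrees between a fitted plane normal and a reference normal
    (here the z-axis, assumed gravity-aligned for a flat bed section)."""
    dot = sum(a * b for a, b in zip(normal, reference))
    mag = math.sqrt(sum(a * a for a in normal)) * math.sqrt(sum(b * b for b in reference))
    return math.degrees(math.acos(dot / mag))

print(plane_angle_deg((0.0, 0.0, 1.0)))             # 0.0: bed section lying flat
print(round(plane_angle_deg((0.0, 1.0, 1.0)), 1))   # 45.0: raised head section
```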

18 pages, 2108 KiB  
Article
A Lightweight Approach to Comprehensive Fabric Anomaly Detection Modeling
by Shuqin Cui, Weihong Liu and Min Li
Sensors 2025, 25(7), 2038; https://doi.org/10.3390/s25072038 - 25 Mar 2025
Viewed by 581
Abstract
In order to solve the problem of high computational resource consumption in fabric anomaly detection, we propose a lightweight network, GH-YOLOx, which integrates ghost convolutions and a hierarchical GHNetV2 backbone to capture both local and global anomaly features. At the same time, other innovative components, such as GhostConv, dynamic convolutions, feature fusion modules, and a shared group convolution head, are applied to effectively handle multi-scale issues. LAMP pruning accelerates inference, while channel-wise knowledge distillation enhances the pruned model’s accuracy. Experiments on fabric datasets demonstrate that GH-YOLOx effectively reduces the number of parameters while achieving a higher detection rate than other lightweight models. Overall, our solution offers a practical approach to real-time fabric anomaly detection on mobile and embedded devices. Full article
(This article belongs to the Section Intelligent Sensors)

17 pages, 12823 KiB  
Article
Remote Sensing Small Object Detection Network Based on Multi-Scale Feature Extraction and Information Fusion
by Junsuo Qu, Tong Liu, Zongbing Tang, Yifei Duan, Heng Yao and Jiyuan Hu
Remote Sens. 2025, 17(5), 913; https://doi.org/10.3390/rs17050913 - 5 Mar 2025
Viewed by 1241
Abstract
Nowadays, object detection algorithms are widely used in various scenarios. However, some special scenarios impose additional small object detection requirements. Because small objects offer fewer usable features, suffer from sample imbalance, demand higher positioning accuracy, and have fewer datasets, small object detection is more complex than general object detection, and models often detect small objects poorly. Therefore, this paper takes YOLOXs as the benchmark network and enhances the feature information on small objects by improving the network’s structure so as to improve the model’s detection of small objects. The specific research is as follows: to address the tendency of neck networks based on an FPN and its variants to lose information when fusing features from non-adjacent layers, this paper proposes a feature fusion and distribution module, which replaces the deep-to-shallow information transmission path in the neck network of YOLOXs. The method first fuses and extracts the feature layers used by the backbone network for prediction to obtain global feature information covering objects of multiple sizes. The global feature information is then distributed to each prediction branch so that high-level semantic and fine-grained information are integrated more efficiently, helping the model learn discriminative information on small objects and classify them correctly. Finally, the method was tested on the VisDrone2021 dataset, whose images have a standard 1080p (1920 × 1080) resolution and whose videos typically run at 30 frames per second (fps); this high spatial and temporal resolution makes the dataset suitable for detecting objects of various sizes and for dynamic object detection tasks. When we integrated the module into a YOLOXs network (named the FE-YOLO network) together with three further improvements concerning the feature layer, channel number, and maximum pooling, the mAP and APs increased by 1.0% and 0.8%, respectively. Compared with YOLOv5m, YOLOv7-Tiny, FCOS, and other advanced models, it achieves the best performance. Full article
