Search Results (162)

Search Parameters:
Keywords = one-stage network

21 pages, 15647 KiB  
Article
Research on Oriented Object Detection in Aerial Images Based on Architecture Search with Decoupled Detection Heads
by Yuzhe Kang, Bohao Zheng and Wei Shen
Appl. Sci. 2025, 15(15), 8370; https://doi.org/10.3390/app15158370 - 28 Jul 2025
Viewed by 253
Abstract
Object detection in aerial images can provide great support in traffic planning, national defense reconnaissance, hydrographic surveys, infrastructure construction, and other fields. Objects in aerial images are characterized by small pixel–area ratios, dense arrangements, and arbitrary inclination angles. In response to these characteristics, we improved the Inception-ResNet feature extraction network using the Fast Architecture Search (FAS) module and proposed a one-stage, anchor-free rotated object detector. The detector has a simple structure consisting only of convolution layers, which reduces the number of model parameters. At the same time, the label sampling strategy used during training is optimized to resolve the problem of insufficient sampling. Finally, a decoupled object detection head separates the bounding box regression task from the object classification task. The experimental results show that the proposed method achieves a mean average precision (mAP) of 82.6%, 79.5%, and 89.1% on the DOTA1.0, DOTA1.5, and HRSC2016 datasets, respectively, and the detection speed reaches 24.4 FPS, which meets the needs of real-time detection.
(This article belongs to the Special Issue Innovative Applications of Artificial Intelligence in Engineering)
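
The decoupled head mentioned above keeps classification and box regression in separate convolution-only branches; a minimal PyTorch sketch of that general structure (the channel count, class count, and five-parameter rotated-box encoding are assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class DecoupledRotatedHead(nn.Module):
    """Convolution-only head with separate classification and box-regression branches."""
    def __init__(self, in_channels=256, num_classes=15):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
        self.cls_branch = branch()
        self.reg_branch = branch()
        self.cls_out = nn.Conv2d(in_channels, num_classes, 1)  # per-location class scores
        self.reg_out = nn.Conv2d(in_channels, 5, 1)            # (cx, cy, w, h, angle) per location

    def forward(self, feat):
        return self.cls_out(self.cls_branch(feat)), self.reg_out(self.reg_branch(feat))

# Example: one feature-pyramid level of shape (N, 256, H, W)
head = DecoupledRotatedHead()
cls_map, box_map = head(torch.randn(2, 256, 64, 64))
print(cls_map.shape, box_map.shape)  # (2, 15, 64, 64) (2, 5, 64, 64)
```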

16 pages, 10129 KiB  
Article
PestOOD: An AI-Enabled Solution for Advancing Grain Security via Out-of-Distribution Pest Detection
by Jida Tian, Chuanyang Ma, Jiangtao Li and Huiling Zhou
Electronics 2025, 14(14), 2868; https://doi.org/10.3390/electronics14142868 - 18 Jul 2025
Viewed by 179
Abstract
Detecting stored-grain pests on the surface of the grain pile plays an important role in integrated pest management (IPM), which is crucial for grain security. Recently, numerous deep learning-based pest detection methods have been proposed. However, a critical limitation of existing methods is their inability to detect out-of-distribution (OOD) categories that are unseen during training; when encountering such objects, these methods often misclassify them as in-distribution (ID) categories. To address this challenge, we propose a one-stage framework named PestOOD for out-of-distribution stored-grain pest detection via flow-based feature reconstruction. Specifically, we propose a novel Flow-Based OOD Feature Generation (FOFG) module that generates OOD features for detector training via feature reconstruction, which helps the detector learn to recognize OOD objects more effectively. Additionally, to prevent network overfitting that may lead to an excessive focus on ID feature extraction, we propose a Noisy DropBlock (NDB) module and integrate it into the backbone network. Finally, to ensure effective network convergence, a Stage-Wise Training Strategy (STS) is proposed. We conducted extensive experiments on our previously established multi-class stored-grain pest dataset. The results show that PestOOD outperforms state-of-the-art methods, providing an effective AI-enabled solution for ensuring grain security.

22 pages, 6902 KiB  
Article
The Robust Vessel Segmentation and Centerline Extraction: One-Stage Deep Learning Approach
by Rostislav Epifanov, Yana Fedotova, Savely Dyachuk, Alexandr Gostev, Andrei Karpenko and Rustam Mullyadzhanov
J. Imaging 2025, 11(7), 209; https://doi.org/10.3390/jimaging11070209 - 26 Jun 2025
Viewed by 828
Abstract
Accurate blood vessel segmentation and centerline extraction are critical in vascular imaging applications ranging from preoperative planning to hemodynamic modeling. This study introduces a novel one-stage method for simultaneous vessel segmentation and centerline extraction using a multitask neural network. We designed a hybrid architecture that integrates convolutional and graph layers, along with a task-specific loss function, to effectively capture the topological relationships between segmentation and centerline extraction, leveraging their complementary features. The proposed end-to-end framework directly predicts the centerline as a polyline with real-valued coordinates, thereby eliminating the need for the post-processing steps commonly required by previous methods that infer centerlines either implicitly or without ensuring point connectivity. We evaluated our approach on a combined dataset of 142 computed tomography angiography images of the thoracic and abdominal regions from the LIDC-IDRI and AMOS datasets. The results demonstrate that our method achieves superior centerline extraction performance (surface Dice at a 3 mm threshold: 97.65% ± 2.07%) compared to state-of-the-art techniques and attains the highest subvoxel resolution (surface Dice at a 1 mm threshold: 72.52% ± 8.96%). In addition, we conducted a robustness analysis to evaluate the model's stability under small rigid and deformable transformations of the input data and benchmarked its robustness against the widely used VMTK toolkit.
(This article belongs to the Section Medical Imaging)
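
The surface Dice scores above measure the fraction of surface points of each shape lying within a distance tolerance of the other surface; a small NumPy/SciPy sketch of this metric on point-sampled surfaces (the surface sampling itself is assumed, not the authors' implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_dice(pred_pts, gt_pts, tol_mm=3.0):
    """Fraction of surface points lying within `tol_mm` of the other surface."""
    pred_pts, gt_pts = np.asarray(pred_pts), np.asarray(gt_pts)
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # distance of each predicted point to the GT surface
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # distance of each GT point to the predicted surface
    overlap = (d_pred_to_gt <= tol_mm).sum() + (d_gt_to_pred <= tol_mm).sum()
    return overlap / (len(pred_pts) + len(gt_pts))

# Example with synthetic point clouds (coordinates in millimetres)
rng = np.random.default_rng(0)
gt = rng.uniform(0, 100, size=(1000, 3))
pred = gt + rng.normal(0, 1.0, size=gt.shape)  # ~1 mm perturbation of the reference surface
print(round(surface_dice(pred, gt, tol_mm=3.0), 4))
```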

23 pages, 8979 KiB  
Article
Beef Carcass Grading with EfficientViT: A Lightweight Vision Transformer Approach
by Hyunwoo Lim and Eungyeol Song
Appl. Sci. 2025, 15(11), 6302; https://doi.org/10.3390/app15116302 - 4 Jun 2025
Viewed by 800
Abstract
Beef carcass grading plays a pivotal role in determining market value and consumer preferences. While traditional visual inspection by experts remains the industry standard, it suffers from subjectivity and inconsistencies, particularly in high-throughput slaughterhouse environments. To address these limitations, we propose a one-stage automated grading model based on EfficientViT, a lightweight vision transformer architecture. Unlike conventional two-stage methods that require prior segmentation of the loin region, our model directly predicts beef quality grades from raw RGB images, significantly simplifying the pipeline and reducing computational overhead. We evaluate the proposed model against representative convolutional neural networks (VGG-16, ResNeXt-50, DenseNet-121) as well as two-stage combinations of segmentation and classification models. Experiments were conducted on a publicly available beef carcass dataset consisting of over 77,000 labeled images. EfficientViT achieves the highest accuracy (98.46%) and F1-score (0.9867) among all evaluated models while maintaining low inference latency (3.92 ms) and compact parameter size (36.4 MB). In particular, it outperforms CNNs in predicting the top grade (1++), where global visual patterns such as marbling distribution are crucial. Furthermore, we employ Grad-CAM and attention map visualizations to analyze the model’s focus regions and demonstrate that EfficientViT captures holistic contextual features better than CNNs. The model also exhibits robustness across varying loin area proportions. Our findings suggest that EfficientViT is not only accurate but also efficient and interpretable, making it a strong candidate for real-time industrial applications in beef quality grading.
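
Grad-CAM, used above to inspect the model's focus regions, weights a chosen layer's activations by the class gradient pooled over space; a minimal PyTorch sketch assuming the target layer outputs a spatial feature map (the layer choice and hook-based implementation are assumptions, not the authors' code):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Grad-CAM heatmap for one image of shape (1, C, H, W), upsampled to input size."""
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations["value"] = out.detach()          # feature map (1, C', h, w)

    def bwd_hook(module, grad_in, grad_out):
        gradients["value"] = grad_out[0].detach()    # gradient of the score w.r.t. the feature map

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()

    # Channel weights: globally average-pooled gradients; CAM: ReLU of the weighted sum.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```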

40 pages, 3546 KiB  
Article
Hybrid AI-Based Framework for Renewable Energy Forecasting: One-Stage Decomposition and Sample Entropy Reconstruction with Least-Squares Regression
by Nahed Zemouri, Hatem Mezaache, Zakaria Zemali, Fabio La Foresta, Mario Versaci and Giovanni Angiulli
Energies 2025, 18(11), 2942; https://doi.org/10.3390/en18112942 - 3 Jun 2025
Viewed by 699
Abstract
Accurate renewable energy forecasting is crucial for grid stability and efficient energy management. This study introduces a hybrid model that combines signal decomposition and artificial intelligence to enhance the prediction of solar radiation and wind speed. The framework uses a one-stage decomposition strategy, applying variational mode decomposition and an improved empirical mode decomposition method with adaptive noise. This process effectively extracts meaningful components while reducing background noise, improving data quality, and minimizing uncertainty. The complexity of these components is assessed using entropy-based selection to retain only the most relevant features. The refined data are then fed into advanced predictive models, including a bidirectional neural network for capturing long-term dependencies, an extreme learning machine, and a support vector regression model. These models address nonlinear patterns in the historical data. To optimize forecasting accuracy, outputs from all models are combined using a least-squares regression technique that assigns optimal weights to each prediction. The hybrid model was tested on datasets from three geographically diverse locations, encompassing varying weather conditions. Results show a notable improvement in accuracy, achieving a root mean square error as low as 2.18 and a coefficient of determination near 0.999. Compared to traditional methods, forecasting errors were reduced by up to 30%, demonstrating the model’s effectiveness in supporting sustainable and reliable energy systems.
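
The final combination step above fits weights for the base forecasters by least squares; a minimal NumPy sketch assuming the weights are fitted on a validation window of past predictions (synthetic data, illustrative only):

```python
import numpy as np

def fit_combination_weights(preds, target):
    """preds: (n_samples, n_models) base-model forecasts; target: (n_samples,) observations.
    Returns the weights w minimizing ||preds @ w - target||^2."""
    w, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return w

# Example: three base models forecasting wind speed on a validation window
rng = np.random.default_rng(1)
truth = rng.uniform(2, 12, size=200)
preds = np.column_stack([truth + rng.normal(0, s, 200) for s in (0.5, 1.0, 2.0)])
w = fit_combination_weights(preds, truth)
combined = preds @ w                       # weighted ensemble forecast
rmse = np.sqrt(np.mean((combined - truth) ** 2))
print(np.round(w, 3), round(rmse, 3))
```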

29 pages, 9314 KiB  
Article
SFRADNet: Object Detection Network with Angle Fine-Tuning Under Feature Matching
by Keliang Liu, Yantao Xi, Donglin Jing, Xue Zhang and Mingfei Xu
Remote Sens. 2025, 17(9), 1622; https://doi.org/10.3390/rs17091622 - 2 May 2025
Viewed by 509
Abstract
Due to the distant acquisition and bird’s-eye perspective of remote sensing images, ground objects appear at arbitrary scales and in multiple orientations. Existing detectors often utilize feature pyramid networks (FPN) and deformable (or rotated) convolutions to adapt to variations in object scale and orientation. However, these methods solve the scale and orientation issues separately and ignore their deeper coupling relationships. When the scale features extracted by the network are significantly mismatched with the object, it is difficult for the detection head to effectively capture the object's orientation, resulting in misalignment between the object and its bounding box. Therefore, we propose a one-stage detector, the Scale First Refinement-Angle Detection Network (SFRADNet), which aims to fine-tune the rotation angle under precise scale feature matching. We introduce the Group Learning Large Kernel Network (GL2KNet) as the backbone of SFRADNet and employ a Shape-Aware Spatial Feature Extraction Module (SA-SFEM) as the primary component of the detection head. Specifically, within GL2KNet, we construct diverse receptive fields with varying dilation rates to capture features across different spatial coverage ranges. Building on this, we utilize multi-scale features within the layers and apply weighted aggregation based on a Scale Selection Matrix (SSMatrix). The SSMatrix dynamically adjusts the receptive field coverage according to the target size, enabling more refined selection of scale features. Based on the precise scale features captured, we design a Directed Guiding Box (DGBox) within the SA-SFEM, using its shape and position information to supervise the sampling points of the convolution kernels, thereby fitting them to object deformations. This facilitates the extraction of orientation features near the object region, allowing for accurate refinement of both scale and orientation. Experiments show that our network achieves an mAP of 80.10% on the DOTA-v1.0 dataset while reducing computational complexity compared to the baseline model.
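
The receptive-field weighting idea above combines branches with different dilation rates according to the input; a loose PyTorch sketch of that general mechanism with a learned soft selection over branches (this is not the paper's SSMatrix, whose exact form is not given in the abstract):

```python
import torch
import torch.nn as nn

class DilatedBranchMix(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, softly weighted per branch."""
    def __init__(self, channels=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        # One learned score per branch, derived from globally pooled features.
        self.scorer = nn.Linear(channels, len(dilations))

    def forward(self, x):
        scores = self.scorer(x.mean(dim=(2, 3)))                   # (N, num_branches)
        weights = torch.softmax(scores, dim=1)                     # soft selection over receptive fields
        outs = torch.stack([b(x) for b in self.branches], dim=1)   # (N, B, C, H, W)
        return (weights[:, :, None, None, None] * outs).sum(dim=1)

mix = DilatedBranchMix()
y = mix(torch.randn(2, 256, 32, 32))
print(y.shape)  # torch.Size([2, 256, 32, 32])
```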

24 pages, 11098 KiB  
Article
PERE: Prior-Enhanced and Resolution-Extended Object Detection for Industrial Laminated Panel Scenes
by Haoyu Wang, Yiqiang Wu, Jinshuo Liang and Xie Xie
Appl. Sci. 2025, 15(8), 4468; https://doi.org/10.3390/app15084468 - 18 Apr 2025
Viewed by 436
Abstract
Laminated panels are widely used in industry, and their quality inspection has traditionally relied on manual labor, which is time-consuming and prone to errors. Automated detection can significantly improve efficiency and reduce human error. Object detectors that incorporate prior knowledge typically focus on updating model structures to improve performance. Despite initial success, most such methods become increasingly complex and time-consuming for industrial applications while neglecting the object distributions in industrial datasets, especially in the context of industrial laminated panels. These issues lead to missed and false detections in this scenario. We therefore propose a prior-enhanced, resolution-extended object detector framework (PERE) for industrial scenarios that addresses these issues while enhancing detection accuracy and efficiency. PERE explores the spatial connections of objects and seeks the latent information within the forward propagation process. PERE introduces the prior-enhanced network (MRPE) and the resolution-extended network (REN) to replace the original modules in one-stage object detectors. MRPE extracts prior knowledge from the spatial distribution of objects in industrial scenes, mitigating false detections caused by feature similarities. REN incorporates super-resolution information during the upsampling process to minimize the risk of missing tiny targets. We also built a new dataset, SPI, for studying this topic. Comprehensive experiments show that PERE significantly improves efficiency and performance in object detection within industrial scenes.

20 pages, 4080 KiB  
Article
LLM-WFIN: A Fine-Grained Large Language Model (LLM)-Oriented Website Fingerprinting Attack via Fusing Interrupt Trace and Network Traffic
by Jiajia Jiao, Hong Yang and Ran Wen
Electronics 2025, 14(7), 1263; https://doi.org/10.3390/electronics14071263 - 23 Mar 2025
Cited by 1 | Viewed by 1092
Abstract
Access to popular Large Language Models (LLMs) relies on website browsing and therefore also faces website fingerprinting attacks, which increasingly threaten users with the leakage of browsing privacy. In addition to the commonly used network traffic analysis, interrupt tracing exploits microarchitectural side channels as a new compromising method and assists website fingerprinting attacks on non-LLM websites with up to 96.6% classification accuracy. More importantly, our observations show that LLM website access exhibits an inherent defense, decreasing the attack classification accuracy to 6.5%. This resistance highlights the need to develop new website fingerprinting attacks for LLM websites. Therefore, we propose a fine-grained LLM-oriented website fingerprinting attack that fuses interrupt traces and network traffic (LLM-WFIN) to accurately identify both the browsed website and the content type. A prior-fusion-based one-stage classifier and a post-fusion-based two-stage classifier are trained to enhance website fingerprinting attacks. Comprehensive results and an ablation study on 25 popular LLM websites and varying machine learning methods demonstrate that LLM-WFIN using post-fusion achieves 97.2% attack classification accuracy with no defense and outperforms prior-fusion, which achieves 81.6% attack classification accuracy, under effective defenses.
(This article belongs to the Special Issue AI in Cybersecurity, 2nd Edition)
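
The prior-fusion versus post-fusion distinction above amounts to concatenating features before a single classifier versus combining per-modality classifier outputs in a second stage; a minimal scikit-learn sketch with synthetic stand-ins for interrupt-trace and traffic features (feature dimensions and classifier choices are assumptions, not the paper's models):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 25, size=n)                                 # 25 LLM websites (labels)
x_interrupt = rng.normal(size=(n, 64)) + y[:, None] * 0.05      # synthetic interrupt-trace features
x_traffic = rng.normal(size=(n, 128)) + y[:, None] * 0.05       # synthetic network-traffic features

tr, te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Prior fusion: concatenate the two feature sets, train one classifier.
x_all = np.hstack([x_interrupt, x_traffic])
prior = RandomForestClassifier(n_estimators=200, random_state=0).fit(x_all[tr], y[tr])
print("prior-fusion accuracy:", prior.score(x_all[te], y[te]))

# Post fusion: one classifier per modality, then a second-stage model over their probabilities.
clf_i = RandomForestClassifier(n_estimators=200, random_state=0).fit(x_interrupt[tr], y[tr])
clf_t = RandomForestClassifier(n_estimators=200, random_state=0).fit(x_traffic[tr], y[tr])
stacked_tr = np.hstack([clf_i.predict_proba(x_interrupt[tr]), clf_t.predict_proba(x_traffic[tr])])
stacked_te = np.hstack([clf_i.predict_proba(x_interrupt[te]), clf_t.predict_proba(x_traffic[te])])
post = LogisticRegression(max_iter=1000).fit(stacked_tr, y[tr])
print("post-fusion accuracy:", post.score(stacked_te, y[te]))
```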

16 pages, 4324 KiB  
Article
A Two-Stage Corrosion Defect Detection Method for Substation Equipment Based on Object Detection and Semantic Segmentation
by Zhigao Wang, Xinsheng Lan, Yong Zhou, Fangqiang Wang, Mei Wang, Yang Chen, Guoliang Zhou and Qing Hu
Energies 2024, 17(24), 6404; https://doi.org/10.3390/en17246404 - 19 Dec 2024
Cited by 3 | Viewed by 792
Abstract
Corrosion defects increase the risk of power equipment failure, which directly affects the stable operation of power systems. Although existing methods can detect equipment corrosion, they often perform poorly in real time. This study presents a two-stage detection approach that combines YOLOv8 and DDRNet to achieve real-time and precise corrosion area localization. In the first stage, the YOLOv8 network identifies and locates substation equipment, and the detected ROI areas are passed to the DDRNet network in the second stage for semantic segmentation. To enhance the performance of both YOLOv8 and DDRNet, a multi-head attention block is integrated into their architectures. Additionally, to address the challenge posed by the scarcity of corrosion defect samples, this study augmented the dataset using the cut-copy-paste method. Experimental results indicate that the improved YOLOv8 and DDRNet, incorporating the multi-head attention block, boost the mAP and mIoU by 5.8 and 9.7, respectively, compared to the original methods on our self-built dataset. These findings also validate the effectiveness of our data augmentation technique in enhancing the model’s detection accuracy for corrosion categories. Ultimately, the effectiveness of the proposed two-stage method for real-time detection of substation equipment corrosion defects is verified, and it is 48.7% faster than the one-stage method.
(This article belongs to the Topic Advances in Non-Destructive Testing Methods, 2nd Edition)
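
The two-stage flow above crops equipment regions found by YOLOv8 and passes them to a segmentation network; a minimal sketch using the Ultralytics YOLO API with a placeholder standing in for the DDRNet stage (the weights file and `segment_corrosion` are hypothetical):

```python
import cv2
import numpy as np
from ultralytics import YOLO

detector = YOLO("substation_equipment_yolov8.pt")  # hypothetical fine-tuned detector weights

def segment_corrosion(roi: np.ndarray) -> np.ndarray:
    """Placeholder for the second-stage semantic segmentation of corrosion pixels (e.g., DDRNet)."""
    return np.zeros(roi.shape[:2], dtype=np.uint8)  # dummy mask so the sketch runs end to end

image = cv2.imread("substation.jpg")
results = detector(image)[0]                         # stage 1: detect equipment
for box in results.boxes.xyxy.cpu().numpy().astype(int):
    x1, y1, x2, y2 = box
    roi = image[y1:y2, x1:x2]
    mask = segment_corrosion(roi)                    # stage 2: per-pixel corrosion mask inside the ROI
    # The ROI mask can then be mapped back into full-image coordinates for localization/reporting.
```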

18 pages, 12664 KiB  
Article
The Modeling and Simulation of Non-Isolated DC–DC Converters for Optimizing Photovoltaic Systems Applied in Positive Energy Districts
by Tohid Hashemi and Hamed Jafari Kaleybar
Designs 2024, 8(6), 130; https://doi.org/10.3390/designs8060130 - 4 Dec 2024
Viewed by 1556
Abstract
DC–DC converters are critical for energy management in positive energy districts (PEDs) because they allow efficient conversion between different voltage levels, enabling the integration of various renewable energy sources, energy storage systems, and loads. The demand for high-voltage-gain DC–DC converters in photovoltaic power systems has surged in recent years. Despite the numerous converter topologies reported, there is a focused effort to streamline components, particularly switching devices and passive elements, and to reduce overall converter losses. This paper introduces the single switching impedance network (SSIN)-based converter as a unique DC–DC converter topology, designed in both one-stage and double-stage configurations for photovoltaic applications. One of the main characteristics of the SSIN converter is that it needs just one switch and three capacitors for the n-stage configuration. A comparative analysis with the conventional boost converter topology demonstrates the SSIN-based converter’s capability to achieve a desirable output voltage that closely approximates an ideal sine waveform. Furthermore, the application of advanced control strategies to the proposed converter highlights its superior performance and robustness in maintaining output voltage stability under varying conditions. These characteristics make the SSIN-based converter particularly well suited for PED applications, where efficiency, reliability, and the seamless integration of renewable energy sources are crucial.
(This article belongs to the Special Issue Design and Applications of Positive Energy Districts)
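
For reference, the conventional boost converter used above as the comparison baseline has the ideal continuous-conduction-mode voltage gain below, where D is the switch duty cycle (the SSIN converter's own gain expression is not given in the abstract):

```latex
\frac{V_{\text{out}}}{V_{\text{in}}} = \frac{1}{1 - D}, \qquad 0 \le D < 1
```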

14 pages, 4005 KiB  
Article
A Directional Enhanced Adaptive Detection Framework for Small Targets
by Chao Li, Yifan Chang, Shimeng Yang, Kaiju Li and Guangqiang Yin
Electronics 2024, 13(22), 4535; https://doi.org/10.3390/electronics13224535 - 19 Nov 2024
Viewed by 665
Abstract
Owing to limited object size and features, positional and noise issues, and dataset imbalance and simplicity, small object detection is one of the most challenging tasks in the field of object detection. Consequently, an increasing number of researchers are focusing on this area. In this paper, we propose a Directional Enhanced Adaptive (DEA) detection framework for small targets. This framework effectively combines the detection accuracy advantages of two-stage methods with the detection speed advantages of one-stage methods. Additionally, we introduce a Multi-Scale Object Adaptive Slicing (MASA) module and an improved IoU-based aggregation module that integrate with this framework to enhance detection performance. For better comparison, we use the F1 score as one of the evaluation metrics. The experimental results demonstrate that our DEA framework improves the performance of various backbone detection networks and achieves better overall detection performance than other proposed methods, even though our network was not trained on the test dataset while the others were.
(This article belongs to the Special Issue Deep/Machine Learning in Visual Recognition and Anomaly Detection)
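
The F1 score used above as an evaluation metric is the harmonic mean of precision and recall; a small sketch of its computation from detection counts (the counts are illustrative):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Illustrative counts for small-object detections on a validation split
print(round(f1_score(tp=820, fp=140, fn=260), 4))
```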

23 pages, 14450 KiB  
Article
Side-Scan Sonar Image Generation Under Zero and Few Samples for Underwater Target Detection
by Liang Li, Yiping Li, Hailin Wang, Chenghai Yue, Peiyan Gao, Yuliang Wang and Xisheng Feng
Remote Sens. 2024, 16(22), 4134; https://doi.org/10.3390/rs16224134 - 6 Nov 2024
Cited by 7 | Viewed by 2134
Abstract
The acquisition of side-scan sonar (SSS) images is complex, expensive, and time-consuming, making it difficult and sometimes impossible to obtain rich image data. Therefore, we propose a novel image generation algorithm to solve the problem of insufficient training datasets for SSS-based target detection. For zero-sample detection, we propose a two-step style transfer approach: the ray tracing method is first used to obtain an optically rendered image of the target, and then UA-CycleGAN, which combines U-net, soft attention, and HSV loss, is used to generate high-quality SSS images. For few-sample detection, we propose a one-stage image-generation approach: ADA-StyleGAN3 incorporates an adaptive discriminator augmentation strategy into StyleGAN3 to solve the overfitting problem of the generative adversarial network caused by insufficient training data, generating high-quality and diverse SSS images. In simulation experiments, the proposed image-generation algorithm was evaluated subjectively and objectively, and we compared it with other classical methods to demonstrate its advantages. In addition, we applied the generated images to a downstream target detection task, and the detection results further demonstrated the effectiveness of the image generation algorithm. Finally, the generalizability of the proposed algorithm was verified using a public dataset.

8 pages, 2328 KiB  
Proceeding Paper
Object Detection for Autonomous Logistics: A YOLOv4 Tiny Approach with ROS Integration and LOCO Dataset Evaluation
by Souhaila Khalfallah, Mohamed Bouallegue and Kais Bouallegue
Eng. Proc. 2024, 67(1), 65; https://doi.org/10.3390/engproc2024067065 - 12 Oct 2024
Cited by 4 | Viewed by 1544
Abstract
This paper presents an object detection model for logistics-centered objects deployed and used by autonomous warehouse robots. Using the Robot Operating System (ROS) infrastructure, our work leverages the provided models and dataset to create a system that meets the guidelines for Autonomous Mobile Robots (AMRs). We describe the method with primary emphasis on the Logistics Objects in Context (LOCO) dataset, focusing on training the model and determining optimal performance and accuracy for the object detection task. Using neural networks as pattern recognition tools, we took advantage of the one-stage detection architecture YOLO, which prioritizes speed and accuracy. By focusing on a lightweight variant of this architecture, YOLOv4 Tiny, we were able to optimize for deployment on resource-constrained edge devices without compromising detection accuracy, resulting in a significant performance boost over previous benchmarks. The YOLOv4 Tiny model was implemented with Darknet, chosen for its adaptability to the ROS Melodic framework and its suitability for edge devices. Notably, our network achieved a mean average precision (mAP) of 46% and an intersection over union (IoU) of 50%, surpassing the baseline metrics established by the initial LOCO study. These results demonstrate a significant improvement in performance and accuracy for real-world logistics applications of AMRs. Our contribution lies in providing valuable insights into the capabilities of AMRs within the logistics environment, paving the way for further advancements in this field.
(This article belongs to the Proceedings of The 3rd International Electronic Conference on Processes)
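
The intersection over union (IoU) reported above measures box overlap; a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # partially overlapping boxes
```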

21 pages, 3267 KiB  
Article
Attention-Guided Sample-Based Feature Enhancement Network for Crowded Pedestrian Detection Using Vision Sensors
by Shuyuan Tang, Yiqing Zhou, Jintao Li, Chang Liu and Jinglin Shi
Sensors 2024, 24(19), 6350; https://doi.org/10.3390/s24196350 - 30 Sep 2024
Viewed by 1157
Abstract
Occlusion presents a major obstacle in the development of pedestrian detection technologies utilizing computer vision. This challenge includes both inter-class occlusion, caused by environmental objects obscuring pedestrians, and intra-class occlusion, resulting from interactions between pedestrians. In complex and variable urban settings, these compounded occlusion patterns critically limit the efficacy of both one-stage and two-stage pedestrian detectors, leading to suboptimal detection performance. To address this, we introduce a novel architecture termed the Attention-Guided Feature Enhancement Network (AGFEN), designed within the deep convolutional neural network framework. AGFEN improves the semantic information of high-level features by mapping it onto low-level feature details through sampling, creating an effect comparable to mask modulation. This technique enhances channel-level and spatial-level features concurrently without incurring additional annotation costs. Furthermore, we transition from the traditional one-to-one correspondence between proposals and predictions to a one-to-multiple paradigm, facilitating non-maximum suppression with the prediction set as the fundamental unit. Additionally, we integrate these methodologies by aggregating local features between regions of interest (RoIs) through the reuse of classification weights, effectively mitigating false positives. Experimental evaluations on three widely used datasets demonstrate that AGFEN achieves a 2.38% improvement over the baseline detector on the CrowdHuman dataset, underscoring its effectiveness and potential for advancing pedestrian detection technologies.
(This article belongs to the Section Sensing and Imaging)
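
Standard greedy non-maximum suppression, which the one-to-multiple paradigm above generalizes to prediction sets, keeps the highest-scoring box and suppresses heavily overlapping lower-scored ones; a minimal NumPy sketch of the standard form (not the paper's set-based variant):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns indices of the kept boxes."""
    order = scores.argsort()[::-1]                       # process boxes from highest to lowest score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]             # drop boxes overlapping the kept one too much
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```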

21 pages, 14147 KiB  
Article
Few-Shot Object Detection for Remote Sensing Imagery Using Segmentation Assistance and Triplet Head
by Jing Zhang, Zhaolong Hong, Xu Chen and Yunsong Li
Remote Sens. 2024, 16(19), 3630; https://doi.org/10.3390/rs16193630 - 29 Sep 2024
Cited by 9 | Viewed by 3416
Abstract
The emergence of few-shot object detection provides a new approach to address the challenge of poor generalization ability due to data scarcity. Currently, extensive research has been conducted on few-shot object detection in natural scene datasets, and notable progress has been made. However, in the realm of remote sensing, this technology is still lagging behind. Furthermore, many established methods rely on two-stage detectors, prioritizing accuracy over speed, which hinders real-time applications. Considering both detection accuracy and speed, in this paper, we propose a simple few-shot object detection method based on the one-stage detector YOLOv5 with transfer learning. First, we propose a Segmentation Assistance (SA) module to guide the network’s attention toward foreground targets. This module assists in training and enhances detection accuracy without increasing inference time. Second, we design a novel detection head called the Triplet Head (Tri-Head), which employs a dual distillation mechanism to mitigate the issue of forgetting base-class knowledge. Finally, we optimize the classification loss function to emphasize challenging samples. Evaluations on the NWPUv2 and DIOR datasets showcase the method’s superiority.
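
Emphasizing challenging samples in the classification loss, as described above, is commonly done with a focal-style modulation that down-weights easy examples; a minimal PyTorch sketch of that general idea (not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so hard ones dominate the gradient."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                  # probability assigned to the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.randn(8, 20)                                # 8 anchors/cells, 20 classes (one-vs-all)
targets = torch.zeros_like(logits)
targets[torch.arange(8), torch.randint(0, 20, (8,))] = 1.0
print(focal_loss(logits, targets).item())
```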
