Search Results (61)

Search Parameters:
Keywords = DCNv2

29 pages, 2924 KB  
Article
Driven by Deformable Convolution and Multi-Plane Scale Constraint: A Hazy Image Dehazing–Stitching System
by Sheng Hu, Han Xiao, Cong Liu, Haina Song, Min Liu, Liang Li and Hongzhang Liu
Sensors 2026, 26(5), 1551; https://doi.org/10.3390/s26051551 - 1 Mar 2026
Viewed by 385
Abstract
Adverse weather conditions, such as fog, degrade image quality and impair the performance of deep learning-based image processing algorithms, while advanced driver assistance systems (ADASs) urgently demand image clarity and large-field-of-view perception in foggy environments. Existing image dehazing methods rarely consider the non-uniform and dense distribution of fog particles, leading to severe attenuation of background information. Owing to the low-brightness, low-texture characteristics of ADAS scenarios and differences between sensors, image stitching faces challenges such as difficult feature-point extraction and matching and poor stitching quality. To address these issues, this study proposes a non-uniform dehazing method based on Deformable Convolution v4 (DCNv4), designing a DCNv4-based Transformer-like network to achieve long-range dependency modeling and adaptive spatial aggregation, combined with a lightweight Retinex-inspired Transformer for color correction and structure refinement. Meanwhile, a multi-plane scale constraint module is introduced on top of the LightGlue feature matching network to improve matching accuracy and homography estimation precision, and an adaptive fusion stitching method is adopted to eliminate artifacts and transition zones. Experimental results show that the proposed method effectively improves feature matching accuracy and homography estimation precision, achieving Peak Signal-to-Noise Ratios (PSNRs) of 22.78 dB and 24.34 dB on the NH-HAZE and BRAS datasets, respectively, surpassing existing methods. This provides a reliable environmental perception solution for autonomous driving in foggy environments, verifying its effectiveness and practicality. Full article
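Several results in this listing (this entry, FDE-YOLO, DCS-YOLO, DDF-YOLO, GLDS-YOLO) build on deformable convolution (DCNv2 through DCNv4). The shared core of every version is sampling the input at learned fractional offsets via bilinear interpolation. A minimal pure-Python sketch for one output position of a 3×3, single-channel kernel follows; the function names and the scalar-channel, in-bounds setting are illustrative only, not any paper's implementation.

```python
def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (nested lists, H x W) at fractional (y, x).
    Assumes the coordinate lies inside the image."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy

def deformable_conv_point(img, cy, cx, weights, offsets):
    """One output position of a 3x3 deformable convolution: each kernel tap
    samples at its regular grid location plus a learned fractional offset,
    then the taps are weighted and summed."""
    taps = [(-1, -1), (-1, 0), (-1, 1),
            (0, -1),  (0, 0),  (0, 1),
            (1, -1),  (1, 0),  (1, 1)]
    out = 0.0
    for k, (ky, kx) in enumerate(taps):
        oy, ox = offsets[k]  # learned per-tap (dy, dx), fractional
        out += weights[k] * bilinear_sample(img, cy + ky + oy, cx + kx + ox)
    return out
```

DCNv2 additionally learns a per-tap modulation scalar, and DCNv3/v4 add grouping and a softmax-free aggregation; those refinements are omitted here.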

25 pages, 3298 KB  
Article
FDE-YOLO: An Improved Algorithm for Small Target Detection in UAV Images
by Jialiang Li, Xu Guo, Xu Zhao and Jie Jin
Mathematics 2026, 14(4), 663; https://doi.org/10.3390/math14040663 - 13 Feb 2026
Viewed by 610
Abstract
Accurate small object detection in unmanned aerial vehicle (UAV) imagery is fundamental to numerous safety-critical applications, including intelligent transportation, urban surveillance, and disaster assessment. However, extreme scale compression, dense object distributions, and complex backgrounds severely constrain the feature representation capability of existing detectors, leading to degraded reliability in real-world deployments. To overcome these limitations, we propose FDE-YOLO, a lightweight yet high-performance detection framework built upon YOLOv11 with three complementary architectural innovations. The Fine-Grained Detection Pyramid (FGDP) integrates space-to-depth convolution with a CSP-MFE module that fuses multi-granularity features through parallel local, context, and global branches, capturing comprehensive small-target information while avoiding the computational overhead of layer stacking. The Dynamic Detection Fusion Head (DDFHead) unifies scale-aware, spatial-aware, and task-aware attention mechanisms via sequential refinement with DCNv4 and FReLU activation, adaptively enhancing discriminative capability for densely clustered targets in complex scenes. The EdgeSpaceNet module explicitly fuses Sobel-extracted boundary features with spatial convolution outputs through residual connections, recovering edge details typically lost in standard operations while reducing the parameter count via depthwise separable convolutions. Extensive experiments on the VisDrone2019 dataset demonstrate that FDE-YOLO achieves 53.6% precision, 42.5% recall, 43.3% mAP50, and 26.3% mAP50:95, surpassing YOLOv11s by 2.8%, 4.4%, 4.1%, and 2.8%, respectively, with only 10.25 M parameters. The proposed approach outperforms UAV-specialized methods including Drone-YOLO and MASF-YOLO while using significantly fewer parameters (37.5% and 29.8% reductions, respectively), demonstrating superior efficiency. Cross-dataset evaluations on UAV-DT and NWPU VHR-10 further confirm strong generalization capability, with mAP50 improvements of 1.6% and 1.5%, respectively, validating FDE-YOLO as an effective and efficient solution for reliable UAV-based small object detection in real-world scenarios. Full article
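The space-to-depth convolution in FGDP rests on a lossless downsampling rearrangement: every 2×2 spatial patch is moved into the channel dimension, so small-object evidence is never discarded the way strided convolution or pooling can discard it. A minimal sketch on nested lists (illustrative, not the paper's code):

```python
def space_to_depth(x, block=2):
    """Rearrange an H x W x C tensor (nested lists) into
    (H/block) x (W/block) x (C*block*block): each block x block spatial
    patch is stacked along the channel axis, preserving every value."""
    h, w = len(x), len(x[0])
    assert h % block == 0 and w % block == 0
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            ch = []
            for di in range(block):
                for dj in range(block):
                    ch.extend(x[i + di][j + dj])  # append this pixel's channels
            row.append(ch)
        out.append(row)
    return out
```

In SPD-style blocks, a stride-1 convolution then follows this rearrangement to mix the stacked channels.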
(This article belongs to the Special Issue New Advances in Image Processing and Computer Vision)

27 pages, 4033 KB  
Article
DCDW-YOLOv11: An Intelligent Defect-Detection Method for Key Transmission-Line Equipment
by Dezhi Wang, Riqing Song, Minghui Liu, Xingqian Wang, Chengyu Zhang, Ziang Wang and Dongxue Zhao
Sensors 2026, 26(3), 1029; https://doi.org/10.3390/s26031029 - 4 Feb 2026
Viewed by 521
Abstract
The detection of defects in key transmission-line equipment under complex environments often suffers from insufficient accuracy and reliability due to background interference and multi-scale feature variations. To address this issue, this paper proposes an improved defect detection model based on YOLOv11, named DCDW-YOLOv11. The model introduces deformable convolution C2f_DCNv3 in the backbone network to enhance adaptability to geometric deformations of targets, and incorporates the convolutional block attention module (CBAM) to highlight defect features while suppressing background interference. In the detection head, a dynamic head structure (DyHead) is adopted to achieve cross-layer multi-scale feature fusion and collaborative perception, along with the WIoU loss function to optimize bounding box regression and sample weight allocation. Experimental results demonstrate that on the transmission-line equipment defect dataset, DCDW-YOLOv11 achieves an accuracy, recall, and mAP of 94.4%, 92.8%, and 96.3%, respectively, representing improvements of 2.8%, 7.0%, and 4.4% over the original YOLOv11, and outperforming other mainstream detection models. The proposed method can provide high-precision and highly reliable defect detection support for intelligent inspection of transmission lines in complex scenarios. Full article
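CBAM, used here to highlight defect features while suppressing background, combines channel and spatial attention. A minimal pure-Python sketch of the channel branch (average-pool path only; CBAM also uses a max-pool path and a spatial branch) follows; the weight shapes and values are illustrative assumptions, not the paper's configuration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(feature_maps, w1, w2):
    """CBAM-style channel attention: squeeze each channel to a scalar by
    global average pooling, pass the vector through a small two-layer MLP
    (ReLU hidden layer), then rescale every channel by its sigmoid gate.
    feature_maps: list of C channels, each an H x W nested list.
    w1: hidden x C weights; w2: C x hidden weights."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    hidden = [max(0.0, sum(w1[i][c] * pooled[c] for c in range(len(pooled))))
              for i in range(len(w1))]
    gates = [sigmoid(sum(w2[c][i] * hidden[i] for i in range(len(hidden))))
             for c in range(len(pooled))]
    scaled = [[[v * gates[c] for v in row] for row in ch]
              for c, ch in enumerate(feature_maps)]
    return scaled, gates
```

Each gate lies in (0, 1), so informative channels are kept near full strength while uninformative ones are attenuated.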
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 3rd Edition)

19 pages, 42892 KB  
Article
DMR-YOLO: An Improved Wind Turbine Blade Surface Damage Detection Method Based on YOLOv8
by Lijuan Shi, Sifan Wang, Jian Zhao, Zhejun Kuang, Liu Wang, Lintao Ma, Han Yang and Haiyan Wang
Appl. Sci. 2026, 16(3), 1333; https://doi.org/10.3390/app16031333 - 28 Jan 2026
Cited by 1 | Viewed by 446
Abstract
Wind turbine blades (WTBs) are inevitably exposed to harsh environmental conditions, leading to surface damage such as cracks and corrosion that compromises power generation efficiency. While UAV-based inspection offers significant potential, it frequently encounters challenges in handling irregular defect shapes and preserving fine edge details. To address these limitations, this paper proposes DMR-YOLO, an improved wind turbine blade surface damage detection method based on YOLOv8. The proposed framework incorporates three key innovations: First, a C2f-DCNv2-MPCA module is designed to dynamically adjust feature weights, enabling the model to focus more effectively on the geometric structural details of irregular defects. Second, a Multi-Scale Edge Perception Enhancement (MEPE) module is introduced to extract edge textures directly within the network. This approach prevents the decoupling of edge features from global context information, effectively resolving the issue of edge information loss and enhancing the recognition of small targets. Finally, the detection head is optimized using a Re-parameterized Shared Convolution Detection Head (RSCD) strategy. By employing weight sharing combined with Diverse Branch Blocks (DBB), this design significantly reduces computational redundancy while maintaining high localization accuracy. Experimental results demonstrate that DMR-YOLO outperforms the baseline YOLOv8n, achieving a 1.8% increase in mAP@0.5 to 82.2%, with a notable 3.2% improvement in the “damage” category. Furthermore, the computational load is reduced by 9.9% to 7.3 GFLOPs, while maintaining an inference speed of 92.6 FPS, providing an effective solution for real-time wind farm defect detection. Full article
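Edge extraction of the kind MEPE performs (here and in FDE-YOLO's EdgeSpaceNet) typically starts from the Sobel operators: two fixed 3×3 gradient filters whose responses combine into a gradient magnitude. A minimal sketch over interior pixels, illustrative rather than the paper's implementation:

```python
import math

# Fixed Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude |G| = sqrt(Gx^2 + Gy^2) at every interior pixel
    of a grayscale image given as nested lists; borders are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * img[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            out[y][x] = math.sqrt(gx * gx + gy * gy)
    return out
```

On a vertical step edge the response peaks along the edge and vanishes in flat regions, which is exactly the boundary evidence an edge-enhancement module feeds back into the feature maps.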

23 pages, 3475 KB  
Article
YOLO-GSD-seg: YOLO for Guide Rail Surface Defect Segmentation and Detection
by Shijun Lai, Zuoxi Zhao, Yalong Mi, Kai Yuan and Qian Wang
Appl. Sci. 2026, 16(3), 1261; https://doi.org/10.3390/app16031261 - 26 Jan 2026
Viewed by 574
Abstract
To address the challenges of accurately extracting features from elongated scratches, irregular defects, and small-scale surface flaws on high-precision linear guide rails, this paper proposes a novel instance segmentation algorithm tailored for guide rail surface defect detection. The algorithm integrates the YOLOv8 instance segmentation framework with deformable convolutional networks and multi-scale feature fusion to enhance defect feature extraction and segmentation performance. A dedicated guide rail surface defect (GSD) segmentation dataset is constructed to support model training and evaluation. In the backbone, the DCNv3 module is incorporated to strengthen the extraction of elongated and irregular defect features while simultaneously reducing model parameters. In the feature fusion network, a multi-scale feature fusion module and a triple-feature encoding module are introduced to jointly capture global contextual information and preserve fine-grained local defect details. Furthermore, a Channel and Position Attention Module (CPAM) is employed to integrate global and local features, improving the model’s sensitivity to channel and positional cues of small-target defects and thereby enhancing segmentation accuracy. Experimental results show that, compared with the original YOLOv8n-Seg, the proposed method achieves improvements of 3.9% and 3.8% in Box and Mask mAP50, while maintaining a real-time inference speed of 148 FPS. Additional evaluations on the public MSD dataset further demonstrate the model’s strong versatility and robustness. Full article
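The Box and Mask mAP50 figures quoted throughout these entries are means over classes of average precision at IoU 0.5. A minimal sketch of single-class AP from confidence-ranked detections using all-point interpolation (COCO-style); matching detections to ground truth at the IoU threshold is assumed to have already produced the true/false-positive flags, and all names here are illustrative:

```python
def average_precision(detections, num_gt):
    """AP for one class. detections: list of (confidence, is_true_positive);
    num_gt: number of ground-truth objects. Returns the area under the
    interpolated precision-recall curve."""
    detections = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for _conf, is_tp in detections:
        tp += is_tp
        fp += not is_tp
        points.append((tp / num_gt, tp / (tp + fp)))
    # All-point interpolation: precision at recall r becomes the maximum
    # precision attained at any recall >= r.
    interp, max_prec = [], 0.0
    for r, p in reversed(points):
        max_prec = max(max_prec, p)
        interp.append((r, max_prec))
    interp.reverse()
    ap, prev_r = 0.0, 0.0
    for r, p in interp:
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

mAP50 is then the mean of this quantity over all classes; mAP50:95 further averages over IoU thresholds from 0.5 to 0.95.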
(This article belongs to the Special Issue Deep Learning-Based Computer Vision Technology and Its Applications)

24 pages, 2902 KB  
Article
Research on Prolonged Violation Behavior Recognition in Construction Sites Based on Artificial Intelligence
by Kai Yu, Zhenyue Wang, Lujie Zhou, Xuesong Yang, Zhaoxiang Mu and Tianyu Wang
Symmetry 2026, 18(1), 204; https://doi.org/10.3390/sym18010204 - 22 Jan 2026
Viewed by 435
Abstract
Prolonged violation behavior is characterized by sustained temporal presence, slow action changes, and similarity to normal behavior. Due to the complex construction environment, intelligent recognition algorithms face significant challenges. This paper proposes an improved YOLOv8-based model, DGEA-YOLOv8, to address these issues, using “playing with mobile phones” as a case study. The model integrates the DCNv3 module in the backbone to enhance behavior deformation adaptability and the GELAN module to improve lightweight performance and global perception in resource-limited environments. An ECA attention mechanism is added to enhance small target detection, while the ASPP module boosts multi-scale perception. ByteTrack is incorporated for continuous tracking of prolonged violation behavior in construction scenarios. Experimental results show that DGEA-YOLOv8 achieves 94.5% mAP50, a 2.95% improvement over the YOLOv8s baseline, with better data capture rates and lower ID change rates compared to algorithms like DeepSORT and StrongSORT. A construction-specific dataset of over 3000 images verifies the model’s effectiveness. From the perspective of data symmetry, the proposed model demonstrates strong capability in addressing asymmetric feature distributions and behavioral imbalance inherent in prolonged violations, restoring spatiotemporal consistency in detection. In conclusion, DGEA-YOLOv8 provides a precise, efficient, and adaptive solution for recognizing prolonged violation behaviors in construction sites. Full article
(This article belongs to the Section Computer)

25 pages, 5001 KB  
Article
SAR-to-Optical Remote Sensing Image Translation Method Based on InternImage and Cascaded Multi-Head Attention
by Cheng Xu and Yingying Kong
Remote Sens. 2026, 18(1), 55; https://doi.org/10.3390/rs18010055 - 24 Dec 2025
Viewed by 809
Abstract
Synthetic aperture radar (SAR), with its all-weather and all-day observation capabilities, plays a significant role in the field of remote sensing. However, due to the unique imaging mechanism of SAR, its interpretation is challenging. Translating SAR images into optical remote sensing images has become a research hotspot in recent years to enhance the interpretability of SAR images. This paper proposes a deep learning-based method for SAR-to-optical remote sensing image translation. The network comprises three parts: a global representor, a generator with cascaded multi-head attention, and a multi-scale discriminator. The global representor, built upon InternImage with deformable convolution v3 (DCNv3) as its core operator, leverages its global receptive field and adaptive spatial aggregation capabilities to extract global semantic features from SAR images. The generator follows the classic “encoder-bottleneck-decoder” structure, where the encoder focuses on extracting local detail features from SAR images. The cascaded multi-head attention module within the bottleneck layer optimizes local detail features and facilitates feature interaction between global semantics and local details. The discriminator adopts a multi-scale structure based on the local receptive field PatchGAN, enabling joint global and local discrimination. Furthermore, for the first time in SAR image translation tasks, structural similarity index measure (SSIM) loss is combined with adversarial loss, perceptual loss, and feature matching loss as the loss function. A series of experiments demonstrate the effectiveness and reliability of the proposed method. Compared to mainstream image translation methods, our method ultimately generates higher-quality optical remote sensing images that are semantically consistent, texturally authentic, clearly detailed, and visually reasonable. Full article
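The SSIM loss used here is one minus the structural similarity index. A minimal global (single-window) sketch for grayscale images given as flat pixel lists; practical implementations compute SSIM over local Gaussian windows and average, which this simplification omits:

```python
def ssim(img_a, img_b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global single-window SSIM between two equally sized grayscale images
    (flat lists of pixel values in [0, 255]). c1 and c2 are the standard
    stabilizing constants for a dynamic range of 255."""
    n = len(img_a)
    mu_a = sum(img_a) / n
    mu_b = sum(img_b) / n
    var_a = sum((p - mu_a) ** 2 for p in img_a) / n
    var_b = sum((p - mu_b) ** 2 for p in img_b) / n
    cov = sum((a - mu_a) * (b - mu_b) for a, b in zip(img_a, img_b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

The training loss would then be `1 - ssim(pred, target)`, added to the adversarial, perceptual, and feature matching terms with some weighting.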

18 pages, 3847 KB  
Article
Research on the Detection of Ocean Internal Waves Based on the Improved Faster R-CNN in SAR Images
by Gaoyuan Shen, Zhi Zeng, Hao Huang, Zhifan Jiao and Jun Song
J. Mar. Sci. Eng. 2026, 14(1), 23; https://doi.org/10.3390/jmse14010023 - 23 Dec 2025
Viewed by 689
Abstract
Ocean internal waves occur in stably stratified seawater and play a crucial role in energy cascade, material transport, and military activities. However, the complex and irregular spatial patterns of internal waves pose significant challenges for accurate detection in SAR images when using conventional convolutional neural networks, which often lack adaptability to geometric variations. To address this problem, this paper proposes a refined Faster R-CNN detection framework, termed “rFaster R-CNN”, and adopts a transfer learning strategy to enhance model generalization and robustness. In the feature extraction stage, a backbone network called “ResNet50_CDCN” that integrates the CBAM attention mechanism and DCNv2 deformable convolution is constructed to enhance the feature expression ability of key regions in the images. Experimental results show that in the internal wave dataset constructed in this paper, this network improves the detection accuracy by approximately 3% compared to the original ResNet50 network. At the region proposal stage, this paper further adds two small-scale anchors and combines the ROI Align and FPN modules, effectively enhancing the spatial hierarchical information and semantic expression ability of ocean internal waves. Compared with classical object detection algorithms such as SSD, YOLO, and RetinaNet, the proposed “rFaster R-CNN” achieves superior detection performance, showing significant improvements in both accuracy and robustness. Full article
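Adding small-scale anchors at the region proposal stage, as this paper does, amounts to extending the anchor generator's scale list. A minimal sketch of standard anchor generation from scales and aspect ratios, with base size and values chosen purely for illustration:

```python
def make_anchors(base_size, scales, ratios):
    """Anchor boxes (x1, y1, x2, y2) centred at the origin, one per
    scale/ratio pair. A ratio r keeps the box area fixed while setting
    height / width = r, the usual RPN convention."""
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:
            w = (area / r) ** 0.5
            h = w * r
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors
```

Smaller entries in `scales` (e.g. 0.25 and 0.5 alongside the defaults) give the proposal network templates that small targets can actually overlap at IoU 0.5.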
(This article belongs to the Special Issue Artificial Intelligence and Its Application in Ocean Engineering)

26 pages, 20666 KB  
Article
DRC2-Net: A Context-Aware and Geometry-Adaptive Network for Lightweight SAR Ship Detection
by Abdelrahman Yehia, Naser El-Sheimy, Ashraf Helmy, Ibrahim Sh. Sanad and Mohamed Hanafy
Sensors 2025, 25(22), 6837; https://doi.org/10.3390/s25226837 - 8 Nov 2025
Viewed by 711
Abstract
Synthetic Aperture Radar (SAR) ship detection remains challenging due to background clutter, target sparsity, and fragmented or partially occluded ships, particularly at small scales. To address these issues, we propose the Deformable Recurrent Criss-Cross Attention Network (DRC2-Net), a lightweight and efficient detection framework built upon the YOLOX-Tiny architecture. The model incorporates two SAR-specific modules: a Recurrent Criss-Cross Attention (RCCA) module to enhance contextual awareness and reduce false positives and a Deformable Convolutional Networks v2 (DCNv2) module to capture geometric deformations and scale variations adaptively. These modules expand the Effective Receptive Field (ERF) and improve feature adaptability under complex conditions. DRC2-Net is trained on the SSDD and iVision-MRSSD datasets, encompassing highly diverse SAR imagery including inshore and offshore scenes, variable sea states, and complex coastal backgrounds. The model maintains a compact architecture with 5.05 M parameters, ensuring strong generalization and real-time applicability. On the SSDD dataset, it outperforms the YOLOX-Tiny baseline with AP@50 of 93.04% (+0.9%), APs of 91.15% (+1.31%), APm of 88.30% (+1.22%), and APl of 89.47% (+13.32%). On the more challenging iVision-MRSSD dataset, it further demonstrates improved scale-aware detection, achieving higher AP across small, medium, and large targets. These results confirm the effectiveness and robustness of DRC2-Net for multi-scale ship detection in complex SAR environments, consistently surpassing state-of-the-art detectors. Full article
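The RCCA module keeps attention lightweight by letting each position attend only to its own row and column (H + W − 1 locations rather than H × W); applying the module recurrently (twice) lets context propagate across the whole map. A minimal scalar-feature sketch; real criss-cross attention uses learned query/key/value projections and multiple channels, which are omitted here as simplifying assumptions:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def criss_cross_attention(feat):
    """Criss-cross attention over a 2-D map of scalar features: each
    position attends only to positions in its own row and column, using
    dot-product scores, then aggregates their values."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # The criss-cross path of (y, x): its full row plus its column
            # (excluding the already-counted (y, x) itself).
            path = [(y, j) for j in range(w)] + \
                   [(i, x) for i in range(h) if i != y]
            scores = softmax([feat[y][x] * feat[i][j] for i, j in path])
            out[y][x] = sum(s * feat[i][j]
                            for s, (i, j) in zip(scores, path))
    return out
```

Two passes suffice for full-image context because any pixel (a, b) is reachable from (y, x) via the intermediate pixel (y, b) or (a, x).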

21 pages, 4796 KB  
Article
Real-Time Lightweight Vehicle Object Detection via Layer-Adaptive Model Pruning
by Yu Zhang, Junhui Zhang, Feng Du, Wenjie Kang, Cen Wang and Guofei Li
Electronics 2025, 14(21), 4149; https://doi.org/10.3390/electronics14214149 - 23 Oct 2025
Cited by 2 | Viewed by 1273
Abstract
With the rapid advancement in autonomous driving technology, vehicle object detection has become a crucial component of perception systems, where accuracy and inference speed directly influence driving safety. To address the limitations of existing lightweight detection models in small-object perception and deployment efficiency, this study proposes an enhanced YOLOv8n-based framework, termed YOLOv8n-ALM. The proposed model integrates Mixed Local Channel Attention (MLCA), a Task-Aligned Dynamic Detection Head (TADDH), and Layer-Adaptive Magnitude-based Pruning (LAMP). Specifically, MLCA enhances the representation of salient regions, TADDH aligns classification and regression tasks while leveraging DCNv2 for improved spatial adaptability, and LAMP compresses the network to accelerate inference. Experiments conducted on the KITTI dataset demonstrate that YOLOv8n-ALM improves mAP@0.5 by 2.2% and precision by 5.8%, while reducing parameters by 65.33% and computational load by 29.63%. These results underscore the proposed method’s capability to achieve real-time, compact, and accurate vehicle detection, demonstrating strong potential for deployment in intelligent vehicles and embedded systems. Full article
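LAMP, the pruning scheme used here, scores each weight by its squared magnitude normalized by the sum of squares of all weights in the same layer whose magnitude is at least as large; because the scores are comparable across layers, a single global threshold yields a different, automatically chosen sparsity per layer. A minimal sketch of the scoring rule for one layer's flattened weights (names illustrative):

```python
def lamp_scores(weights):
    """LAMP score per weight: w_i^2 divided by the sum of w_j^2 over all
    weights with |w_j| >= |w_i| in the same layer. The largest-magnitude
    weight always scores 1.0; smaller weights score progressively less."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    remaining = sum(w * w for w in weights)  # sum over weights not yet scored
    for i in order:  # ascending magnitude
        scores[i] = weights[i] ** 2 / remaining
        remaining -= weights[i] ** 2
    return scores
```

Global pruning then removes the lowest-scoring weights across all layers until the target sparsity is reached.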
(This article belongs to the Special Issue Deep Learning-Based Object Detection and Tracking)

18 pages, 10539 KB  
Article
Coal Shearer Drum Detection in Underground Mines Based on DCS-YOLO
by Tao Hu, Jinbo Qiu, Libo Zheng, Zehai Yu and Cong Liu
Electronics 2025, 14(20), 4132; https://doi.org/10.3390/electronics14204132 - 21 Oct 2025
Viewed by 724
Abstract
To address the challenges of low illumination, heavy dust, and severe occlusion in fully mechanized mining faces, this paper proposes a shearer drum detection algorithm named DCS-YOLO. To enhance the model’s ability to effectively capture features under drum deformation and occlusion, a C3k2_DCNv4 module based on deformable convolution (DCNv4) is incorporated into the network. This module adaptively adjusts convolution sampling points according to the drum’s size and position, enabling efficient and precise multi-scale feature extraction. To overcome the limitations of conventional convolution in global feature modeling, a convolution and attention fusion module (CAFM) is constructed, which combines lightweight convolution with attention mechanisms to selectively reweight feature maps at different resolutions. Under low-light conditions, the Shape-IoU loss function is employed to achieve accurate regression of irregular drum boundaries while considering both positional and shape similarity. In addition, GSConv is adopted to achieve model lightweighting while maintaining efficient feature extraction capability. Experiments were conducted on a dataset built from shearer drum images collected in underground coal mines. The results demonstrate that, compared with YOLOv11n, the proposed method reduces parameters and FLOPs by 7.7% and 4.6%, respectively, while improving precision, recall, mAP@0.5, and mAP@0.5:0.95 by 2.9%, 3.2%, 1.1%, and 3.3%, respectively. These findings highlight the significant advantages of the proposed approach in both model lightweighting and detection performance. Full article
(This article belongs to the Section Artificial Intelligence)

20 pages, 49845 KB  
Article
DDF-YOLO: A Small Target Detection Model Using Multi-Scale Dynamic Feature Fusion for UAV Aerial Photography
by Ziang Ma, Chao Wang, Chuanzhi Chen, Jinbao Chen and Guang Zheng
Aerospace 2025, 12(10), 920; https://doi.org/10.3390/aerospace12100920 - 13 Oct 2025
Cited by 2 | Viewed by 1557
Abstract
Unmanned aerial vehicle (UAV)-based object detection shows promising potential in intelligent transportation and disaster response. However, detecting small targets remains challenging due to inherent limitations (long-distance and low-resolution imaging) and environmental interference (complex backgrounds and occlusions). To address these issues, this paper proposes an enhanced small target detection model, DDF-YOLO, which achieves higher detection performance. First, a dynamic feature extraction module (C2f-DCNv4) employs deformable convolutions to effectively capture features from irregularly shaped objects. In addition, a dynamic upsampling module (DySample) optimizes multi-scale feature fusion by combining shallow spatial details with deep semantic features, preserving critical low-level information while enhancing generalization across scales. Finally, to balance rapid convergence with precise localization, an adaptive Focaler-ECIoU loss function dynamically adjusts training weights based on sample quality during bounding box regression. Extensive experiments on VisDrone2019 and UAVDT benchmarks demonstrate DDF-YOLO’s superiority. Compared to YOLOv8n, our model achieves gains of 8.6% and 4.8% in mAP50, along with improvements of 5.0% and 3.3% in mAP50-95, respectively. Furthermore, it exhibits superior efficiency, requiring only 7.3 GFLOPs and attaining an inference speed of 179 FPS. These results validate the model’s robustness for UAV-based detection, particularly in small-object scenarios. Full article
(This article belongs to the Section Aeronautics)

16 pages, 7184 KB  
Article
Towards Robust Scene Text Recognition: A Dual Correction Mechanism with Deformable Alignment
by Yajiao Feng and Changlu Li
Electronics 2025, 14(19), 3968; https://doi.org/10.3390/electronics14193968 - 9 Oct 2025
Viewed by 892
Abstract
Scene Text Recognition (STR) faces significant challenges under complex degradation conditions, such as distortion, occlusion, and semantic ambiguity. Most existing methods rely heavily on language priors for correction, but effectively constructing language rules remains a complex problem. This paper addresses two key challenges: (1) The over-correction behavior of language models, particularly on semantically deficient input, can result in both recognition errors and loss of critical information. (2) Character misalignment in visual features, which affects recognition accuracy. To address these problems, we propose a Deformable-Alignment-based Dual Correction Mechanism (DADCM) for STR. Our method includes the following key components: (1) We propose a visually guided and language-assisted correction strategy. A dynamic confidence threshold is used to control the degree of language model intervention. (2) We design a visual backbone network called SCRTNet. The network enhances key text regions through a channel attention module (SENet) and applies deformable convolution (DCNv4) in deep layers to better model distorted or curved text. (3) We propose a deformable alignment module (DAM). The module combines Gumbel-Softmax-based anchor sampling and geometry-aware self-attention to improve character alignment. Experiments on multiple benchmark datasets demonstrate the superiority of our approach, especially on the Union14M benchmark, where recognition accuracy surpasses previous methods by 1.1%, 1.6%, 3.0%, and 1.3% on the Curved, Multi-Oriented, Contextless, and General subsets, respectively. Full article
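The Gumbel-Softmax sampling behind the DAM's anchor selection is the standard reparameterization: perturb logits with Gumbel(0, 1) noise and apply a temperature-controlled softmax, giving a differentiable approximation to drawing from a categorical distribution. A minimal sketch (the guard against log(0) is a practical detail, not part of the formal definition):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Differentiable approximate categorical sampling: add Gumbel(0, 1)
    noise g = -log(-log(u)), u ~ Uniform(0, 1), to each logit, then take a
    softmax at temperature tau. As tau -> 0 the output approaches a one-hot
    sample; larger tau keeps it smooth."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

During training the soft sample lets gradients flow through the anchor choice; at inference one can simply take the argmax.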

20 pages, 7048 KB  
Article
Enhanced Lightweight Object Detection Model in Complex Scenes: An Improved YOLOv8n Approach
by Sohaya El Hamdouni, Boutaina Hdioud and Sanaa El Fkihi
Information 2025, 16(10), 871; https://doi.org/10.3390/info16100871 - 8 Oct 2025
Cited by 1 | Viewed by 1603
Abstract
Object detection has a vital impact on the analysis and interpretation of visual scenes. It is widely utilized in various fields, including healthcare, autonomous driving, and vehicle surveillance. However, complex scenes containing small, occluded, and multiscale objects present significant difficulties for object detection. This paper introduces a lightweight object detection algorithm, utilizing YOLOv8n as the baseline model, to address these problems. Our method consists of four steps. First, we add a layer for small object detection to enhance the feature expression capability of small objects. Second, to handle complex forms and appearances, we employ the C2f-DCNv2 module, which integrates advanced DCNv2 (Deformable Convolutional Networks v2) by substituting the final C2f module in the backbone. Third, we incorporate CBAM, a lightweight attention module, into the neck section to address missed detections. Finally, we use Ghost Convolution (GhostConv), a light convolutional layer that alternates with ordinary convolution in the neck, ensuring good detection performance while decreasing the number of parameters. Experimental performance on the PASCAL VOC dataset demonstrates that our approach lowers the number of model parameters by approximately 9.37%. The mAP@0.5:0.95 increased by 0.9%, recall (R) increased by 0.8%, mAP@0.5 increased by 0.3%, and precision (P) increased by 0.1% compared to the baseline model. To better evaluate the model’s generalization performance in real-world driving scenarios, we conducted additional experiments using the KITTI dataset. Compared to the baseline model, our approach yielded a 0.8% improvement in mAP@0.5 and 1.3% in mAP@0.5:0.95. This result indicates strong performance in more dynamic and challenging conditions. Full article

17 pages, 2172 KB  
Article
GLDS-YOLO: An Improved Lightweight Model for Small Object Detection in UAV Aerial Imagery
by Zhiyong Ju, Jiacheng Shui and Jiameng Huang
Electronics 2025, 14(19), 3831; https://doi.org/10.3390/electronics14193831 - 27 Sep 2025
Cited by 2 | Viewed by 1962
Abstract
To enhance small object detection in UAV aerial imagery suffering from low resolution and complex backgrounds, this paper proposes GLDS-YOLO, an improved lightweight detection model. The model integrates four core modules: Group Shuffle Attention (GSA) to strengthen small-scale feature perception, Large Separable Kernel Attention (LSKA) to capture global semantic context, DCNv4 to enhance feature adaptability with reduced parameters, and further proposes a novel Small-object-enhanced Multi-scale and Structure Detail Enhancement (SMSDE) module, which enhances edge-detail representation of small objects while maintaining lightweight efficiency. Experiments on VisDrone2019 and DOTA1.0 demonstrate that GLDS-YOLO achieves superior detection performance. On VisDrone2019, it improves mAP@0.5 and mAP@0.5:0.95 by 12.1% and 7%, respectively, compared with YOLOv11n, while maintaining competitive results on DOTA. These results confirm the model’s effectiveness, robustness, and adaptability for complex small object detection tasks in UAV scenarios. Full article
