Search Results (571)

Search Parameters:
Keywords = multiscale dynamic attention

22 pages, 3892 KB  
Article
Structure-Aware Progressive Multi-Modal Fusion Network for RGB-T Crack Segmentation
by Zhengrong Yuan, Xin Ding, Xinhong Xia, Yibin He, Hui Fang, Bo Yang and Wei Fu
J. Imaging 2025, 11(11), 384; https://doi.org/10.3390/jimaging11110384 - 1 Nov 2025
Abstract
Crack segmentation in images plays a pivotal role in the monitoring of structural surfaces, serving as a fundamental technique for assessing structural integrity. However, existing methods that rely solely on RGB images exhibit high sensitivity to light conditions, which significantly restricts their adaptability in complex environmental scenarios. To address this, we propose a structure-aware progressive multi-modal fusion network (SPMFNet) for RGB-thermal (RGB-T) crack segmentation. The main idea is to integrate complementary information from RGB and thermal images and incorporate structural priors (edge information) to achieve accurate segmentation. Here, to better fuse multi-layer features from different modalities, a progressive multi-modal fusion strategy is designed. In the shallow encoder layers, two gate control attention (GCA) modules are introduced to dynamically regulate the fusion process through a gating mechanism, allowing the network to adaptively integrate modality-specific structural details based on the input. In the deeper layers, two attention feature fusion (AFF) modules are employed to enhance semantic consistency by leveraging both local and global attention, thereby facilitating the effective interaction and complementarity of high-level multi-modal features. In addition, edge prior information is introduced to encourage the predicted crack regions to preserve structural integrity, constrained by a joint objective that combines an edge-guided loss, a multi-scale focal loss, and an adaptive fusion loss. Experimental results on publicly available RGB-T crack detection datasets demonstrate that the proposed method outperforms both classical and advanced approaches, verifying the effectiveness of the progressive fusion strategy and the utilization of the structural prior.
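As a rough illustration of the gating idea the GCA modules describe, the sketch below predicts a per-pixel sigmoid gate from both modalities and convexly mixes RGB and thermal features. The class name, shapes, and single-gate design are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GateControlAttention(nn.Module):
    """Minimal sketch of gated RGB-thermal fusion (design assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        # The gate is predicted from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb, thermal], dim=1))  # per-pixel weights in [0, 1]
        return g * rgb + (1 - g) * thermal               # convex mix of the modalities

rgb, thermal = torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128)
print(GateControlAttention(64)(rgb, thermal).shape)  # torch.Size([1, 64, 128, 128])
```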

19 pages, 2289 KB  
Article
Real-Time Detection and Segmentation of Oceanic Whitecaps via EMA-SE-ResUNet
by Wenxuan Chen, Yongliang Wei and Xiangyi Chen
Electronics 2025, 14(21), 4286; https://doi.org/10.3390/electronics14214286 - 31 Oct 2025
Abstract
Oceanic whitecaps are caused by wave breaking and play an important role in air–sea interactions. Whitecap coverage is usually considered a key factor in representing that role. However, accurately identifying whitecap coverage in videos under dynamic marine conditions is a challenging task. This study proposes an EMA-SE-ResUNet deep learning model to address this challenge. Built on a residual network (ResNet-50) encoder and a U-Net decoder, the model incorporates an efficient multi-scale attention (EMA) module and a squeeze-and-excitation network (SENet) module to improve its performance. By employing a dynamic weight allocation strategy and a channel attention mechanism, the model effectively strengthens the feature representation capability for whitecap edges while suppressing interference from wave textures and illumination noise. The model’s adaptability to complex sea surface scenarios was enhanced through the integration of data augmentation techniques and an optimized joint loss function. Applied to a dataset collected by a shipborne camera system deployed during a comprehensive fishery resource survey in the northwest Pacific, the model outperformed mainstream segmentation algorithms, including U-Net, DeepLabv3+, HRNet, and PSPNet, in key metrics: whitecap intersection over union (IoUW) = 73.32%, pixel absolute error (PAE) = 0.081%, and whitecap F1-score (F1W) = 84.60. Compared to the traditional U-Net model, it achieved an absolute improvement of 2.1% in IoUW while reducing computational load (GFLOPs) by 57.3%, jointly optimizing accuracy and real-time performance. This study provides reliable technical support for research on air–sea flux quantification and marine aerosol generation.
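Of the two attention components named, the squeeze-and-excitation block has a standard published form (Hu et al.), sketched below; how it is wired into EMA-SE-ResUNet specifically is an assumption here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block; its exact placement inside
    EMA-SE-ResUNet is assumed, not taken from the paper."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool -> (b, c)
        return x * w.view(b, c, 1, 1)     # excite: per-channel rescaling

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```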

26 pages, 4332 KB  
Article
CDSANet: A CNN-ViT-Attention Network for Ship Instance Segmentation
by Weidong Zhu, Piao Wang and Kuifeng Luan
J. Imaging 2025, 11(11), 383; https://doi.org/10.3390/jimaging11110383 - 31 Oct 2025
Abstract
Ship instance segmentation in remote sensing images is essential for maritime applications such as intelligent surveillance and port management. However, this task remains challenging due to dense target distributions, large variations in ship scales and shapes, and limited high-quality datasets. The existing YOLOv8 framework mainly relies on convolutional neural networks and CIoU loss, which are less effective in modeling global–local interactions and producing accurate mask boundaries. To address these issues, we propose CDSANet, a novel one-stage ship instance segmentation network. CDSANet integrates convolutional operations, Vision Transformers, and attention mechanisms within a unified architecture. The backbone adopts a Convolutional Vision Transformer Attention (CVTA) module to enhance both local feature extraction and global context perception. The neck employs dynamic-weighted DOWConv to adaptively handle multi-scale ship instances, while SIoU loss improves localization accuracy and orientation robustness. Additionally, CBAM enhances the network’s focus on salient regions, and a MixUp-based augmentation strategy is used to improve model generalization. Extensive experiments on the proposed VLRSSD dataset demonstrate that CDSANet achieves state-of-the-art performance with a mask AP (50–95) of 75.9%, surpassing the YOLOv8 baseline by 1.8%.
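The abstract does not give the CVTA layout, but one plausible reading of "convolution plus Vision Transformer attention" is a local conv branch and a global self-attention branch fused by addition, as in the hypothetical sketch below.

```python
import torch
import torch.nn as nn

class CVTABlock(nn.Module):
    """Hypothetical conv + ViT-attention block: a local 3x3 conv branch and a
    global self-attention branch, summed. Not the authors' implementation."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))     # (b, h*w, c)
        global_feat, _ = self.attn(tokens, tokens, tokens)   # global context
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.local(x) + global_feat                   # local + global fusion

print(CVTABlock(32)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```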

36 pages, 64731 KB  
Article
Automated Detection of Embankment Piping and Leakage Hazards Using UAV Visible Light Imagery: A Frequency-Enhanced Deep Learning Approach for Flood Risk Prevention
by Jian Liu, Zhonggen Wang, Renzhi Li, Ruxin Zhao and Qianlin Zhang
Remote Sens. 2025, 17(21), 3602; https://doi.org/10.3390/rs17213602 - 31 Oct 2025
Abstract
Embankment piping and leakage are primary causes of flood control infrastructure failure, accounting for more than 90% of embankment failures worldwide and posing significant threats to public safety and economic stability. Current manual inspection methods are labor-intensive, hazardous, and inadequate for emergency flood season monitoring, while existing automated approaches using thermal infrared imaging face limitations in cost, weather dependency, and deployment flexibility. This study addresses the critical scientific challenge of developing reliable, cost-effective automated detection systems for embankment safety monitoring using Unmanned Aerial Vehicle (UAV)-based visible light imagery. The fundamental problem lies in extracting subtle textural signatures of piping and leakage from complex embankment surface patterns under varying environmental conditions. To solve this challenge, we propose the Embankment-Frequency Network (EmbFreq-Net), a frequency-enhanced deep learning framework that leverages frequency-domain analysis to amplify hazard-related features while suppressing environmental noise. The architecture integrates dynamic frequency-domain feature extraction, multi-scale attention mechanisms, and lightweight design principles to achieve real-time detection capabilities suitable for emergency deployment and edge computing applications. This approach transforms traditional post-processing workflows into an efficient real-time edge computing solution, significantly improving computational efficiency and enabling immediate on-site hazard assessment. Comprehensive evaluations on a specialized embankment hazard dataset demonstrate that EmbFreq-Net achieves 77.68% mAP@0.5, representing a 4.19 percentage point improvement over state-of-the-art methods, while reducing computational requirements by 27.0% (4.6 vs. 6.3 Giga Floating-Point Operations (GFLOPs)) and model parameters by 21.7% (2.02M vs. 2.58M). These results demonstrate the method’s potential for transforming embankment safety monitoring from reactive manual inspection to proactive automated surveillance, thereby contributing to enhanced flood risk management and infrastructure resilience.
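The dynamic frequency-domain feature extraction it mentions is in the spirit of a learnable spectral filter (FFT, per-frequency gain, inverse FFT), a pattern known from global-filter networks; the sketch below assumes that form and is not EmbFreq-Net's actual module.

```python
import torch
import torch.nn as nn

class FrequencyFeatureExtractor(nn.Module):
    """Assumed frequency-domain block: FFT -> learnable complex filter ->
    inverse FFT. Sizes and the filter form are illustrative."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable filter per channel over the half-spectrum (rfft2 layout).
        self.weight = nn.Parameter(torch.ones(channels, height, width // 2 + 1, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")    # (b, c, H, W//2+1), complex
        filt = torch.view_as_complex(self.weight)  # learnable spectral gain
        return torch.fft.irfft2(spec * filt, s=x.shape[-2:], norm="ortho")

x = torch.randn(1, 8, 64, 64)
print(FrequencyFeatureExtractor(8, 64, 64)(x).shape)  # torch.Size([1, 8, 64, 64])
```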

22 pages, 2417 KB  
Article
Intelligent Load Forecasting for Central Air Conditioning Using an Optimized Hybrid Deep Learning Framework
by Wei He, Rui Hua, Yulong Xiao, Yuce Liu, Chaohui Zhou and Chaoshun Li
Energies 2025, 18(21), 5736; https://doi.org/10.3390/en18215736 - 31 Oct 2025
Abstract
Accurate load forecasting of central air conditioning (CAC) systems is crucial for enhancing energy efficiency and minimizing operational costs. However, the complex nonlinear correlations among meteorological factors, water system dynamics, and cooling demand make this task challenging. To address these issues, this study proposes a novel hybrid forecasting model termed IWOA-BiTCN-BiGRU-SA, which integrates the Improved Whale Optimization Algorithm (IWOA), Bidirectional Temporal Convolutional Networks (BiTCN), Bidirectional Gated Recurrent Units (BiGRU), and a Self-attention mechanism (SA). BiTCN is adopted to extract temporal dependencies and multi-scale features, BiGRU captures long-term bidirectional correlations, and the self-attention mechanism enhances feature weighting adaptively. Furthermore, IWOA is employed to optimize the hyperparameters of BiTCN and BiGRU, improving training stability and generalization. Experimental results based on real CAC operational data demonstrate that the proposed model outperforms traditional methods such as LSTM, GRU, and TCN, as well as hybrid deep learning benchmark models: compared with all benchmark models, the root mean square error (RMSE) decreased by 13.72% to 56.66%. This research highlights the application potential of the IWOA-BiTCN-BiGRU-SA framework in practical load forecasting and intelligent energy management for large-scale CAC systems.
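A minimal sketch of the BiGRU plus self-attention stage follows; the BiTCN front end and the IWOA hyperparameter search are omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class BiGRUSelfAttention(nn.Module):
    """Sketch of the BiGRU + self-attention forecasting stage only
    (hidden sizes, head count, and the one-step head are illustrative)."""
    def __init__(self, n_features: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.bigru = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)   # one-step-ahead load forecast

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.bigru(x)          # (b, t, 2*hidden)
        a, _ = self.attn(h, h, h)     # self-attention re-weights time steps
        return self.head(a[:, -1])    # predict from the last attended step

x = torch.randn(8, 96, 12)  # batch of 96-step windows with 12 input features
print(BiGRUSelfAttention(12)(x).shape)  # torch.Size([8, 1])
```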

25 pages, 12749 KB  
Article
ADFE-DET: An Adaptive Dynamic Feature Enhancement Algorithm for Weld Defect Detection
by Xiaocui Wu, Changjun Liu, Hao Zhang and Pengyu Xu
Appl. Sci. 2025, 15(21), 11595; https://doi.org/10.3390/app152111595 - 30 Oct 2025
Abstract
Welding is a critical joining process in modern manufacturing, with defects contributing to 50–80% of structural failures. Traditional inspection methods are often inefficient, subjective, and inconsistent. To address challenges in weld defect detection—including scale variation, morphological complexity, low contrast, and sample imbalance—this paper proposes ADFE-DET, an adaptive dynamic feature enhancement algorithm. The approach introduces three core innovations: the Dynamic Selection Cross-stage Cascade Feature Block (DSCFBlock) captures fine texture features via edge-preserving dynamic selection attention; the Adaptive Hierarchical Spatial Feature Pyramid Network (AHSFPN) achieves adaptive multi-scale feature integration through directional channel attention and hierarchical fusion; and the Multi-Directional Differential Lightweight Head (MDDLH) enables precise defect localization via multi-directional differential convolution while maintaining a lightweight architecture. Experiments on three public datasets (Weld-DET, NEU-DET, PKU-Market-PCB) show that ADFE-DET improves mAP50 by 2.16%, 2.73%, and 1.81%, respectively, over the baseline YOLOv11n, while reducing parameters by 34.1% and computational complexity by 4.6%, and achieving an inference speed of 105 FPS. The results demonstrate that ADFE-DET provides an effective and practical solution for intelligent industrial weld quality inspection.
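The abstract does not define the multi-directional differential convolution inside MDDLH; one established differential form is the central-difference convolution, shown below as a hedged stand-in (the 3x3 kernel and theta value are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv(nn.Module):
    """Central-difference convolution (cf. CDC in face anti-spoofing) as one
    plausible 'differential convolution'; whether MDDLH uses this exact form
    is an assumption."""
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)
        # Subtract the kernel's total mass applied pointwise, which emphasizes
        # local gradients (differences) over raw intensity.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return out - self.theta * F.conv2d(x, kernel_sum)

print(CentralDifferenceConv(8, 16)(torch.randn(1, 8, 32, 32)).shape)
```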

23 pages, 3168 KB  
Article
Spatio-Temporal Feature Fusion-Based Hybrid GAT-CNN-LSTM Model for Enhanced Short-Term Power Load Forecasting
by Jia Huang, Qing Wei, Tiankuo Wang, Jiajun Ding, Longfei Yu, Diyang Wang and Zhitong Yu
Energies 2025, 18(21), 5686; https://doi.org/10.3390/en18215686 - 29 Oct 2025
Abstract
Conventional power load forecasting frameworks face limitations in dynamic spatial topology capture and long-term dependency modeling. To address these issues, this study proposes a hybrid GAT-CNN-LSTM architecture for enhanced short-term power load forecasting (STLF). The model integrates three core components synergistically: a Graph Attention Network (GAT) dynamically captures spatial correlations via adaptive node weighting, resolving static topology constraints; a CNN-LSTM module extracts multi-scale temporal features—convolutional kernels decompose load fluctuations, while bidirectional LSTM layers model long-term trends; and a gated fusion mechanism adaptively weights and fuses spatio-temporal features, suppressing noise and enhancing sensitivity to critical load periods. Experimental validations on multi-city datasets show significant improvements: the model outperforms baseline models by a notable margin in error reduction, exhibits stronger robustness under extreme weather, and maintains superior stability in multi-step forecasting. This study concludes that the hybrid model balances spatial topological analysis and temporal trend modeling, providing higher accuracy and adaptability for STLF in complex power grid environments.
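The gated fusion mechanism is the most self-contained piece to sketch: a learned sigmoid gate convexly combines the GAT spatial embedding with the CNN-LSTM temporal embedding. The dimensions and single-layer gate below are assumptions.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of gated spatio-temporal fusion: a learned gate mixes the GAT
    spatial embedding and the CNN-LSTM temporal embedding (design assumed)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_spatial: torch.Tensor, h_temporal: torch.Tensor) -> torch.Tensor:
        z = self.gate(torch.cat([h_spatial, h_temporal], dim=-1))
        return z * h_spatial + (1 - z) * h_temporal   # convex combination

h_s = torch.randn(32, 128)   # per-node spatial features from the GAT
h_t = torch.randn(32, 128)   # temporal features from the CNN-LSTM branch
print(GatedFusion(128)(h_s, h_t).shape)  # torch.Size([32, 128])
```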

19 pages, 2431 KB  
Article
Predicting the Remaining Service Life of Power Transformers Using Machine Learning
by Zimo Gao, Binkai Yu, Jiahe Guang, Shanghua Jiang, Xinze Cong, Minglei Zhang and Lin Yu
Processes 2025, 13(11), 3459; https://doi.org/10.3390/pr13113459 - 28 Oct 2025
Abstract
In response to the insufficient adaptability of power transformer remaining useful life (RUL) prediction under complex working conditions and the difficulty of multi-scale feature fusion, this study proposes an industrial time series prediction model based on a parallel Transformer–BiGRU–GlobalAttention architecture. The parallel Transformer encoder captures long-range temporal dependencies, the BiGRU network enhances local sequence associations through bidirectional modeling, the global attention mechanism dynamically weights key temporal features, and cross-attention achieves spatiotemporal feature interaction and fusion. Experiments were conducted on the public ETT transformer temperature dataset, employing sliding-window and piecewise linear label processing techniques, with MAE, MSE, and RMSE as evaluation metrics. The results show that the model achieved excellent predictive performance on the test set, with an MSE of 0.078, MAE of 0.233, and RMSE of 11.13. Compared with traditional LSTM, CNN-BiGRU-Attention, and other methods, the model achieved improvements of 17.2%, 6.0%, and 8.9%, respectively. Ablation experiments verified that the global attention mechanism rationalizes the feature contribution distribution, with the core temporal feature OT having a contribution rate of 0.41. Multiple experiments demonstrated that this method achieves higher precision than the compared methods.
(This article belongs to the Section Energy Systems)
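The sliding-window and piecewise linear label processing mentioned in the abstract above is a standard RUL recipe: windows slide over the sensor series, and each label is the remaining time capped at a constant. The sketch below uses an illustrative window length and RUL cap, not the paper's settings.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 64, max_rul: float = 130.0):
    """Sliding windows plus piecewise-linear RUL labels: the label stays at
    max_rul, then decays linearly to 0 at end of life. Parameter values here
    are illustrative."""
    n = len(series)
    # Remaining time until the last sample, clipped at max_rul.
    rul = np.minimum(max_rul, np.arange(n)[::-1].astype(float))
    X = np.stack([series[i:i + window] for i in range(n - window)])
    y = rul[window:]                 # label each window by the RUL at its end
    return X, y

series = np.sin(np.linspace(0, 20, 500)) + 0.05 * np.random.randn(500)
X, y = make_windows(series)
print(X.shape, y.shape)  # (436, 64) (436,)
```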

28 pages, 2524 KB  
Article
A Multimodal Analysis of Automotive Video Communication Effectiveness: The Impact of Visual Emotion, Spatiotemporal Cues, and Title Sentiment
by Yawei He, Zijie Feng and Wen Liu
Electronics 2025, 14(21), 4200; https://doi.org/10.3390/electronics14214200 - 27 Oct 2025
Abstract
To quantify the communication effectiveness of automotive online videos, this study constructs a multimodal deep learning framework. Existing research often overlooks the intrinsic and interactive impact of textual and dynamic visual content. To bridge this gap, our framework conducts an integrated analysis of both the textual (titles) and visual (frames) dimensions of videos. For visual analysis, we introduce FER-MA-YOLO, a novel facial expression recognition model tailored to the demands of computational communication research. Enhanced with a Dense Growth Feature Fusion (DGF) module and a multiscale Dilated Attention Module (MDAM), it enables accurate quantification of on-screen emotional dynamics, which is essential for testing our hypotheses on user engagement. For textual analysis, we employ a BERT model to quantify the sentiment intensity of video titles. Applying this framework to 968 videos from the Bilibili platform, our regression analysis—which modeled four distinct engagement dimensions (reach, support, discussion, and interaction) separately, in addition to a composite effectiveness score—reveals several key insights: emotionally charged titles significantly boost user interaction; visually, the on-screen proportion of human elements positively predicts engagement, while excessively high visual information entropy weakens it. Furthermore, neutral expressions boost view counts, and happy expressions drive interaction. This study offers a multimodal computational framework that integrates textual and visual analysis and provides empirical, data-driven insights for optimizing automotive video content strategies, contributing to the growing application of computational methods in communication research.
(This article belongs to the Special Issue Advances in Data-Driven Artificial Intelligence)
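Scoring title sentiment with a BERT-family model is straightforward with the Hugging Face pipeline API; the checkpoint below is an English stand-in for illustration, whereas the paper applies its own BERT model to Bilibili titles.

```python
# Minimal sketch of BERT-based title sentiment scoring. The checkpoint is a
# stand-in, not the model used in the paper.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

titles = [
    "This hot hatch absolutely blew me away on the track!",
    "A routine walkthrough of the 2024 sedan's infotainment menus.",
]
for title, result in zip(titles, sentiment(titles)):
    # Each result carries a label (POSITIVE/NEGATIVE) and a confidence score,
    # usable as a sentiment-intensity feature in a downstream regression.
    print(f"{result['label']:8s} {result['score']:.3f}  {title}")
```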

18 pages, 2632 KB  
Article
Adverse-Weather Image Restoration Method Based on VMT-Net
by Zhongmin Liu, Xuewen Yu and Wenjin Hu
J. Imaging 2025, 11(11), 376; https://doi.org/10.3390/jimaging11110376 - 26 Oct 2025
Abstract
To address global semantic loss, local detail blurring, and spatial–semantic conflict during image restoration under adverse weather conditions, we propose an image restoration network that integrates Mamba with Transformer architectures. We first design a Vision-Mamba–Transformer (VMT) module that combines the long-range dependency modeling of Vision Mamba with the global contextual reasoning of Transformers, facilitating the joint modeling of global structures and local details, thus mitigating information loss and detail blurring during restoration. Second, we introduce an Adaptive Content Guidance (ACG) module that employs dynamic gating and spatial–channel attention to enable effective inter-layer feature fusion, thereby enhancing cross-layer semantic consistency. Finally, we embed the VMT and ACG modules into a U-Net backbone, achieving efficient integration of multi-scale feature modeling and cross-layer fusion, significantly improving reconstruction quality under complex weather conditions. The experimental results show that on Snow100K-S/L, VMT-Net improves PSNR over the baseline by approximately 0.89 dB and 0.36 dB, with SSIM gains of about 0.91% and 0.11%, respectively. On Outdoor-Rain and Raindrop, it performs similarly to the baseline and exhibits superior detail recovery in real-world scenes. Overall, the method demonstrates robustness and strong detail restoration across diverse adverse-weather conditions.
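A hypothetical reading of the ACG module, combining channel attention, spatial attention, and a dynamic gate for inter-layer fusion, is sketched below; it follows the description's vocabulary but is not the authors' code.

```python
import torch
import torch.nn as nn

class AdaptiveContentGuidance(nn.Module):
    """Assumed ACG-style fusion: channel then spatial attention modulate the
    upsampled decoder feature, and a gate mixes in the encoder skip feature."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial_attn = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, skip: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
        up = up * self.channel_attn(up)                          # channel attention
        pooled = torch.cat([up.mean(1, keepdim=True),
                            up.amax(1, keepdim=True)], dim=1)
        up = up * self.spatial_attn(pooled)                      # spatial attention
        g = self.gate(torch.cat([skip, up], dim=1))              # dynamic gate
        return g * skip + (1 - g) * up

out = AdaptiveContentGuidance(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```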

27 pages, 2176 KB  
Article
Intelligent Fault Diagnosis of Rolling Bearings Based on Digital Twin and Multi-Scale CNN-AT-BiGRU Model
by Jiayu Shi, Liang Qi, Shuxia Ye, Changjiang Li, Chunhui Jiang, Zhengshun Ni, Zheng Zhao, Zhe Tong, Siyu Fei, Runkang Tang, Danfeng Zuo and Jiajun Gong
Symmetry 2025, 17(11), 1803; https://doi.org/10.3390/sym17111803 - 26 Oct 2025
Abstract
Rolling bearings constitute critical rotating components within rolling mill equipment. Production efficiency and the operational safety of the whole mechanical system are directly governed by their operational health state. To address the dual challenges of the over-reliance of conventional diagnostic methods on expert experience and the scarcity of fault samples in industrial scenarios, we propose a virtual–physical data fusion-optimized intelligent fault diagnosis framework. Initially, a dynamics-based digital twin model for rolling bearings is developed by leveraging their geometric symmetry; it is capable of generating comprehensive fault datasets through parametric adjustments of bearing dimensions and operational environments in virtual space. Subsequently, a symmetry-informed architecture is constructed, which integrates multi-scale convolutional neural networks with attention mechanisms and bidirectional gated recurrent units (MCNN-AT-BiGRU). This architecture enables spatiotemporal feature extraction and enhances critical fault characteristics. The experimental results demonstrate 99.5% fault identification accuracy under single operating conditions, with stable performance maintained under low signal-to-noise ratio (SNR) conditions. Furthermore, the framework exhibits superior generalization capability and transferability across different bearing types.
(This article belongs to the Section Computer)
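The multi-scale convolutional front end of the architecture described above lends itself to a compact sketch: parallel 1-D convolutions with different kernel sizes over the raw vibration signal, concatenated channel-wise. Kernel sizes and widths below are illustrative.

```python
import torch
import torch.nn as nn

class MultiScaleConv1d(nn.Module):
    """Sketch of a multi-scale 1-D CNN front end: parallel branches with
    different kernel sizes, concatenated. Sizes are assumptions."""
    def __init__(self, in_ch: int = 1, branch_ch: int = 16, kernels=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_ch, branch_ch, k, padding=k // 2),
                nn.BatchNorm1d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for k in kernels
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([branch(x) for branch in self.branches], dim=1)

signal = torch.randn(4, 1, 2048)          # batch of raw vibration segments
print(MultiScaleConv1d()(signal).shape)   # torch.Size([4, 48, 2048])
```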

23 pages, 11034 KB  
Article
UEBNet: A Novel and Compact Instance Segmentation Network for Post-Earthquake Building Assessment Using UAV Imagery
by Ziying Gu, Shumin Wang, Kangsan Yu, Yuanhao Wang and Xuehua Zhang
Remote Sens. 2025, 17(21), 3530; https://doi.org/10.3390/rs17213530 - 24 Oct 2025
Abstract
Unmanned aerial vehicle (UAV) remote sensing is critical in assessing post-earthquake building damage. However, intelligent disaster assessment via remote sensing faces formidable challenges from complex backgrounds, substantial scale variations in targets, and diverse spatial disaster dynamics. To address these issues, we propose UEBNet, a high-precision post-earthquake building instance segmentation model that systematically enhances damage recognition by integrating three key modules. Firstly, the Depthwise Separable Convolutional Block Attention Module suppresses background noise that visually resembles damaged structures. This is achieved by expanding the receptive field using multi-scale pooling and dilated convolutions. Secondly, the Multi-feature Fusion Module generates scale-robust feature representations for damaged buildings with significant size differences by processing feature streams from different receptive fields in parallel. Finally, the Adaptive Multi-Scale Interaction Module accurately reconstructs the irregular contours of damaged buildings through an advanced feature alignment mechanism. Extensive experiments were conducted using UAV imagery collected after the Ms 6.8 earthquake in Tingri County, Tibet Autonomous Region, China, on 7 January 2025, and the Ms 6.2 earthquake in Jishishan County, Gansu Province, China, on 18 December 2023. Results indicate that UEBNet enhances segmentation mean Average Precision (mAPseg) and bounding box mean Average Precision (mAPbox) by 3.09% and 2.20%, respectively, with corresponding improvements of 2.65% in F1-score and 1.54% in overall accuracy, outperforming state-of-the-art instance segmentation models. These results demonstrate the effectiveness and reliability of UEBNet in accurately segmenting earthquake-damaged buildings in complex post-disaster scenarios, offering valuable support for emergency response and disaster relief.
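The depthwise-separable factorization named in the first module has a standard form, sketched below; how UEBNet combines it with its CBAM variant and multi-scale pooling is not shown here.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Standard depthwise-separable convolution: a per-channel (depthwise)
    conv followed by a 1x1 (pointwise) conv. Its wiring into UEBNet's
    attention module is an assumption left out of this sketch."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_ch, in_ch, 3, padding=dilation, dilation=dilation, groups=in_ch
        )
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 56, 56)
print(DepthwiseSeparableConv(64, 128, dilation=2)(x).shape)  # (1, 128, 56, 56)
```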

37 pages, 14970 KB  
Article
Research on Strawberry Visual Recognition and 3D Localization Based on Lightweight RAFS-YOLO and RGB-D Camera
by Kaixuan Li, Xinyuan Wei, Qiang Wang and Wuping Zhang
Agriculture 2025, 15(21), 2212; https://doi.org/10.3390/agriculture15212212 - 24 Oct 2025
Abstract
Improving the accuracy and real-time performance of strawberry recognition and localization algorithms remains a major challenge in intelligent harvesting. To address this challenge, this study presents an integrated approach for strawberry maturity detection and 3D localization that combines a lightweight deep learning model with an RGB-D camera. Built upon the YOLOv11 framework, an enhanced RAFS-YOLO model is developed, incorporating three core modules to strengthen multi-scale feature fusion and spatial modeling capabilities. Specifically, the CRA module enhances spatial relationship perception through cross-layer attention, the HSFPN module performs hierarchical semantic filtering to suppress redundant features, and the DySample module dynamically optimizes the upsampling process to improve computational efficiency. By integrating the trained model with RGB-D depth data, the method achieves precise 3D localization of strawberries through coordinate mapping based on detection box centers. Experimental results indicate that RAFS-YOLO surpasses YOLOv11n, improving precision, recall, and mAP@50 by 4.2%, 3.8%, and 2.0%, respectively, while reducing parameters by 36.8% and computational cost by 23.8%. The 3D localization attains millimeter-level precision, with average RMSE values ranging from 0.21 to 0.31 cm across all axes. Overall, the proposed approach achieves a balance between detection accuracy, model efficiency, and localization precision, providing a reliable perception framework for intelligent strawberry-picking robots.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
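The 3D localization step described above, mapping a detection-box center plus the aligned depth value to camera coordinates, follows the standard pinhole model. The sketch below uses made-up intrinsics; real values come from the RGB-D camera's calibration.

```python
import numpy as np

def locate_3d(box_center_px, depth_m, fx, fy, cx, cy):
    """Back-project a detection-box center (u, v) and its aligned depth into
    camera-frame coordinates via the pinhole model."""
    u, v = box_center_px
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example with illustrative intrinsics for a 640x480 sensor:
point = locate_3d((412.0, 300.5), depth_m=0.48, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(point)  # approximate strawberry position in metres, camera frame
```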

14 pages, 4834 KB  
Article
Crowd Gathering Detection Method Based on Multi-Scale Feature Fusion and Convolutional Attention
by Kamil Yasen, Juting Zhou, Nan Zhou, Ke Qin, Zhiguo Wang and Ye Li
Sensors 2025, 25(21), 6550; https://doi.org/10.3390/s25216550 - 24 Oct 2025
Abstract
With rapid urbanization and growing population inflows into metropolitan areas, crowd gatherings have become increasingly frequent and dense, posing significant challenges to public safety management. Although existing crowd gathering detection methods have achieved notable progress, they still face major limitations: most rely heavily on local texture or density features and lack the capacity to model contextual information, making them ineffective under severe occlusions and complex backgrounds. Additionally, fixed-scale feature extraction strategies struggle to adapt to crowd regions with varying densities and scales, and insufficient attention to densely populated areas hinders the capture of critical local features. To overcome these challenges, we propose a point-supervised framework named Multi-Scale Convolutional Attention Network (MSCANet). MSCANet adopts a context-aware architecture and integrates multi-scale feature extraction modules and convolutional attention mechanisms, enabling it to dynamically adapt to varying crowd densities while focusing on key regions. This enhances feature representation in complex scenes and improves detection performance. Extensive experiments on public datasets demonstrate that MSCANet achieves high counting accuracy and robustness, particularly in dense and occluded environments, showing strong potential for real-world deployment.
(This article belongs to the Section Intelligent Sensors)
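As a hedged sketch of the pattern the network's name suggests, the block below extracts multi-scale context with parallel dilated convolutions and re-weights it with a convolutional attention map; dilation rates and widths are assumptions, not MSCANet's actual configuration.

```python
import torch
import torch.nn as nn

class MultiScaleConvAttention(nn.Module):
    """Assumed multi-scale conv-attention block: parallel dilated convs for
    context, a convolutional attention map for re-weighting, plus a residual."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.scales = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        ])
        self.attn = nn.Sequential(
            nn.Conv2d(len(dilations) * channels, channels, 1), nn.Sigmoid()
        )
        self.proj = nn.Conv2d(len(dilations) * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([scale(x) for scale in self.scales], dim=1)
        return self.proj(multi) * self.attn(multi) + x  # attended context + residual

print(MultiScaleConvAttention(32)(torch.randn(1, 32, 48, 48)).shape)
```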

21 pages, 9302 KB  
Article
Research on Small Object Detection in Degraded Visual Scenes: An Improved DRF-YOLO Algorithm Based on YOLOv11
by Yan Gu, Lingshan Chen and Tian Su
World Electr. Veh. J. 2025, 16(11), 591; https://doi.org/10.3390/wevj16110591 - 23 Oct 2025
Abstract
Object detection in degraded environments such as low-light and nighttime conditions remains a challenging task, as conventional computer vision techniques often fail to achieve high precision and robust performance. With the increasing adoption of deep learning, this paper aims to enhance object detection under such adverse conditions by proposing an improved version of YOLOv11, named DRF-YOLO (Degradation-Robust and Feature-enhanced YOLO). The proposed framework incorporates three innovative components: (1) a lightweight Cross Stage Partial Multi-Scale Edge Enhancement (CSP-MSEE) module that combines multi-scale feature extraction with edge enhancement to strengthen feature representation; (2) a Focal Modulation attention mechanism that improves the network’s responsiveness to target regions and contextual information; and (3) a self-developed Dynamic Interaction Head (DIH) that enhances detection accuracy and spatial adaptability for small objects. In addition, a lightweight unsupervised image enhancement algorithm, Zero-DCE (Zero-Reference Deep Curve Estimation), is introduced prior to training to improve image contrast and detail, and Generalized Intersection over Union (GIoU) is employed as the bounding box regression loss. To evaluate the effectiveness of DRF-YOLO, experiments are conducted on two representative low-light datasets: ExDark and the nighttime subset of BDD100K, which include images of vehicles, pedestrians, and other road objects. Results show that DRF-YOLO achieves improvements of 3.4% and 2.3% in mAP@0.5 compared with the original YOLOv11, demonstrating enhanced robustness and accuracy in degraded environments while maintaining lightweight efficiency.
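Unlike the model internals, GIoU is fully specified in the literature (Rezatofighi et al.): it adds to 1 - IoU a penalty for the empty area of the smallest enclosing box. A generic implementation, not DRF-YOLO's exact code:

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """GIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU + |C \ (A U B)| / |C|,
    where C is the smallest axis-aligned box enclosing both."""
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)

    # Smallest enclosing box C.
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    enclose = ((ex2 - ex1) * (ey2 - ey1)).clamp(min=1e-7)

    giou = iou - (enclose - union) / enclose
    return (1 - giou).mean()

pred = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
target = torch.tensor([[20.0, 20.0, 60.0, 60.0]])
print(giou_loss(pred, target))  # ~0.69 for these partially overlapping boxes
```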
