Search Results (1,273)

Search Parameters:
Keywords = integration of multiscale information

28 pages, 3604 KB  
Article
Intelligent Early Warning and Sustainable Engineering Prevention for Coal Mine Shaft Rupture
by Qiukai Gai, Gang Yang, Qingli Liu, Qiang Fu, Shiqi Liu, Qing Ma and Chao Lian
Processes 2025, 13(12), 4016; https://doi.org/10.3390/pr13124016 - 12 Dec 2025
Abstract
Shaft hoisting is an essential process in coal mining, and shaft integrity is a prerequisite for efficient extraction. The non-mining-induced rupture of vertical shafts in coal mines, primarily caused by the consolidation settlement of overlying unconsolidated strata due to aquifer dewatering, poses a significant threat to mining safety. Accurately predicting such ruptures remains challenging due to the multicollinearity and complex interactions among multiple influencing factors. This study proposes a novel multiscale discriminant analysis model, termed the SDA-PCA-FDA model, which integrates Stepwise Discriminant Analysis (SDA), Principal Component Analysis (PCA), and Fisher’s Discriminant Analysis (FDA). Initially, SDA screened five principal controlling factors from nine original variables. Subsequently, PCA was applied to reorganize these factors into three principal components, effectively eliminating information redundancy. Finally, the FDA model was established based on these components. Validation results demonstrated that the SDA-PCA-FDA model achieved high correct classification rates of 96.43% and 91.67% on the training and testing sets, respectively, significantly outperforming the traditional FDA, PCA-FDA, and SDA-FDA models. Applied in engineering practice in the Yanzhou Mining Area, the model successfully predicted the rupture risk of the main shaft, consistent with field observations. Furthermore, to achieve sustainable governance, the “Friction Pile Method” was proposed as a preventive measure. Numerical simulations using NM2dc software determined the optimal governance parameters: a pile height of 112.86 m, a stiffness coefficient of 0.9, and a pile–shaft spacing of 10 m. A comparative analysis incorporating techno-economic sustainability indicators confirmed the superior effectiveness and economic viability of the friction pile method over traditional approaches. This research provides a reliable, multiscale methodology for both the prediction and sustainable governance of non-mining-induced shaft rupture. Full article
(This article belongs to the Special Issue Safety Monitoring and Intelligent Diagnosis of Mining Processes)
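The PCA-then-FDA chain the abstract describes can be sketched in a few lines of NumPy. The data below are a synthetic stand-in for the five screened factors (two well-separated classes standing in for intact vs. ruptured shafts), not the authors' mine data, and the two-class Fisher discriminant is the textbook form, not their exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the five screened factors: intact (0) vs. ruptured (1)
X0 = rng.normal(0.0, 1.0, size=(60, 5))
X1 = rng.normal(3.0, 1.0, size=(60, 5))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 60)

# PCA: project the five factors onto the top three principal components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T

# Fisher's discriminant: w maximizes between-class over within-class scatter
m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
Sw = np.cov(Z[y == 0], rowvar=False) + np.cov(Z[y == 1], rowvar=False)
w = np.linalg.solve(Sw, m1 - m0)
threshold = w @ (m0 + m1) / 2
pred = (Z @ w > threshold).astype(int)
accuracy = (pred == y).mean()
```

On data this well separated the discriminant classifies essentially perfectly; the paper's 96.43%/91.67% rates come from real, far noisier shaft data.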

22 pages, 1479 KB  
Article
VMPANet: Vision Mamba Skin Lesion Image Segmentation Model Based on Prompt and Attention Mechanism Fusion
by Zinuo Peng, Shuxian Liu and Chenhao Li
J. Imaging 2025, 11(12), 443; https://doi.org/10.3390/jimaging11120443 - 11 Dec 2025
Abstract
In the realm of medical image processing, the segmentation of dermatological lesions is a pivotal technique for the early detection of skin cancer. However, existing methods for segmenting images of skin lesions often encounter limitations when dealing with intricate boundaries and diverse lesion shapes. To address these challenges, we propose VMPANet, designed to accurately localize critical targets and capture edge structures. VMPANet employs an inverted pyramid convolution to extract multi-scale features while utilizing the visual Mamba module to capture long-range dependencies among image features. Additionally, we leverage previously extracted masks as cues to facilitate efficient feature propagation. Furthermore, VMPANet integrates parallel depthwise separable convolutions to enhance feature extraction and introduces innovative mechanisms for edge enhancement, spatial attention, and channel attention to adaptively extract edge information and complex spatial relationships. Notably, VMPANet incorporates a novel cross-attention mechanism that effectively facilitates the interaction between deep semantic cues and shallow texture details, thereby generating comprehensive feature representations while reducing computational load and redundancy. We conducted comparative and ablation experiments on two public skin lesion datasets (ISIC2017 and ISIC2018). The results demonstrate that VMPANet outperforms existing mainstream methods. On the ISIC2017 dataset, its mIoU and DSC metrics are 1.38% and 0.83% higher than those of VM-Unet, respectively; on the ISIC2018 dataset, these metrics are 1.10% and 0.67% higher than those of EMCAD, respectively. Moreover, VMPANet requires only 0.383 M parameters and 1.159 GFLOPs of computation. Full article
(This article belongs to the Section Medical Imaging)
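The channel-attention idea this abstract relies on (adaptively reweighting feature channels) can be sketched as a generic squeeze-and-excitation-style gate. This illustrates the mechanism class only, not VMPANet's exact module; the shapes, reduction ratio, and random weights are arbitrary assumptions:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Squeeze spatial dims, excite through a two-layer gate,
    # then rescale each channel by its learned importance in (0, 1).
    s = feat.mean(axis=(1, 2))                # global average pooling -> (C,)
    h = np.maximum(w1 @ s, 0.0)               # channel reduction + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid per-channel weights
    return feat * gate[:, None, None]

rng = np.random.default_rng(1)
C, r = 8, 2                                   # channels and reduction ratio (arbitrary)
feat = rng.normal(size=(C, 16, 16))
w1 = 0.1 * rng.normal(size=(C // r, C))
w2 = 0.1 * rng.normal(size=(C, C // r))
out = channel_attention(feat, w1, w2)
```

Because the gate lies in (0, 1), the module can only attenuate channels, never amplify them; trained weights learn which channels to keep.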

23 pages, 7617 KB  
Article
A Dual-Modal Adaptive Pyramid Transformer Algorithm for UAV Cross-Modal Object Detection
by Qiqin Li, Ming Yang, Xiaoqiang Zhang, Nannan Wang, Xiaoguang Tu, Xijun Liu and Xinyu Zhu
Sensors 2025, 25(24), 7541; https://doi.org/10.3390/s25247541 - 11 Dec 2025
Abstract
Unmanned Aerial Vehicles (UAVs) play vital roles in traffic surveillance, disaster management, and border security, highlighting the importance of reliable infrared–visible image detection under complex illumination conditions. However, UAV-based infrared–visible detection still faces challenges in multi-scale target recognition, robustness to lighting variations, and efficient cross-modal information utilization. To address these issues, this study proposes a lightweight Dual-modality Adaptive Pyramid Transformer (DAP) module integrated into the YOLOv8 framework. The DAP module employs a hierarchical self-attention mechanism and a residual fusion structure to achieve adaptive multi-scale representation and cross-modal semantic alignment while preserving modality-specific features. This design enables effective feature fusion with reduced computational cost, enhancing detection accuracy in complex environments. Experiments on the DroneVehicle and LLVIP datasets demonstrate that the proposed DAP-based YOLOv8 achieves mAP50:95 scores of 61.2% and 62.1%, respectively, outperforming conventional methods. The results validate the capability of the DAP module to optimize cross-modal feature interaction and improve UAV real-time infrared–visible target detection, offering a practical and efficient solution for UAV applications such as traffic monitoring and disaster response. Full article
(This article belongs to the Section Remote Sensors)

29 pages, 6470 KB  
Article
Lightweight YOLO-SR: A Method for Small Object Detection in UAV Aerial Images
by Sirong Liang, Xubin Feng, Meilin Xie, Qiang Tang, Haoran Zhu and Guoliang Li
Appl. Sci. 2025, 15(24), 13063; https://doi.org/10.3390/app152413063 - 11 Dec 2025
Abstract
To address challenges in small object detection within drone aerial imagery—such as sparse feature information, intense background interference, and drastic scale variations—this paper proposes YOLO-SR, a lightweight detection algorithm based on attention enhancement and feature reuse mechanisms. First, we designed the lightweight feature extraction module C2f-SA, which incorporates Shuffle Attention. By integrating channel shuffling and grouped spatial attention mechanisms, this module dynamically enhances edge and texture feature responses for small objects, effectively improving the discriminative power of shallow-level features. Second, the Spatial Pyramid Pooling Attention (SPPC) module captures multi-scale contextual information through spatial pyramid pooling. Combined with dual-path (channel and spatial) attention mechanisms, it optimizes feature representation while significantly suppressing complex background interference. Finally, the detection head employs a decoupled architecture separating classification and regression tasks, supplemented by a dynamic loss weighting strategy to mitigate small object localization inaccuracies. Experimental results on the RGBT-Tiny dataset demonstrate that compared to the baseline model YOLOv5s, our algorithm achieves a 5.3% improvement in precision, a 13.1% increase in recall, and respective gains of 11.5% and 22.3% in mAP0.5 and mAP0.75, simultaneously reducing the number of parameters by 42.9% (from 7.0 × 106 to 4.0 × 106) and computational cost by 37.2% (from 60.0 GFLOPs to 37.7 GFLOPs). The comprehensive improvement across multiple metrics validates the superiority of the proposed algorithm in both accuracy and efficiency. Full article
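The mAP0.5 and mAP0.75 gains reported above both rest on intersection-over-union between predicted and ground-truth boxes; a minimal reference implementation for corner-format boxes:

```python
def iou(box_a, box_b):
    # Boxes in corner format (x1, y1, x2, y2); IoU is the overlap-over-union
    # score that mAP0.5 and mAP0.75 threshold at 0.5 and 0.75.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For example, two unit-overlap 2x2 boxes give IoU 1/7, below both thresholds, which is why small localization errors hit mAP0.75 much harder than mAP0.5 for tiny objects.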

31 pages, 4757 KB  
Article
MFEF-YOLO: A Multi-Scale Feature Extraction and Fusion Network for Small Object Detection in Aerial Imagery over Open Water
by Qi Liu, Haiyang Yu, Ping Zhang, Tingting Geng, Xinru Yuan, Bingqian Ji, Shengmin Zhu and Ruopu Ma
Remote Sens. 2025, 17(24), 3996; https://doi.org/10.3390/rs17243996 - 11 Dec 2025
Abstract
Current object detection using UAV platforms in open water faces challenges such as low detection accuracy, limited storage, and constrained computational capabilities. To address these issues, we propose MFEF-YOLO, a small object detection network based on multi-scale feature extraction and fusion. First, we introduce a Dual-Branch Spatial Pyramid Pooling Fast (DBSPPF) module in the backbone network to replace the original SPPF module, while integrating ODConv and C3k2 modules to collectively enhance feature extraction capabilities. Second, we improve small object detection by adding a P2 detection head and reduce model parameters by removing the P5 detection head. Finally, we design an Island-based Multi-scale Feature Fusion Network (IMFFNet) and employ a Coordinate-guided Multi-scale Feature Fusion Module (CMFFM) to strengthen contextual information and boost detection accuracy. We validate the effectiveness of MFEF-YOLO using the public dataset SeaDronesSee and our custom dataset TPDNV. Experimental results show that compared to the baseline model, mAP50 improves by 0.11 and 0.03 using the two datasets, respectively, while model parameters are reduced by 11.54%. Furthermore, DBSPPF and IMFFNet demonstrate superior performance in comparative studies with other methods, confirming their effectiveness. These improvements and outstanding performance make MFEF-YOLO particularly suitable for UAV-based object detection in open waters. Full article
(This article belongs to the Special Issue Deep Learning-Based Small-Target Detection in Remote Sensing)

18 pages, 1070 KB  
Article
Advancing Real-Time Polyp Detection in Colonoscopy Imaging: An Anchor-Free Deep Learning Framework with Adaptive Multi-Scale Perception
by Wanyu Qiu, Xiao Yang, Zirui Liu and Chen Qiu
Sensors 2025, 25(24), 7524; https://doi.org/10.3390/s25247524 - 11 Dec 2025
Abstract
Accurate and real-time detection of polyps in colonoscopy is a critical task for the early prevention of colorectal cancer. The primary difficulties include insufficient extraction of multi-scale contextual cues for polyps of different sizes, inefficient fusion of multi-level features, and a reliance on hand-crafted anchor priors that require extensive tuning and compromise generalization performance. Therefore, we introduce a one-stage anchor-free detector that achieves state-of-the-art accuracy whilst running in real-time on a GTX 1080-Ti GPU workstation. Specifically, to enrich contextual information across a wide spectrum, our Cross-Stage Pyramid Pooling module efficiently aggregates multi-scale contexts through cascaded pooling and cross-stage partial connections. Subsequently, to achieve a robust equilibrium between low-level spatial details and high-level semantics, our Weighted Bidirectional Feature Pyramid Network adaptively integrates features across all scales using learnable channel-wise weights. Furthermore, by reconceptualizing detection as a direct point-to-boundary regression task, our anchor-free head obviates the dependency on hand-tuned priors. This regression is supervised by a Scale-invariant Distance with Aspect-ratio IoU loss, substantially improving localization accuracy for polyps of diverse morphologies. Comprehensive experiments on a large dataset comprising 103,469 colonoscopy frames substantiate the superiority of our method, achieving 98.8% mAP@0.5 and 82.5% mAP@0.5:0.95 at 35.8 FPS. Our method outperforms widely used CNN-based models (e.g., EfficientDet, YOLO series) and recent Transformer-based competitors (e.g., Adamixer, HDETR), demonstrating its potential for clinical application. Full article
(This article belongs to the Special Issue Advanced Biomedical Imaging and Signal Processing)

26 pages, 16103 KB  
Article
Integrating Phenological Features with Time Series Transformer for Accurate Rice Field Mapping in Fragmented and Cloud-Prone Areas
by Tiantian Xu, Peng Cai, Hangan Wei, Huili He and Hao Wang
Sensors 2025, 25(24), 7488; https://doi.org/10.3390/s25247488 - 9 Dec 2025
Abstract
Accurate identification and monitoring of rice cultivation areas are essential for food security and sustainable agricultural development. However, regions with frequent cloud cover, high rainfall, and fragmented fields often face challenges due to the absence of temporal features caused by cloud and rain interference, as well as spectral confusion from scattered plots, which hampers rice recognition accuracy. To address these issues, this study employs a Satellite Image Time Series Transformer (SITS-Former) model, enhanced with the integration of diverse phenological features to improve rice phenology representation and enable precise rice identification. The methodology constructs a rice phenological feature set that combines temporal, spatial, and spectral information. Through its self-attention mechanism, the model effectively captures growth dynamics, while multi-scale convolutional modules help suppress interference from non-rice land covers. The study utilized Sentinel-2 satellite data to analyze rice distribution in Wuxi City. The results demonstrated an overall classification accuracy of 0.967, with the estimated planting area matching 91.74% of official statistics. Compared to traditional rice distribution analysis methods, such as Random Forest, this approach outperforms in both accuracy and detailed presentation. It effectively addresses the challenge of identifying fragmented rice fields in regions with persistent cloud cover and heavy rainfall, providing accurate mapping of cultivated areas in difficult climatic conditions while offering valuable baseline data for yield assessments. Full article
(This article belongs to the Section Smart Agriculture)
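The self-attention step the abstract credits with capturing growth dynamics reduces, at its core, to scaled dot-product attention over the time dimension. A single-head sketch without learned projections follows; it is an illustration of the mechanism, not the SITS-Former code, and the sequence length and feature width are arbitrary:

```python
import numpy as np

def self_attention(X):
    # X: (T, d) sequence of per-date feature vectors; single head, no projections.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # date-to-date affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # softmax over the time axis
    return w @ X, w

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))         # e.g. 10 acquisition dates, 4 spectral features
out, weights = self_attention(X)
```

Each output date is a weighted mixture of all dates, which is what lets the model bridge cloud-contaminated gaps in the time series.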

33 pages, 3256 KB  
Article
DMF-Net: A Dynamic Fusion Attention Mechanism-Based Model for Coronary Artery Segmentation
by GuangKun Ma, Linghui Kong, Mo Guan, Yanhong Meng and Deyan Chen
Symmetry 2025, 17(12), 2111; https://doi.org/10.3390/sym17122111 - 8 Dec 2025
Abstract
Coronary artery segmentation in CTA images remains challenging due to blurred vessel boundaries, unclear structural details, and sparse vascular distributions. To address these limitations, we propose DMF-Net (Dual-path Multi-scale Fusion Network), a novel multi-scale feature fusion architecture based on UNet++. The network incorporates three key innovations: First, a Dynamic Buffer–Bottleneck–Buffer Layer (DBBLayer) in shallow encoding stages enhances the extraction and preservation of fine vascular structures. Second, an Axial Local–global Hybrid Attention Module (ALHA) in deep encoding stages employs a dual-path mechanism to simultaneously capture vessel trajectories and small branches through integrated global and local pathways. Third, a 2.5D slice strategy improves trajectory capture by leveraging contextual information from adjacent slices. Additionally, a composite loss function combining Dice loss and binary cross-entropy jointly optimizes vascular connectivity and boundary precision. Validated on the ImageCAS dataset, DMF-Net achieves superior performance compared to state-of-the-art methods: 89.45% Dice Similarity Coefficient (DSC) (+3.67% vs. UNet++), 3.85 mm Hausdorff Distance (HD, 49.1% reduction), and 0.95 mm Average Surface Distance (ASD, 42.4% improvement). Subgroup analysis reveals particularly strong performance in clinically challenging scenarios. For small vessels (<2 mm diameter), DMF-Net achieves 85.23 ± 1.34% DSC versus 78.67 ± 1.89% for UNet++ (+6.56%, p < 0.001). At complex bifurcations, HD improves from 9.34 ± 2.15 mm to 4.67 ± 1.28 mm (50.0% reduction, p < 0.001). In low-contrast regions (HU difference < 100), boundary precision (ASD) improves from 2.15 ± 0.54 mm to 1.08 ± 0.32 mm (49.8% improvement, p < 0.001). All improvements are statistically significant (p < 0.001). Full article
(This article belongs to the Section Computer)
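The composite loss mentioned above (Dice loss plus binary cross-entropy) is easy to state concretely; the equal weights and the epsilon below are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def dice_bce_loss(pred, target, w_dice=0.5, w_bce=0.5, eps=1e-7):
    # pred: predicted foreground probabilities; target: binary ground-truth mask.
    # Dice rewards overlap (connectivity); BCE penalizes per-pixel errors (boundaries).
    pred = np.clip(pred, eps, 1.0 - eps)
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
    return w_dice * dice + w_bce * bce

mask = np.array([1.0, 0.0, 1.0, 0.0])
low = dice_bce_loss(mask, mask)          # perfect prediction -> near-zero loss
high = dice_bce_loss(1.0 - mask, mask)   # inverted prediction -> large loss
```

Combining the two terms is a common remedy for thin, sparse structures like vessels, where BCE alone is dominated by the background pixels.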

18 pages, 4553 KB  
Article
Changes of Terrace Distribution in the Qinba Mountain Based on Deep Learning
by Xiaohua Meng, Zhihua Song, Xiaoyun Cui and Peng Shi
Sustainability 2025, 17(24), 10971; https://doi.org/10.3390/su172410971 - 8 Dec 2025
Abstract
The Qinba Mountains in China span six provinces and are characterized by a large population, rugged terrain, steep peaks, deep valleys, and scarce flat land, making large-scale agricultural development challenging. Terraced fields serve as the core cropland type in this region, playing a vital role in preventing soil erosion on sloping farmland and expanding agricultural production space. They also function as a crucial medium for sustaining the ecosystem services of mountainous areas. The Qinba Mountains form a transitional zone between China’s northern and southern climates and a vital ecological barrier, and their terraced ecosystems have undergone significant spatial changes over the past two decades due to compound factors including the Grain-for-Green Program, urban expansion, and population outflow. However, current large-scale, long-term, high-resolution monitoring studies of terraced fields in this region still face technical bottlenecks. On one hand, traditional remote sensing interpretation methods rely on manually designed features, making them ill-suited for the complex scenarios of fragmented, multi-scale distribution and terrain-shadow interference in Qinba terraced fields. On the other hand, the lack of high-resolution historical imagery means that low-resolution data suffer from insufficient accuracy and spatial detail for capturing dynamic changes in terraced fields. This study aims to fill the technical gap in detailed dynamic monitoring of terraced fields in the Qinba Mountains. Using image tiles created from Landsat-8 satellite imagery collected between 2017 and 2020, it trains three deep learning semantic segmentation models: DeepLabV3 with a ResNet-34 backbone, U-Net, and PSPNet. Through optimization strategies such as data augmentation and transfer learning, the study achieves 15-m-resolution remote sensing interpretation of terraced field information in the Qinba Mountains from 2000 to 2020. Comparative results revealed that DeepLabV3 demonstrated significant advantages in identifying terraced field types: Mean Pixel Accuracy (MPA) reached 79.42%, Intersection over Union (IoU) was 77.26%, the F1 score reached 80.98, and the Kappa coefficient reached 0.7148, all outperforming the U-Net and PSPNet models. The model’s accuracy is not uniform but is highly contingent on topographic context; it excels in archetypal mid-altitude environments with moderately steep slopes. Building on this, we created a set of tiles integrating multi-source RGB and DEM data. The fusion model, which incorporates DEM-derived topographic data, demonstrates improvements across these aspects. Dynamic monitoring based on the optimal model indicates that terraced fields in the Qinba Mountains expanded between 2000 and 2020: the total area was 57,834 km2 in 2000 and had increased to 63,742 km2 by 2020, representing an approximate growth rate of 8.36%. Sichuan, Gansu, and Shaanxi provinces contributed the majority of this expansion, accounting for 71% of the newly added terraced fields. Over the 20-year period, the center of gravity of terraced fields shifted upward: the area of terraced fields above 500 m in elevation increased, while that below 500 m decreased. Terraced fields surrounding urban areas declined, and mountainous slopes at higher elevations became the primary source of newly constructed terraces. This study not only establishes a technical paradigm for the refined monitoring of terraced field resources in mountainous regions but also provides critical data support and a theoretical foundation for implementing sustainable land development in the Qinba Mountains. It holds significant practical value for advancing regional sustainable development. Full article
(This article belongs to the Section Sustainable Agriculture)

17 pages, 2930 KB  
Article
Beyond VI-RADS Uncertainty: Leveraging Spatiotemporal DCE-MRI to Predict Bladder Cancer Muscle Invasion
by Minghui Song, Haonan Ren, Lijuan Wang, Yihang Zhou, Xing Tang, Huanjun Wang, Yan Guo, Yang Liu, Hongbing Lu and Xiaopan Xu
Bioengineering 2025, 12(12), 1338; https://doi.org/10.3390/bioengineering12121338 - 8 Dec 2025
Abstract
Background: The Vesical Imaging-Reporting and Data System (VI-RADS) has limited diagnostic accuracy in distinguishing non-muscle-invasive bladder cancer (NMIBC) within VI-RADS categories 2 and 3, despite its value for overall NMIBC assessment. Dynamic contrast-enhanced MRI (DCE-MRI), which reflects tumor vascularity, holds promise for improving these challenging cases but remains underutilized due to unexploited spatiotemporal information. Methods: We developed a deep learning model to comprehensively quantify spatiotemporal features from multiphase DCE-MRI in 184 patients with VI-RADS 2 or 3 (training: n = 115, validation: n = 20, testing: n = 49). The model integrated multiscale feature extraction and contextual attention mechanisms to enhance diagnostic performance. Results: The model outperformed established benchmarks (e.g., VGG, ResNet) and the conventional VI-RADS ≤ 2 threshold (sensitivity: 0.67 for NMIBC), achieving a sensitivity of 0.90 (95% CI: 0.81–0.96) for NMIBC and an area under the curve (AUC) of 0.82 (95% CI: 0.75–0.89) for overall classification. Visualizations confirmed its ability to identify key spatiotemporal patterns linked to muscle invasion. Conclusions: By leveraging comprehensive spatiotemporal information from DCE-MRI, our deep learning model significantly improves NMIBC diagnosis in VI-RADS 2/3 cases, offering a clinically valuable tool to address the limitations of current VI-RADS assessment. Full article

26 pages, 12819 KB  
Article
Multiscale Attention-Enhanced Complex-Valued Graph U-Net for PolSAR Image Classification
by Wanying Song, Qian Liu, Kuncheng Pu, Yinyin Jiang and Yan Wu
Remote Sens. 2025, 17(24), 3943; https://doi.org/10.3390/rs17243943 - 5 Dec 2025
Abstract
The powerful graph convolutional network (GCN) for polarimetric synthetic aperture radar (PolSAR) image classification generally relies on real-valued features, ignoring phase information and thus limiting the modeling of complex-valued (CV) polarization characteristics. To address this issue, this paper proposes a novel multiscale attention-enhanced CV graph U-Net model, abbreviated as MAE-CV-GUNet, which embeds CV-GCN into a graph U-Net framework augmented with multiscale attention mechanisms. First, a CV-GCN is constructed by extending the real-valued GCN to effectively capture the intrinsic amplitude and phase information of PolSAR data, along with the underlying correlations between them, yielding an improved feature representation for PolSAR images. Based on CV-GCN, a CV graph U-Net (CV-GUNet) architecture is constructed by integrating multiple CV-GCN components, aiming to extract multi-scale features and further enhance the ability to extract discriminative features in the complex domain. Then, a multiscale attention (MSA) mechanism is designed, enabling the proposed MAE-CV-GUNet to adaptively learn the importance of features at various scales and dynamically fuse the multiscale information among them. Comparison and ablation experiments on three PolSAR datasets show that MAE-CV-GUNet delivers excellent performance in PolSAR image classification. Full article
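One complex-valued graph-convolution step of the kind this abstract builds on can be sketched with NumPy's native complex arithmetic. The toy graph, feature widths, and propagation rule (symmetric-normalized adjacency with self-loops, as in standard GCNs) are illustrative assumptions, not the paper's exact CV-GCN:

```python
import numpy as np

rng = np.random.default_rng(2)
N, F_in, F_out = 5, 4, 3                 # nodes (e.g. superpixels) and feature widths

# Complex node features retain PolSAR amplitude AND phase in a single array
H = rng.normal(size=(N, F_in)) + 1j * rng.normal(size=(N, F_in))
W = rng.normal(size=(F_in, F_out)) + 1j * rng.normal(size=(F_in, F_out))

# Symmetric-normalized adjacency (toy fully connected graph, self-loops included)
A = np.ones((N, N))
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = d_inv_sqrt @ A @ d_inv_sqrt

H_next = A_hat @ H @ W                   # one CV graph-convolution layer, pre-activation
```

The point is that the propagation rule is unchanged; moving features and weights to the complex domain is what lets phase flow through the network instead of being discarded.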

24 pages, 15414 KB  
Article
TAF-YOLO: A Small-Object Detection Network for UAV Aerial Imagery via Visible and Infrared Adaptive Fusion
by Zhanhong Zhuo, Ruitao Lu, Yongxiang Yao, Siyu Wang, Zhi Zheng, Jing Zhang and Xiaogang Yang
Remote Sens. 2025, 17(24), 3936; https://doi.org/10.3390/rs17243936 - 5 Dec 2025
Abstract
Detecting small objects from UAV-captured aerial imagery is a critical yet challenging task, hindered by factors such as small object size, complex backgrounds, and subtle inter-class differences. Single-modal methods lack the robustness for all-weather operation, while existing multimodal solutions are often too computationally expensive for deployment on resource-constrained UAVs. To this end, we propose TAF-YOLO, a lightweight and efficient multimodal detection framework designed to balance accuracy and efficiency. First, we propose an early fusion module, the Two-branch Adaptive Fusion Network (TAFNet), which adaptively integrates visible and infrared information at both pixel and channel levels before the feature extractor, maximizing complementary data while minimizing redundancy. Second, we propose a Large Adaptive Selective Kernel (LASK) module that dynamically expands the receptive field using multi-scale convolutions and spatial attention, preserving crucial details of small objects during downsampling. Finally, we present an optimized feature neck architecture that replaces PANet’s bidirectional path with a more efficient top-down pathway. This is enhanced by a Dual-Stream Attention Bridge (DSAB) that injects high-level semantics into low-level features, improving localization without significant computational overhead. On the VEDAI benchmark, TAF-YOLO achieves 67.2% mAP50, outperforming the CFT model by 2.7% and demonstrating superior performance against seven other YOLO variants. Our work presents a practical and powerful solution that enables real-time, all-weather object detection on resource-constrained UAVs. Full article

16 pages, 5826 KB  
Article
Multi-Scale Feature Fusion Convolutional Neural Network Fault Diagnosis Method for Rolling Bearings
by Wen Yang, Meijuan Hu, Xionglu Peng and Jianghong Yu
Processes 2025, 13(12), 3929; https://doi.org/10.3390/pr13123929 - 4 Dec 2025
Abstract
Fault diagnosis methods for rolling bearings are frequently constrained to the automatic extraction of single-scale features from raw vibration signals, overlooking crucial information embedded in data of other scales, which often results in unsatisfactory diagnostic outcomes. To address this, a lightweight neural network model is proposed, which incorporates an improved Inception module for multi-scale convolutional feature fusion. Initially, this model generates time–frequency maps via continuous wavelet transform. Subsequently, it integrates the Fused-conv and Mbconv modules from the EfficientNet V2 architecture with the Inception module to conduct multi-scale convolution on input features, thereby comprehensively capturing fault information of the bearing. Additionally, it substitutes traditional convolution with depthwise separable convolution to minimize training parameters and introduces an attention mechanism to emphasize significant features while diminishing less relevant ones, thereby enhancing the accuracy of bearing fault diagnosis. Experimental findings indicate that the proposed fault diagnosis model achieves an accuracy of 100% under single-load conditions and 96.2% under variable-load conditions, demonstrating its applicability across diverse data sets and robust generalization capabilities. Full article
(This article belongs to the Section Process Control and Monitoring)
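The parameter saving from replacing standard convolution with depthwise separable convolution, as the abstract describes, follows from simple counting. The sketch below ignores biases and is illustrative, not the paper's exact layer configuration:

```python
# Standard k x k convolution: every output channel filters every input channel.
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

# Depthwise separable: one k x k filter per input channel (depthwise),
# followed by a 1 x 1 pointwise convolution that mixes channels.
def depthwise_separable_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out
```

For a 3 × 3 layer mapping 64 to 128 channels, this gives 73,728 versus 8,768 parameters, roughly an 8× reduction, which is where the model's "lightweight" character comes from.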

30 pages, 4862 KB  
Article
A Multi-Channel Δ-BiLSTM Framework for Short-Term Bus Load Forecasting Based on VMD and LOWESS
by Yeran Guo, Li Wang and Jie Zhao
Electronics 2025, 14(23), 4772; https://doi.org/10.3390/electronics14234772 - 4 Dec 2025
Abstract
Short-term bus load forecasting in distribution networks faces severe challenges of non-stationarity, high-frequency disturbances, and multi-scale coupling arising from renewable integration and emerging loads such as centralized EV charging. Conventional statistical and deep learning approaches often exhibit instability under abrupt fluctuations, whereas decomposition-based frameworks risk redundancy and information leakage. This study develops a hybrid forecasting framework that integrates variational mode decomposition (VMD), locally weighted scatterplot smoothing (LOWESS), and a multi-channel differential bidirectional long short-term memory network (Δ-BiLSTM). VMD decomposes the bus load sequence into intrinsic mode functions (IMFs), residuals are adaptively smoothed using LOWESS, and effective channels are selected through correlation-based redundancy control. The Δ-target learning strategy enhances the modeling of ramping dynamics and abrupt transitions, while Bayesian optimization and time-sequenced validation ensure reproducibility and stable training. Case studies on coastal-grid bus load data demonstrate substantial improvements in accuracy. In single-step forecasting, RMSE is reduced by 65.5% relative to ARIMA, and R² remains above 0.98 for horizons h = 1–3, with slower error growth than LSTM, RNN, and SVM. Segment-wise analysis further shows that, for h = 1, the RMSE on the fluctuation, stable, and peak segments is reduced by 69.4%, 62.5%, and 62.4%, respectively, compared with ARIMA. The proposed Δ-BiLSTM exhibits compact error distributions and narrow interquartile ranges, confirming its robustness under peak-load and highly volatile conditions. Overall, the VMD–LOWESS–Δ-BiLSTM framework achieves superior accuracy, calibration, and robustness in complex, noisy, and non-stationary environments, and its interpretable structure and reproducible training protocol make it a reliable, practical solution for short-term bus load forecasting in modern distribution networks. Full article
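The Δ-target strategy can be illustrated with a toy stand-in: the model learns the increment Δy rather than the level, and the forecast is reconstructed by adding the predicted increment to the last observation. The predictor below is a naive recent-increment average, purely illustrative, not the paper's Δ-BiLSTM:

```python
def delta_forecast(history, window=3):
    """Predict the next value of a series by Δ-target reconstruction:
    estimate the next increment, then add it to the last observation."""
    # First differences of the series (the Δ targets a model would learn).
    deltas = [b - a for a, b in zip(history, history[1:])]
    # Naive stand-in predictor: average of the most recent increments.
    recent = deltas[-window:]
    delta_hat = sum(recent) / len(recent)
    # Reconstruct the level forecast from the predicted increment.
    return history[-1] + delta_hat
```

Learning increments rather than levels keeps the target near-stationary during ramps, which is why this formulation helps with abrupt transitions.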

24 pages, 5327 KB  
Article
Pedestrian Pose Estimation Based on YOLO-SwinTransformer Hybrid Model
by Jie Wu and Ming Chen
World Electr. Veh. J. 2025, 16(12), 658; https://doi.org/10.3390/wevj16120658 - 4 Dec 2025
Abstract
In complex scenarios, estimating pedestrian posture is a critical technology for intelligent surveillance and autonomous driving. However, existing methods struggle to balance real-time performance, robustness to occlusion, and recognition accuracy. To address this, we propose a lightweight hybrid model, YOLO-SwinTransformer. The model uses YOLOv8’s CSPDarknet as the backbone for efficient multi-scale feature extraction and integrates the Path Aggregation Network (PANet) with HRNet’s high-resolution multi-scale feature extraction, enhancing cross-level semantic information interaction. The primary innovation is a modified Swin Transformer pose estimation module incorporating a Spatial Locality-Aware Module (SLAM) to strengthen local feature extraction, achieving joint modeling of spatial attention and temporal continuity. This effectively addresses the challenges posed by occlusion and video distortion in pose estimation. Additionally, we extend the CIoU loss and a weighted mean square error loss to refine the pose estimation strategy and improve keypoint precision. Extensive experiments on both the COCO dataset and a self-built real-world road dataset demonstrate that YOLO-SwinTransformer achieves a state-of-the-art Average Precision (AP) of 84.9% on COCO, a significant 12.8% improvement over the YOLOv8 baseline (72.1% AP). More importantly, on the challenging self-built real-world road dataset, the model achieves 82.3% AP (a 13.7% improvement over the baseline’s 68.6% AP), demonstrating superior robustness in complex occlusion and low-light scenarios. The model’s size is 27.3 M, and its lightweight design enables real-time processing at 39–41 FPS on edge devices, providing a feasible, high-precision, and efficient solution for intelligent monitoring and autonomous driving applications. Full article
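One plausible reading of the weighted mean square error term is a per-keypoint squared coordinate error scaled by an importance (or visibility) weight. The sketch below is a hypothetical, simplified version; the actual weighting scheme in the paper may differ:

```python
def weighted_mse(pred, target, weights):
    """pred, target: lists of (x, y) keypoints; weights: per-keypoint importance.

    Returns the weight-normalized sum of squared coordinate errors.
    """
    assert len(pred) == len(target) == len(weights)
    total = 0.0
    for (px, py), (tx, ty), w in zip(pred, target, weights):
        total += w * ((px - tx) ** 2 + (py - ty) ** 2)
    return total / sum(weights)
```

Up-weighting hard-to-localize keypoints (e.g. occluded joints) concentrates the gradient on exactly the cases the abstract identifies as challenging.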
