Search Results (180)

Search Parameters:
Keywords = large kernel adaptation

21 pages, 2271 KB  
Article
A Domain Adaptation-Based Ocean Mesoscale Eddy Detection Method Under Harsh Sea States
by Chen Zhang, Yujia Zhang, Shaotian Li, Xin Li and Shiqiu Peng
Remote Sens. 2025, 17(19), 3317; https://doi.org/10.3390/rs17193317 - 27 Sep 2025
Abstract
Under harsh sea states, the dynamic characteristics of ocean mesoscale eddies (OMEs) become significantly more complex, posing substantial challenges to their accurate detection and identification. In this study, we propose an artificial intelligence detection method for OMEs based on the domain adaptation technique to accurately perform pixel-level segmentation and ensure its effectiveness under harsh sea states. The proposed model (LCNN) utilizes large kernel convolution to increase the model’s receptive field and deeply extract eddy features. To deal with the pronounced cross-domain distribution shifts induced by harsh sea states, an adversarial learning framework (ADF) is introduced into LCNN to enforce feature alignment between the source (normal sea states) and target (harsh sea states) domains, which also significantly improves the segmentation performance on our constructed dataset. The proposed model outperforms existing state-of-the-art methods by 1.5%, 6.0%, and 7.2% in accuracy, precision, and Mean Intersection over Union, respectively. Full article
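The adversarial alignment summarized above is not spelled out in the listing. The following minimal PyTorch sketch shows the general pattern: a large-kernel encoder whose features are aligned across domains through a gradient reversal layer and a small domain classifier. All module names, kernel sizes, and the training snippet are illustrative assumptions, not the paper's LCNN/ADF code.

```python
# Minimal sketch of adversarial feature alignment between a source and a target
# domain via a gradient reversal layer (DANN-style). Names and sizes are
# illustrative, not the paper's LCNN/ADF implementation.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None          # reverse gradients toward the encoder

class LargeKernelEncoder(nn.Module):
    """Toy encoder using a large depthwise kernel to enlarge the receptive field."""
    def __init__(self, in_ch=1, ch=32, k=31):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.large = nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch)  # depthwise large kernel
        self.point = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return self.point(self.large(torch.relu(self.stem(x))))

class DomainClassifier(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(ch, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feat, lamb=1.0):
        return self.net(GradReverse.apply(feat, lamb))

# One training step (sketch): a supervised segmentation loss on labeled source
# data plus a domain loss that the encoder learns to fool via gradient reversal.
enc, dom = LargeKernelEncoder(), DomainClassifier()
seg_head = nn.Conv2d(32, 2, 1)                       # pixel-level eddy / background logits
src, tgt = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
f_src, f_tgt = enc(src), enc(tgt)
seg_logits = seg_head(f_src)                         # segmentation loss would use source labels
dom_logits = dom(torch.cat([f_src, f_tgt]), lamb=0.5)
dom_labels = torch.tensor([0, 0, 1, 1])              # 0 = source, 1 = target
loss_dom = nn.functional.cross_entropy(dom_logits, dom_labels)
```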

17 pages, 2172 KB  
Article
GLDS-YOLO: An Improved Lightweight Model for Small Object Detection in UAV Aerial Imagery
by Zhiyong Ju, Jiacheng Shui and Jiameng Huang
Electronics 2025, 14(19), 3831; https://doi.org/10.3390/electronics14193831 - 27 Sep 2025
Abstract
To enhance small object detection in UAV aerial imagery suffering from low resolution and complex backgrounds, this paper proposes GLDS-YOLO, an improved lightweight detection model. The model integrates four core modules: Group Shuffle Attention (GSA) to strengthen small-scale feature perception, Large Separable Kernel Attention (LSKA) to capture global semantic context, DCNv4 to enhance feature adaptability with reduced parameters, and a novel Small-object-enhanced Multi-scale and Structure Detail Enhancement (SMSDE) module, which enhances edge-detail representation of small objects while maintaining lightweight efficiency. Experiments on VisDrone2019 and DOTA1.0 demonstrate that GLDS-YOLO achieves superior detection performance. On VisDrone2019, it improves mAP@0.5 and mAP@0.5:0.95 by 12.1% and 7%, respectively, compared with YOLOv11n, while maintaining competitive results on DOTA. These results confirm the model’s effectiveness, robustness, and adaptability for complex small object detection tasks in UAV scenarios. Full article
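As a rough illustration of the large-separable-kernel attention idea mentioned above, the sketch below factorizes a large depthwise kernel into horizontal and vertical strips (plus dilated strips) and uses the result to gate the input features. Kernel sizes, dilation, and the class name are placeholders, not the GLDS-YOLO implementation.

```python
# Rough sketch of large-separable-kernel attention: a large depthwise kernel is
# factorized into 1xk / kx1 strips (plus dilated strips) and the result gates
# the input features. Sizes are illustrative.
import torch
import torch.nn as nn

class LSKAttention(nn.Module):
    def __init__(self, dim, k=7, dilated_k=9, dilation=3):
        super().__init__()
        self.h1 = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.v1 = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        pad = (dilated_k // 2) * dilation
        self.h2 = nn.Conv2d(dim, dim, (1, dilated_k), padding=(0, pad),
                            dilation=(1, dilation), groups=dim)
        self.v2 = nn.Conv2d(dim, dim, (dilated_k, 1), padding=(pad, 0),
                            dilation=(dilation, 1), groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.v2(self.h2(self.v1(self.h1(x))))   # large effective receptive field
        return x * self.proj(attn)                     # multiplicative gating

x = torch.randn(1, 64, 80, 80)
print(LSKAttention(64)(x).shape)   # torch.Size([1, 64, 80, 80])
```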

30 pages, 8673 KB  
Article
EDTST: Efficient Dynamic Token Selection Transformer for Hyperspectral Image Classification
by Xiang Hu, Zhiwen Zhang, Jianghe Zhai, Longlong Zhang, Yuxiang Tang, Yuanxi Peng and Tong Zhou
Remote Sens. 2025, 17(18), 3180; https://doi.org/10.3390/rs17183180 - 14 Sep 2025
Viewed by 343
Abstract
Hyperspectral images, characterized by rich spectral information, enable precise pixel-level classification and are thus widely employed in remote sensing applications. Although convolutional neural networks (CNNs) have demonstrated effectiveness in hyperspectral image processing, their limited receptive fields constrain their capacity to capture long-range dependencies. Transformers excel at modeling long-range features for hyperspectral image classification (HSIC). Yet, they often overlook effective representation of local spectral–spatial characteristics while incurring computational redundancy from numerous classification-irrelevant tokens. To address these challenges, we propose EDTST, a state-of-the-art Vision Transformer architecture specifically designed for efficient hyperspectral image classification. The model utilizes a large-kernel 3D convolution block to extract deep spectral–spatial features. A 2D convolution block further refines these features, followed by a novel attention mechanism with dynamic token pruning that substantially reduces the computational load by focusing on the most pertinent features. The process concludes with an adaptive average pooling layer and a fully connected layer for classification. Extensive experiments on four standard hyperspectral datasets demonstrate that EDTST achieves the highest classification accuracy, with a notable 3% improvement in overall accuracy on the WHU-Hi-HanChuan dataset, while requiring the shortest training and inference time among all compared state-of-the-art models from the past three years. These results validate the efficacy of our approach in achieving superior performance with markedly improved computational efficiency. Full article
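The dynamic token pruning mentioned above can be pictured with the following toy module: tokens are scored, only the top fraction is kept, and self-attention runs on the retained tokens. The scoring head, keep ratio, and shapes are assumptions for illustration, not the EDTST design.

```python
# Toy illustration of dynamic token pruning: score tokens, keep the top-k most
# informative ones, and run self-attention only on the kept tokens.
import torch
import torch.nn as nn

class PrunedSelfAttention(nn.Module):
    def __init__(self, dim, heads=4, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)                    # per-token importance score
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.keep_ratio = keep_ratio

    def forward(self, tokens):                            # tokens: (B, N, C)
        B, N, C = tokens.shape
        k = max(1, int(N * self.keep_ratio))
        scores = self.score(tokens).squeeze(-1)           # (B, N)
        idx = scores.topk(k, dim=1).indices               # indices of kept tokens
        kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, C))
        out, _ = self.attn(kept, kept, kept)              # attention over kept tokens only
        return out                                        # (B, k, C)

tokens = torch.randn(2, 196, 64)                          # e.g. flattened spectral-spatial patches
print(PrunedSelfAttention(64)(tokens).shape)              # torch.Size([2, 98, 64])
```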

25 pages, 7057 KB  
Article
CSTC: Visual Transformer Network with Multimodal Dual Fusion for Hyperspectral and LiDAR Image Classification
by Yong Mei, Jinlong Fan, Xiangsuo Fan and Qi Li
Remote Sens. 2025, 17(18), 3158; https://doi.org/10.3390/rs17183158 - 11 Sep 2025
Viewed by 335
Abstract
Convolutional neural networks have made significant progress in multimodal remote sensing image classification, but traditional convolutional neural networks are limited by fixed-size convolutional kernels, which cannot effectively model and adequately extract contextual information. In addition, hyperspectral imagery and LiDAR data differ considerably in the information they carry, which hinders effective cross-modal interaction and fusion. To address these issues, this paper proposes a multimodal dual fusion network (CSTC) based on the Vision Transformer for the collaborative classification of HSI and LiDAR data. The model adopts a two-branch architecture: the HSI branch reduces dimensionality with principal component analysis and extracts spectral–spatial features through a cross-connectivity feature fusion module, while the LiDAR branch mines spatial elevation features through stacked MobileNetV2 modules. The features of the two branches are encoded by a Transformer, and a first stage of modal interaction fusion is realized by a cross-attention module. The features are then concatenated and fed into a second Transformer for deep cross-modal fusion, and a multilayer perceptron completes the classification. Experiments show that the CSTC model achieves overall classification accuracies of 92.32%, 99.81%, 97.90%, and 99.37% on the publicly available MUUFL, Trento, Augsburg, and Houston2013 datasets, respectively, outperforming recent HSI–LiDAR classification algorithms. Ablation experiments and model performance evaluations further show that the proposed CSTC model achieves excellent results in terms of robustness, adaptability, and parameter scale. Full article
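A minimal sketch of the cross-attention fusion step follows, under the assumption of equal-length token sequences from the two branches; names and dimensions are illustrative, not the CSTC module itself.

```python
# Minimal cross-attention sketch for two-branch fusion: HSI tokens attend to
# LiDAR tokens and vice versa, then the two streams are concatenated for a
# second fusion stage.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.hsi_from_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_from_hsi = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hsi_tok, lidar_tok):        # (B, N, C) each
        h, _ = self.hsi_from_lidar(hsi_tok, lidar_tok, lidar_tok)   # HSI queries LiDAR
        l, _ = self.lidar_from_hsi(lidar_tok, hsi_tok, hsi_tok)     # LiDAR queries HSI
        return torch.cat([hsi_tok + h, lidar_tok + l], dim=1)       # fused token sequence

hsi, lidar = torch.randn(2, 81, 64), torch.randn(2, 81, 64)
print(CrossModalFusion(64)(hsi, lidar).shape)     # torch.Size([2, 162, 64])
```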

22 pages, 2230 KB  
Article
A Load Forecasting Model Based on Spatiotemporal Partitioning and Cross-Regional Attention Collaboration
by Xun Dou, Ruiang Yang, Zhenlan Dou, Chunyan Zhang, Chen Xu and Jiacheng Li
Sustainability 2025, 17(18), 8162; https://doi.org/10.3390/su17188162 - 10 Sep 2025
Viewed by 277
Abstract
With the advancement of new power system construction, thermostatically controlled loads represented by regional air conditioning systems are being extensively integrated into the grid, leading to a surge in the number of user nodes. This large-scale integration of new loads creates challenges for the grid, as the resulting load data exhibits strong periodicity and randomness over time. These characteristics are influenced by factors like temperature and user behavior. At the same time, spatially adjacent nodes show similarities and clustering in electricity usage. This creates complex spatiotemporal coupling features. These complex spatiotemporal characteristics challenge traditional forecasting methods. Their high model complexity and numerous parameters often lead to overfitting or the curse of dimensionality, which hinders both prediction accuracy and efficiency. To address this issue, this paper proposes a load forecasting method based on spatiotemporal partitioning and collaborative cross-regional attention. First, a spatiotemporal similarity matrix is constructed using the Shape Dynamic Time Warping (ShapeDTW) algorithm and an adaptive Gaussian kernel function based on the Haversine distance. Spectral clustering combined with the Gap Statistic criterion is then applied to adaptively determine the optimal number of partitions, dividing all load nodes in the power grid into several sub-regions with homogeneous spatiotemporal characteristics. Second, for each sub-region, a local Spatiotemporal Graph Convolutional Network (STGCN) model is built. By integrating gated temporal convolution with spatial feature extraction, the model accurately captures the spatiotemporal evolution patterns within each sub-region. On this basis, a cross-regional attention mechanism is designed to dynamically learn the correlation weights among sub-regions, enabling collaborative fusion of global features. Finally, the proposed method is evaluated on a multi-node load dataset. The effectiveness of the approach is validated through comparative experiments and ablation studies (that is, by removing key components of the model to evaluate their contribution to the overall performance). Experimental results demonstrate that the proposed method achieves excellent performance in short-term load forecasting tasks across multiple nodes. Full article
(This article belongs to the Special Issue Energy Conservation Towards a Low-Carbon and Sustainability Future)
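To make the partitioning step concrete, here is a small sketch that builds a spatial affinity matrix from a Gaussian kernel over Haversine distances and clusters load nodes spectrally. The ShapeDTW temporal similarity, the adaptive bandwidth, and the Gap Statistic selection of the cluster count are omitted; the median-distance bandwidth and the coordinates are assumptions.

```python
# Sketch: Gaussian-kernel affinity over Haversine distances + spectral clustering
# of load nodes into sub-regions. Bandwidth heuristic and data are placeholders.
import numpy as np
from sklearn.cluster import SpectralClustering

def haversine(lat1, lon1, lat2, lon2, r_km=6371.0):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * r_km * np.arcsin(np.sqrt(a))

coords = np.array([[31.2, 121.5], [31.3, 121.4], [30.9, 121.9], [32.1, 118.8], [32.0, 118.7]])
n = len(coords)
dist = np.array([[haversine(*coords[i], *coords[j]) for j in range(n)] for i in range(n)])

sigma = np.median(dist[dist > 0])                      # simple bandwidth heuristic (assumption)
affinity = np.exp(-dist ** 2 / (2 * sigma ** 2))       # Gaussian kernel on Haversine distance

labels = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0).fit_predict(affinity)
print(labels)                                          # sub-region index for each load node
```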

17 pages, 1180 KB  
Article
Optimized DSP Framework for 112 Gb/s PM-QPSK Systems with Benchmarking and Complexity–Performance Trade-Off Analysis
by Julien Moussa H. Barakat, Abdullah S. Karar and Bilel Neji
Eng 2025, 6(9), 218; https://doi.org/10.3390/eng6090218 - 2 Sep 2025
Viewed by 473
Abstract
In order to enhance the performance of 112 Gb/s polarization-multiplexed quadrature phase-shift keying (PM-QPSK) coherent optical receivers, a novel digital signal processing (DSP) framework is presented in this study. The suggested method combines cutting-edge signal processing techniques to address important constraints in long-distance, high data rate coherent systems. The framework uses overlap frequency domain equalization (OFDE) for chromatic dispersion (CD) compensation, which offers a cheaper computational cost and higher dispersion control precision than traditional time-domain equalization. An adaptive carrier phase recovery (CPR) technique based on mean-squared differential phase (MSDP) estimation is incorporated to manage phase noise induced by cross-phase modulation (XPM), providing dependable correction under a variety of operating situations. When combined, these techniques significantly increase Q factor performance, and optimum systems can handle transmission distances of up to 2400 km. The suggested DSP approach improves phase stability and dispersion tolerance even in the presence of nonlinear impairments, making it a viable and effective choice for contemporary coherent optical networks. The framework’s competitiveness was evaluated by comparing it against the most recent, cutting-edge DSP methods that were released after 2021. These included CPR systems that were based on kernels, transformers, and machine learning. The findings show that although AI-driven approaches had the highest absolute Q factors, they also required a large amount of computing power. On the other hand, the suggested OFDE in conjunction with adaptive CPR achieved Q factors of up to 11.7 dB over extended distances with a significantly reduced DSP effort, striking a good balance between performance and complexity. Its appropriateness for scalable, long-haul 112 Gb/s PM-QPSK systems is confirmed by a complexity versus performance trade-off analysis, providing a workable and efficient substitute for more resource-intensive alternatives. Full article
(This article belongs to the Section Electrical and Electronic Engineering)
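As a hedged illustration of frequency-domain chromatic dispersion compensation (the core operation behind OFDE), the sketch below applies the inverse quadratic-phase response of the fiber in the Fourier domain. The parameter values and the sign convention are assumptions, and a practical receiver would process overlapping blocks rather than one long FFT.

```python
# Sketch of frequency-domain chromatic dispersion compensation: multiply the
# received block spectrum by the conjugate of the fiber's quadratic-phase
# response. Parameters and sign convention are illustrative.
import numpy as np

def cd_compensate(rx, fs, D_ps_nm_km, length_km, wavelength_nm=1550.0):
    c = 3e8                                   # speed of light, m/s
    D = D_ps_nm_km * 1e-6                     # ps/(nm*km) -> s/m^2
    L = length_km * 1e3                       # m
    lam = wavelength_nm * 1e-9                # m
    f = np.fft.fftfreq(len(rx), d=1 / fs)     # baseband frequencies (Hz)
    # Quadratic phase accumulated by dispersion; compensation applies the conjugate.
    phase = np.pi * lam ** 2 * D * L * f ** 2 / c
    return np.fft.ifft(np.fft.fft(rx) * np.exp(1j * phase))

fs = 56e9                                     # samples/s (2 samples/symbol at 28 GBaud, assumed)
rx = np.random.randn(4096) + 1j * np.random.randn(4096)   # stand-in for one received block
out = cd_compensate(rx, fs, D_ps_nm_km=17.0, length_km=2400.0)
print(out.shape)
```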

38 pages, 2697 KB  
Article
Liver Tumor Segmentation Based on Multi-Scale Deformable Feature Fusion and Global Context Awareness
by Chenghao Zhang, Lingfei Wang, Chunyu Zhang, Yu Zhang, Jin Li and Peng Wang
Biomimetics 2025, 10(9), 576; https://doi.org/10.3390/biomimetics10090576 - 1 Sep 2025
Viewed by 574
Abstract
The highly heterogeneous and irregular morphology of liver tumors presents considerable challenges for automated segmentation. To better capture complex tumor structures, this study proposes a liver tumor segmentation framework based on multi-scale deformable feature fusion and global context modeling. The method incorporates three key innovations: (1) a Deformable Large Kernel Attention (D-LKA) mechanism in the encoder to enhance adaptability to irregular tumor features, combining a large receptive field with deformable sensitivity to precisely extract tumor boundaries; (2) a Context Extraction (CE) module in the bottleneck layer to strengthen global semantic modeling and compensate for limited capacity in capturing contextual dependencies; and (3) a Dual Cross Attention (DCA) mechanism to replace traditional skip connections, enabling deep cross-scale and cross-semantic feature fusion, thereby improving feature consistency and expressiveness during decoding. The proposed framework was trained and validated on a combined LiTS and MSD Task08 dataset and further evaluated on the independent 3D-IRCADb01 dataset. Experimental results show that it surpasses several state-of-the-art segmentation models in Intersection over Union (IoU) and other metrics, achieving superior segmentation accuracy and generalization performance. Feature visualizations at both encoding and decoding stages provide intuitive insights into the model’s internal processing of tumor recognition and boundary delineation, enhancing interpretability and clinical reliability. Overall, this approach presents a novel and practical solution for robust liver tumor segmentation, demonstrating strong potential for clinical application and real-world deployment. Full article
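The deformable large kernel attention is summarized rather than specified in the listing; the following sketch shows a plain large-kernel attention gate (depthwise conv, dilated depthwise conv, 1x1 conv) with the deformable sampling omitted. Kernel sizes are illustrative.

```python
# Minimal large-kernel attention sketch in the spirit of D-LKA, without the
# deformable sampling: depthwise conv + dilated depthwise conv + 1x1 conv
# approximate a large receptive field and gate the input features.
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))   # large effective receptive field
        return x * attn                               # attention-style gating of features

feat = torch.randn(1, 48, 96, 96)                     # e.g. an encoder feature map
print(LargeKernelAttention(48)(feat).shape)           # torch.Size([1, 48, 96, 96])
```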

28 pages, 2198 KB  
Article
A Large Kernel Convolutional Neural Network with a Noise Transfer Mechanism for Real-Time Semantic Segmentation
by Jinhang Liu, Yuhe Du, Jing Wang and Xing Tang
Sensors 2025, 25(17), 5357; https://doi.org/10.3390/s25175357 - 29 Aug 2025
Viewed by 485
Abstract
In semantic segmentation tasks, large kernels and Atrous convolution have been utilized to increase the receptive field, enabling models to achieve competitive performance with fewer parameters. However, due to the fixed size of kernel functions, networks incorporating large convolutional kernels are limited in adaptively capturing multi-scale features and fail to effectively leverage global contextual information. To address this issue, we combine Atrous convolution with large kernel convolution, using different dilation rates to compensate for the single-scale receptive field limitation of large kernels. Simultaneously, we employ a dynamic selection mechanism to adaptively highlight the most important spatial features based on global information. Additionally, to enhance the model’s ability to fit the true label distribution, we propose a Multi-Scale Contextual Noise Transfer Matrix (NTM), which uses high-order consistency information from neighborhood representations to estimate NTM and correct supervision signals, thereby improving the model’s generalization capability. Extensive experiments conducted on Cityscapes, ADE20K, and COCO-Stuff-10K demonstrate that this approach achieves a new state-of-the-art balance between speed and accuracy. Specifically, LKNTNet achieves 80.05% mIoU on Cityscapes with an inference speed of 80.7 FPS and 42.7% mIoU on ADE20K with an inference speed of 143.6 FPS. Full article
(This article belongs to the Section Sensing and Imaging)
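A minimal sketch of the combination described above: parallel large-kernel depthwise branches with different dilation rates, selected dynamically from global context. Branch count, kernel size, and the gating head are placeholder choices, not LKNTNet's.

```python
# Sketch of a large kernel combined with several dilation rates, with dynamic
# branch selection driven by global average pooling.
import torch
import torch.nn as nn

class DilatedLargeKernelSelect(nn.Module):
    def __init__(self, dim, k=7, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=(k // 2) * d, dilation=d, groups=dim)
            for d in dilations
        )
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(dim, len(dilations)))

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)   # (B, S, C, H, W)
        w = torch.softmax(self.gate(x), dim=1)                      # (B, S) branch weights
        return (feats * w[:, :, None, None, None]).sum(dim=1)       # weighted fusion

x = torch.randn(2, 32, 64, 64)
print(DilatedLargeKernelSelect(32)(x).shape)   # torch.Size([2, 32, 64, 64])
```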

30 pages, 21184 KB  
Article
FSTC-DiMP: Advanced Feature Processing and Spatio-Temporal Consistency for Anti-UAV Tracking
by Desen Bu, Bing Ding, Xiaozhong Tong, Bei Sun, Xiaoyong Sun, Runze Guo and Shaojing Su
Remote Sens. 2025, 17(16), 2902; https://doi.org/10.3390/rs17162902 - 20 Aug 2025
Viewed by 748
Abstract
The widespread application of UAV technology has brought significant security concerns that cannot be ignored, driving considerable attention to anti-unmanned aerial vehicle (UAV) tracking technologies. Anti-UAV tracking faces challenges, including target entry into and exit from the field of view, thermal crossover, and interference from similar objects, where Siamese network trackers exhibit notable limitations in anti-UAV tracking. To address these issues, we propose FSTC-DiMP, an anti-UAV tracking algorithm. To better handle feature extraction in low-Signal-to-Clutter-Ratio (SCR) images and expand receptive fields, we introduce the Large Selective Kernel (LSK) attention mechanism, achieving a balance between local feature focus and global information integration. A spatio-temporal consistency-guided re-detection mechanism is designed to mitigate tracking failures caused by target entry into and exit from the field of view or similar-object interference through spatio-temporal relationship analysis. Additionally, a background augmentation module has been developed to more efficiently utilise initial frame information, effectively capturing the semantic features of both targets and their surrounding environments. Experimental results on the AntiUAV410 and AntiUAV600 datasets demonstrate that FSTC-DiMP achieves significant performance improvements in anti-UAV tracking tasks, validating the algorithm’s strong robustness and adaptability to complex environments. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)
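The spatio-temporal consistency idea can be illustrated with a toy gating rule that accepts a re-detected candidate only if it agrees with the recent motion trend. The constant-velocity extrapolation and the threshold here are assumptions, not the FSTC-DiMP mechanism.

```python
# Toy spatio-temporal consistency check for re-detection: accept a candidate box
# only if its center stays within a motion-extrapolated radius of the trajectory.
import numpy as np

def consistent(history, candidate, max_jump=2.5):
    """history: list of (x, y) centers from recent frames; candidate: (x, y)."""
    if len(history) < 2:
        return True
    pts = np.asarray(history, dtype=float)
    velocity = pts[-1] - pts[-2]                    # constant-velocity extrapolation
    predicted = pts[-1] + velocity
    step = np.linalg.norm(velocity) + 1e-6          # typical per-frame displacement
    return np.linalg.norm(np.asarray(candidate) - predicted) <= max_jump * step

track = [(100, 80), (104, 82), (108, 84)]
print(consistent(track, (113, 86)))   # True: agrees with the motion trend
print(consistent(track, (160, 40)))   # False: likely a similar-object distractor
```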

20 pages, 3862 KB  
Article
BlueberryNet: A Lightweight CNN for Real-Time Ripeness Detection in Automated Blueberry Processing Systems
by Bojian Yu, Hongwei Zhao and Xinwei Zhang
Processes 2025, 13(8), 2518; https://doi.org/10.3390/pr13082518 - 10 Aug 2025
Viewed by 530
Abstract
Blueberries are valued for their flavor and health benefits, but inconsistent ripeness at harvest complicates post-harvest food processing such as sorting and quality control. To address this, we propose a lightweight convolutional neural network (CNN) to detect blueberry ripeness in complex field environments, supporting efficient and automated food processing workflows. To meet the low-power and low-resource demands of embedded systems used in smart processing lines, we introduce a Grouped Large Kernel Reparameterization (GLKRep) module. This design reduces computational cost while enhancing the model’s ability to recognize ripe blueberries under complex lighting and background conditions. We also propose a Unified Adaptive Multi-Scale Fusion (UMSF) detection head that adaptively integrates multi-scale features using a dynamic receptive field. This enables the model to detect blueberries of various sizes accurately, a common challenge in real-world harvests. During training, a Semantics-Aware IoU (SAIoU) loss function is used to improve the alignment between predicted and ground truth regions by emphasizing semantic consistency. The model achieves 98.1% accuracy with only 2.6M parameters, outperforming existing methods. Its high accuracy, compact size, and low computational load make it suitable for real-time deployment in embedded sorting and grading systems, bridging field detection and downstream food-processing tasks. Full article
(This article belongs to the Section AI-Enabled Process Engineering)
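To illustrate the structural-reparameterization idea behind GLKRep-style blocks, the sketch below trains a large depthwise kernel in parallel with a small one and folds the small kernel into the large one for deployment, so inference uses a single convolution. The grouping strategy and BatchNorm fusion are omitted; sizes and names are assumptions.

```python
# Sketch of large-kernel structural reparameterization: parallel large + small
# depthwise convs at training time, merged into one kernel for deployment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepLargeKernelDW(nn.Module):
    def __init__(self, dim, big=13, small=3):
        super().__init__()
        self.big = nn.Conv2d(dim, dim, big, padding=big // 2, groups=dim)
        self.small = nn.Conv2d(dim, dim, small, padding=small // 2, groups=dim)
        self.deployed = None

    def forward(self, x):
        if self.deployed is not None:
            return self.deployed(x)
        return self.big(x) + self.small(x)           # training-time parallel branches

    def reparameterize(self):
        k_big, k_small = self.big.kernel_size[0], self.small.kernel_size[0]
        pad = (k_big - k_small) // 2
        fused = nn.Conv2d(self.big.in_channels, self.big.out_channels, k_big,
                          padding=k_big // 2, groups=self.big.groups)
        # Zero-pad the small kernel to the large size and add weights and biases.
        fused.weight.data = self.big.weight.data + F.pad(self.small.weight.data, [pad] * 4)
        fused.bias.data = self.big.bias.data + self.small.bias.data
        self.deployed = fused

m = RepLargeKernelDW(16)
x = torch.randn(1, 16, 32, 32)
before = m(x)
m.reparameterize()
print(torch.allclose(before, m(x), atol=1e-5))   # True: same output from a single conv
```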

22 pages, 6201 KB  
Article
SOAM Block: A Scale–Orientation-Aware Module for Efficient Object Detection in Remote Sensing Imagery
by Yi Chen, Zhidong Wang, Zhipeng Xiong, Yufeng Zhang and Xinqi Xu
Symmetry 2025, 17(8), 1251; https://doi.org/10.3390/sym17081251 - 6 Aug 2025
Viewed by 349
Abstract
Object detection in remote sensing imagery is critical in environmental monitoring, urban planning, and land resource management. However, the task remains challenging due to significant scale variations, arbitrary object orientations, and complex background clutter. To address these issues, we propose a novel orientation module (SOAM Block) that jointly models object scale and directional features while exploiting geometric symmetry inherent in many remote sensing targets. The SOAM Block is constructed upon a lightweight and efficient Adaptive Multi-Scale (AMS) Module, which utilizes a symmetric arrangement of parallel depth-wise convolutional branches with varied kernel sizes to extract fine-grained multi-scale features without dilation, thereby preserving local context and enhancing scale adaptability. In addition, a Strip-based Context Attention (SCA) mechanism is introduced to model long-range spatial dependencies, leveraging horizontal and vertical 1D strip convolutions in a directionally symmetric fashion. This design captures spatial correlations between distant regions and reinforces semantic consistency in cluttered scenes. Importantly, this work is the first to explicitly analyze the coupling between object scale and orientation in remote sensing imagery. The proposed method addresses the limitations of fixed receptive fields in capturing symmetric directional cues of large-scale objects. Extensive experiments are conducted on two widely used benchmarks—DOTA and HRSC2016—both of which exhibit significant scale variations and orientation diversity. Results demonstrate that our approach achieves superior detection accuracy with fewer parameters and lower computational overhead compared to state-of-the-art methods. The proposed SOAM Block thus offers a robust, scalable, and symmetry-aware solution for high-precision object detection in complex aerial scenes. Full article
(This article belongs to the Section Computer)
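The strip-based context attention can be pictured as horizontal and vertical depthwise strip convolutions whose combined response gates the feature map, as in the sketch below; the kernel length and the sigmoid gating are illustrative choices, not the SCA module itself.

```python
# Sketch of strip-based context attention: 1xk and kx1 depthwise strip convs
# gather long-range context along each axis and gate the input features.
import torch
import torch.nn as nn

class StripContextAttention(nn.Module):
    def __init__(self, dim, k=11):
        super().__init__()
        self.horizontal = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.vertical = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        context = self.fuse(self.horizontal(x) + self.vertical(x))  # symmetric strips
        return x * torch.sigmoid(context)                           # spatial gating

x = torch.randn(1, 64, 128, 128)
print(StripContextAttention(64)(x).shape)   # torch.Size([1, 64, 128, 128])
```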

22 pages, 3409 KB  
Article
Short-Term Prediction Intervals for Photovoltaic Power via Multi-Level Analysis and Dual Dynamic Integration
by Kaiyang Kuang, Jingshan Zhang, Qifan Chen, Yan Zhou, Yan Yan, Litao Dai and Guanghu Wang
Electronics 2025, 14(15), 3068; https://doi.org/10.3390/electronics14153068 - 31 Jul 2025
Viewed by 330
Abstract
There is an obvious correlation between the photovoltaic (PV) output of different physical levels; that is, the overall power change trend of large-scale regional (high-level) stations can provide a reference for the prediction of the output of sub-regional (low-level) stations. The current PV prediction methods have not deeply explored the multi-level PV power generation elements and have not considered the correlation between different levels, resulting in the inability to obtain potential information on PV power generation. Moreover, traditional probabilistic prediction models lack adaptability, which can lead to a decrease in prediction performance under different PV prediction scenarios. Therefore, a probabilistic prediction method for short-term PV power based on multi-level adaptive dynamic integration is proposed in this paper. Firstly, an analysis is conducted on the multi-level PV power stations together with the influence of the trend of high-level PV power generation on the forecast of low-level power generation. Then, the PV data are decomposed into multiple layers using the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and analyzed by combining fuzzy entropy (FE) and mutual information (MI). After that, a new multi-level model prediction method, namely, the improved dual dynamic adaptive stacked generalization (I-Stacking) ensemble learning model, is proposed to construct short-term PV power generation prediction models. Finally, an improved dynamic adaptive kernel density estimation (KDE) method for prediction errors is proposed, which optimizes the performance of the prediction intervals (PIs) through variable bandwidth. Through comparative experiments and analysis using traditional methods, the effectiveness of the proposed method is verified. Full article
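To make the interval construction concrete, here is a generic variable-bandwidth KDE sketch over forecast errors (Abramson-style local bandwidths) from which interval bounds are read off as quantiles of the adaptive mixture. The bandwidth rule, grid, and coverage level are assumptions, not the paper's improved dynamic adaptive KDE.

```python
# Sketch of adaptive-bandwidth KDE over forecast errors and the resulting
# prediction interval: local bandwidths shrink where errors are dense and widen
# in the tails; interval bounds come from the mixture's quantiles.
import numpy as np
from scipy.stats import norm

def adaptive_kde_interval(errors, point_forecast, alpha=0.1):
    errors = np.asarray(errors, dtype=float)
    n = len(errors)
    h0 = 1.06 * errors.std(ddof=1) * n ** (-1 / 5)             # Silverman pilot bandwidth
    pilot = norm.pdf((errors[:, None] - errors[None, :]) / h0).mean(axis=1) / h0
    h_i = h0 * np.sqrt(np.exp(np.log(pilot).mean()) / pilot)    # Abramson local bandwidths

    grid = np.linspace(errors.min() - 3 * h_i.max(), errors.max() + 3 * h_i.max(), 2000)
    cdf = norm.cdf((grid[None, :] - errors[:, None]) / h_i[:, None]).mean(axis=0)
    lo = grid[np.searchsorted(cdf, alpha / 2)]
    hi = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return point_forecast + lo, point_forecast + hi             # e.g. a 90% interval

rng = np.random.default_rng(0)
residuals = rng.normal(0, 0.8, size=300)                        # stand-in forecast errors (kW)
print(adaptive_kde_interval(residuals, point_forecast=52.0))
```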

22 pages, 16984 KB  
Article
Small Ship Detection Based on Improved Neural Network Algorithm and SAR Images
by Jiaqi Li, Hongyuan Huo, Li Guo, De Zhang, Wei Feng, Yi Lian and Long He
Remote Sens. 2025, 17(15), 2586; https://doi.org/10.3390/rs17152586 - 24 Jul 2025
Cited by 1 | Viewed by 508
Abstract
Synthetic aperture radar (SAR) images can be used for ship target detection. However, because ship outlines are often unclear in SAR images, noise and land background increase the difficulty and reduce the accuracy of ship detection, especially for small targets. Therefore, based on the YOLOv5s model, this paper improves the backbone network and the feature fusion network to raise ship detection accuracy. First, the LSKModule is used to improve the YOLOv5s backbone: by adaptively aggregating the features extracted by large convolution kernels, it fully captures context information while enhancing key features and suppressing noise interference. Secondly, multiple Depthwise Separable Convolution layers are added to the SPPF (Spatial Pyramid Pooling-Fast) structure; although a small number of parameters and computations are introduced, features from different receptive fields can be extracted. Third, the feature fusion network of YOLOv5s is improved based on BiFPN, and the shallow feature map is used to improve small target detection. Finally, a CoordConv module is added before the detection head of YOLOv5, appending two coordinate channels during the convolution operation to further improve detection accuracy. The mAP50 of this method reached 97.6% on the SSDD dataset and 91.7% on the HRSID dataset, and the method was compared with a variety of advanced target detection models. The results show that its detection accuracy is higher than that of other similar target detection algorithms. Full article
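The CoordConv step is simple enough to show directly: two normalized coordinate channels are concatenated to the feature map before the convolution so the head sees explicit position. The sketch below is a generic version of that idea; channel counts are placeholders.

```python
# Minimal CoordConv sketch: append normalized x/y coordinate channels to the
# feature map before convolving.
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, k, padding=k // 2)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))   # two extra coordinate channels

feat = torch.randn(1, 128, 40, 40)                        # a neck output feature map
print(CoordConv(128, 255)(feat).shape)                    # torch.Size([1, 255, 40, 40])
```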

25 pages, 2129 KB  
Article
Zero-Shot 3D Reconstruction of Industrial Assets: A Completion-to-Reconstruction Framework Trained on Synthetic Data
by Yongjie Xu, Haihua Zhu and Barmak Honarvar Shakibaei Asli
Electronics 2025, 14(15), 2949; https://doi.org/10.3390/electronics14152949 - 24 Jul 2025
Viewed by 542
Abstract
Creating high-fidelity digital twins (DTs) for Industry 4.0 applications is fundamentally reliant on the accurate 3D modeling of physical assets, a task complicated by the inherent imperfections of real-world point cloud data. This paper addresses the challenge of reconstructing accurate, watertight, and topologically sound 3D meshes from sparse, noisy, and incomplete point clouds acquired in complex industrial environments. We introduce a robust two-stage completion-to-reconstruction framework, C2R3D-Net, that systematically tackles this problem. The methodology first employs a pretrained, self-supervised point cloud completion network to infer a dense and structurally coherent geometric representation from degraded inputs. Subsequently, a novel adaptive surface reconstruction network generates the final high-fidelity mesh. This network features a hybrid encoder (FKAConv-LSA-DC), which integrates fixed-kernel and deformable convolutions with local self-attention to robustly capture both coarse geometry and fine details, and a boundary-aware multi-head interpolation decoder, which explicitly models sharp edges and thin structures to preserve geometric fidelity. Comprehensive experiments on the large-scale synthetic ShapeNet benchmark demonstrate state-of-the-art performance across all standard metrics. Crucially, we validate the framework’s strong zero-shot generalization capability by deploying the model—trained exclusively on synthetic data—to reconstruct complex assets from a custom-collected industrial dataset without any additional fine-tuning. The results confirm the method’s suitability as a robust and scalable approach for 3D asset modeling, a critical enabling step for creating high-fidelity DTs in demanding, unseen industrial settings. Full article
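As a rough sketch of the local self-attention ingredient in the hybrid encoder, the module below lets each point attend to its k nearest neighbors in 3D, a common building block in completion and reconstruction encoders. It is a generic illustration under assumed shapes, not the FKAConv-LSA-DC encoder from the paper.

```python
# Toy local self-attention over point neighborhoods: each point attends to its
# k nearest neighbors in 3D space.
import torch
import torch.nn as nn

class LocalSelfAttention(nn.Module):
    def __init__(self, dim, k=16):
        super().__init__()
        self.k = k
        self.q, self.kv = nn.Linear(dim, dim), nn.Linear(dim, 2 * dim)

    def forward(self, xyz, feat):                 # xyz: (B, N, 3), feat: (B, N, C)
        B, N, C = feat.shape
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices     # (B, N, k)
        gather = idx.reshape(B, N * self.k, 1).expand(-1, -1, C)
        neigh = torch.gather(feat, 1, gather).view(B, N, self.k, C)         # neighbor features
        key, val = self.kv(neigh).chunk(2, dim=-1)
        attn = torch.softmax((self.q(feat).unsqueeze(2) * key).sum(-1) / C ** 0.5, dim=-1)
        return (attn.unsqueeze(-1) * val).sum(dim=2)                        # (B, N, C)

xyz, feat = torch.randn(2, 512, 3), torch.randn(2, 512, 64)
print(LocalSelfAttention(64)(xyz, feat).shape)   # torch.Size([2, 512, 64])
```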

25 pages, 6689 KB  
Article
UAV Small Target Detection Model Based on Dual Branches and Adaptive Feature Fusion
by Guogang Wang, Mingxing Gao and Yunpeng Liu
Sensors 2025, 25(15), 4542; https://doi.org/10.3390/s25154542 - 22 Jul 2025
Viewed by 590
Abstract
In order to solve the problem of small and dense targets in drone aerial images, a small target detection model based on dual branches and adaptive feature fusion is proposed. The model first constructs a small target detection framework with dual branches to improve the detection accuracy while reducing the number of parameters. Secondly, the model introduces semantic and detail injection (SDI) in the neck network and embeds bidirectional adaptive feature fusion in the detection head to innovate and optimize the feature fusion mechanism, achieve the full interaction of deep and shallow information, enhance the feature representation of small targets, and overcome the problem of scale inconsistency. Finally, in order to focus on the target area more accurately, we introduce the large separable kernel attention mechanism into the convolutional layer to provide it with a richer and more comprehensive feature representation, which significantly improves the detection accuracy of targets of different scales. The experimental results show that the model algorithm performs well in the VisDrone2019 dataset. Compared with the original model, the mAP50 of this model increases by 20.9%, the mAP50–95 increases by 23.7%, and the total number of parameters decreases by 61.3%, making it more suitable for drones. Full article
(This article belongs to the Section Sensing and Imaging)
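Below is a minimal sketch of adaptive two-branch feature fusion with learnable normalized weights, which is one common way to realize bidirectional adaptive fusion between deep and shallow features. The weighting scheme, upsampling, and names here are assumptions, not the paper's SDI module or detection-head design.

```python
# Sketch of adaptive weighted fusion of a shallow (detail) and a deep (semantic)
# feature map: learnable non-negative weights are normalized before mixing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBiFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, shallow, deep):
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")  # match resolution
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)                       # fast normalized fusion weights
        return self.conv(w[0] * shallow + w[1] * deep)

shallow, deep = torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40)
print(AdaptiveBiFusion(64)(shallow, deep).shape)       # torch.Size([1, 64, 80, 80])
```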