MDPI - Publisher of Open Access Journals

19 pages, 13644 KB

Open Access Article

Rock Surface Crack Recognition Based on Improved Mask R-CNN with CBAM and BiFPN

by Yu Hu, Naifu Deng, Fan Ye, Qinglong Zhang and Yuchen Yan

Buildings 2025, 15(19), 3516; https://doi.org/10.3390/buildings15193516 - 29 Sep 2025

Viewed by 272

To address the challenges of multi-scale distribution, low contrast and background interference in rock crack identification, this paper proposes an improved Mask R-CNN model (CBAM-BiFPN-Mask R-CNN) that integrates the convolutional block attention mechanism (CBAM) module and the bidirectional feature pyramid network (BiFPN) module. A dataset of 1028 rock surface crack images was constructed. The robustness of the model was improved by dynamically combining Gaussian blurring, noise overlay, and color adjustment to enhance data augmentation strategies. The model embeds the CBAM module after the residual block of the ResNet50 backbone network, strengthens the crack-related feature response through channel attention, and uses spatial attention to focus on the spatial distribution of cracks; at the same time, it replaces the traditional FPN with BiFPN, realizes the adaptive fusion of cross-scale features through learnable weights, and optimizes multi-scale crack feature extraction. Experimental results show that the improved model significantly improves the crack recognition effect in complex rock mass scenarios. The mAP index, precision and recall rate are improved by 8.36%, 9.1% and 12.7%, respectively, compared with the baseline model. This research provides an effective solution for rock crack detection in complex geological environments, especially the missed detection of small cracks and complex backgrounds. Full article

(This article belongs to the Special Issue Recent Scientific Developments in Structural Damage Identification)
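
The abstract above describes BiFPN as fusing cross-scale features "through learnable weights". A minimal sketch of one common form of that idea, the fast normalized fusion used in the original BiFPN design, is given below. The function name, the flattened feature vectors and the example weights are hypothetical, and the abstract does not state which fusion variant the authors use.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    features    -- list of equally sized lists of floats (flattened feature maps)
    raw_weights -- one learnable scalar per input (hypothetical values here)
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature")
    # ReLU keeps every weight non-negative so the normalization stays stable.
    weights = [max(0.0, w) for w in raw_weights]
    total = sum(weights) + eps
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features)) / total
        for i in range(length)
    ]


# Two inputs at the same pyramid level, e.g. a top-down and a lateral feature.
top_down = [1.0, 2.0, 3.0]
lateral = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([top_down, lateral], raw_weights=[3.0, 1.0])
print(fused)
```

With weights 3 and 1, the fused output sits three quarters of the way toward the first input; during training these scalars would be updated by backpropagation like any other parameter.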

19 pages, 6678 KB

Open Access Article

Wheat Head Detection in Field Environments Based on an Improved YOLOv11 Model

by Yuting Zhang, Zihang Liu, Xiangdong Guo, Congcong Li and Guifa Teng

Agriculture 2025, 15(16), 1765; https://doi.org/10.3390/agriculture15161765 - 17 Aug 2025

Cited by 1 | Viewed by 929

Abstract

Precise wheat head detection is essential for plant counting and yield estimation in precision agriculture. To tackle the difficulties arising from densely packed wheat heads with diverse scales and intricate occlusions in real-world field conditions, this research introduces YOLOv11n-GRN, an improved wheat head detection model built on the lightweight YOLOv11n framework. The model improves performance through three key innovations. First, a Global Edge Information Transfer (GEIT) module incorporating a Multi-Scale Edge Information Generator (MSEIG) enhances the perception of wheat head contours through effective modeling of edge features and deep semantic fusion. Second, a C3k2_RFCAConv module improves spatial awareness and multi-scale feature representation by integrating receptive field augmentation and a coordinate attention mechanism. Third, the Normalized Gaussian Wasserstein Distance (NWD) is used as the localization loss function, which enhances regression stability for distant small targets. The model was validated on a self-built multi-temporal wheat field image dataset and on the public GWHD2021 dataset. Results showed that, while maintaining a lightweight design (3.6 MB, 10.3 GFLOPs), the YOLOv11n-GRN model achieved a precision, recall, and mAP@0.5 of 92.5%, 91.1%, and 95.7%, respectively, on the self-built dataset, and 91.6%, 89.7%, and 94.4%, respectively, on the GWHD2021 dataset. These results demonstrate that the improvements effectively enhance the model’s overall detection performance for wheat heads in complex backgrounds. The study thus offers an effective technical approach for wheat head detection and yield estimation in challenging field conditions. Full article

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
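
To illustrate why a Gaussian Wasserstein metric can help with small targets, the sketch below computes the Normalized Gaussian Wasserstein Distance in its commonly published form: each box (cx, cy, w, h) is modeled as a 2D Gaussian, and the squared 2-Wasserstein distance reduces to a squared Euclidean distance between (cx, cy, w/2, h/2) vectors. The function names and the constant `c` are assumptions for illustration; `c` is dataset-dependent and the abstract does not report the value used.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two (cx, cy, w, h) boxes.

    c is a dataset-dependent normalizing constant; 12.8 is only a placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, used here only for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


target = (10.0, 10.0, 4.0, 4.0)   # a tiny 4x4 ground-truth box
near = (14.0, 10.0, 4.0, 4.0)     # prediction shifted by 4 px
far = (18.0, 10.0, 4.0, 4.0)      # prediction shifted by 8 px

for name, pred in (("near", near), ("far", far)):
    print(name, "IoU =", iou(target, pred), "NWD =", nwd(target, pred))
```

Both predictions have zero IoU with the target, so an IoU loss cannot rank them, whereas NWD still decreases smoothly with distance. A localization loss is typically taken as `1 - nwd(...)`.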

23 pages, 5668 KB

Open Access Article

MEFA-Net: Multilevel Feature Extraction and Fusion Attention Network for Infrared Small-Target Detection

by Jingcui Ma, Nian Pan, Dengyu Yin, Di Wang and Jin Zhou

Remote Sens. 2025, 17(14), 2502; https://doi.org/10.3390/rs17142502 - 18 Jul 2025

Cited by 1 | Viewed by 622

Abstract

Infrared small-target detection encounters significant challenges due to a low image signal-to-noise ratio, limited target size, and complex background noise. To address the issues of sparse feature loss for small targets during the down-sampling phase of the traditional U-Net network and the semantic gap in the feature fusion process, a multilevel feature extraction and fusion attention network (MEFA-Net) is designed. Specifically, the dilated direction-sensitive convolution block (DDCB) is devised to collaboratively extract local detail features, contextual features, and Gaussian salient features via ordinary convolution, dilated convolution and parallel strip convolution. Furthermore, the encoder attention fusion module (EAF) is employed, where spatial and channel attention weights are generated using dual-path pooling to achieve the adaptive fusion of deep and shallow layer features. Lastly, an efficient up-sampling block (EUB) is constructed, integrating a hybrid up-sampling strategy with multi-scale dilated convolution to refine the localization of small targets. The experimental results confirm that the proposed algorithm model surpasses most existing recent methods. Compared with the baseline, the intersection over union (IoU) and probability of detection ($P_{d}$) of MEFA-Net on the IRSTD-1k dataset are increased by 2.25% and 3.05%, respectively, achieving better detection performance and a lower false alarm rate in complex scenarios. Full article

24 pages, 20337 KB

Open Access Article

MEAC: A Multi-Scale Edge-Aware Convolution Module for Robust Infrared Small-Target Detection

by Jinlong Hu, Tian Zhang and Ming Zhao

Sensors 2025, 25(14), 4442; https://doi.org/10.3390/s25144442 - 16 Jul 2025

Viewed by 677

Abstract

Infrared small-target detection remains a critical challenge in military reconnaissance, environmental monitoring, forest-fire prevention, and search-and-rescue operations, owing to the targets’ extremely small size, sparse texture, low signal-to-noise ratio, and complex background interference. Traditional convolutional neural networks (CNNs) struggle to detect such weak, low-contrast objects due to their limited receptive fields and insufficient feature extraction capabilities. To overcome these limitations, we propose a Multi-Scale Edge-Aware Convolution (MEAC) module that enhances feature representation for small infrared targets without increasing parameter count or computational cost. Specifically, MEAC fuses (1) original local features, (2) multi-scale context captured via dilated convolutions, and (3) high-contrast edge cues derived from differential Gaussian filters. After fusing these branches, channel and spatial attention mechanisms are applied to adaptively emphasize critical regions, further improving feature discrimination. The MEAC module is fully compatible with standard convolutional layers and can be seamlessly embedded into various network architectures. Extensive experiments on three public infrared small-target datasets (SIRSTD-UAVB, IRSTDv1, and IRSTD-1K) demonstrate that networks augmented with MEAC significantly outperform baseline models using standard convolutions. When compared to eleven mainstream convolution modules (ACmix, AKConv, DRConv, DSConv, LSKConv, MixConv, PConv, ODConv, GConv, and Involution), our method consistently achieves the highest detection accuracy and robustness. Experiments conducted across multiple versions, including YOLOv10, YOLOv11, and YOLOv12, as well as various network levels, demonstrate that the MEAC module achieves stable improvements in performance metrics while slightly increasing computational and parameter complexity. 
These results validate MEAC’s effectiveness in enhancing weak small-target detection and suppressing complex background noise, highlighting its strong generalization ability and practical application potential. Full article

(This article belongs to the Section Sensing and Imaging)

28 pages, 11617 KB

Open Access Article

PS-YOLO: A Lighter and Faster Network for UAV Object Detection

by Han Zhong, Yan Zhang, Zhiguang Shi, Yu Zhang and Liang Zhao

Remote Sens. 2025, 17(9), 1641; https://doi.org/10.3390/rs17091641 - 6 May 2025

Cited by 7 | Viewed by 3087

Abstract

The operational environment of UAVs poses unique challenges for object detection compared to conventional methods. When UAVs capture remote sensing images from elevated altitudes, objects often appear minuscule and can be easily obscured by complex backgrounds. This increases the likelihood of false positives and missed detections, thereby complicating the detection process. Furthermore, the hardware resources available on UAV platforms are typically highly constrained. To meet deployment requirements, researchers often must compromise some detection accuracy in favor of a more lightweight model. To address these challenges, we propose PS-YOLO, a fast and precise network specifically designed for UAV-based object detection. In the proposed network, we first design a lightweight backbone based on partial convolution. Then, we introduce a more efficient neck network called FasterBIFFPN to replace the original PAFPN, enabling more effective multi-scale feature fusion. Finally, we propose the GSCD head. GSCD employs shared convolutions to enhance the network’s ability to learn common features across objects of different scales and introduces Normalized Gaussian Wasserstein Distance Loss (NWDLoss) to improve detection accuracy. This detection head effectively increases inference speed without significantly increasing parameter counts. The proposed PS-YOLO is validated on the Visdrone2019 dataset, and the results demonstrate that PS-YOLO provides a 2% improvement in precision, 0.5% improvement in recall, 1.3% improvement in mean average precision (mAP), 41.3% reduction in parameter counts, 6.1% reduction in computational cost, and 26.73 FPS improvement in inference speed compared to the benchmark model YOLOv11-s. Full article

29 pages, 31432 KB

Open Access Article

GAANet: Symmetry-Driven Gaussian Modeling with Additive Attention for Precise and Robust Oriented Object Detection

by Jiangang Zhu, Yi Liu, Qiang Fu and Donglin Jing

Symmetry 2025, 17(5), 653; https://doi.org/10.3390/sym17050653 - 25 Apr 2025

Viewed by 536

Abstract

Oriented objects in RSI (Remote Sensing Imagery) typically present arbitrary rotations, extreme aspect ratios, multi-scale variations, and complex backgrounds. These factors often result in feature misalignment, representational ambiguity, and regression inconsistency, which significantly degrade detection performance. To address these issues, GAANet (Gaussian-Augmented Additive Network), a symmetry-driven framework for OOD (oriented object detection), is proposed. GAANet incorporates a symmetry-preserving mechanism into three critical components (feature extraction, representation modeling, and metric optimization), facilitating systematic improvements from structural representation to learning objectives. A CAX-ViT (Contextual Additive Exchange Vision Transformer) is developed to enhance multi-scale structural modeling by combining spatial–channel symmetric interactions with convolution–attention fusion. A GBBox (Gaussian Bounding Box) representation is employed, which implicitly encodes directional information through the invariance of the covariance matrix, thereby alleviating angular periodicity problems. Additionally, a GPIoU (Gaussian Product Intersection over Union) loss function is introduced to ensure geometric consistency between training objectives and the SkewIoU evaluation metric. GAANet achieved a 90.58% mAP on HRSC2016, 89.95% on UCAS-AOD, and 77.86% on the large-scale DOTA v1.0 dataset, outperforming mainstream methods across various benchmarks. In particular, GAANet showed a +3.27% mAP improvement over $R^{3}$Det and a +4.68% gain over Oriented R-CNN on HRSC2016, demonstrating superior performance over representative baselines. Overall, GAANet establishes a closed-loop detection paradigm that integrates feature interaction, probabilistic modeling, and metric optimization under symmetry priors, offering both theoretical rigor and practical efficacy. Full article

(This article belongs to the Special Issue Symmetry and Asymmetry Study in Object Detection)
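
The abstract above says a Gaussian box representation encodes direction through the covariance matrix and thereby sidesteps angular periodicity. The sketch below shows the construction commonly used in Gaussian-based rotated-box losses, with mean (cx, cy) and covariance R·diag(w²/4, h²/4)·Rᵀ, and checks that equivalent box parameterizations map to the same covariance. The helper names are hypothetical, and the paper's exact GBBox definition is not given in the abstract.

```python
import math


def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box (center, width, height, angle in radians) to (mean, covariance)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    var_w, var_h = (w / 2) ** 2, (h / 2) ** 2
    # Entries of R @ diag(var_w, var_h) @ R^T with R the 2D rotation by theta.
    sxx = var_w * cos_t ** 2 + var_h * sin_t ** 2
    sxy = (var_w - var_h) * cos_t * sin_t
    syy = var_w * sin_t ** 2 + var_h * cos_t ** 2
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def same_covariance(cov_a, cov_b, tol=1e-9):
    """True when two 2x2 covariance matrices agree entry by entry."""
    return all(
        abs(cov_a[i][j] - cov_b[i][j]) < tol for i in range(2) for j in range(2)
    )


_, base = rbox_to_gaussian(0.0, 0.0, 8.0, 2.0, 0.3)
# Same physical box with width/height swapped and the angle shifted by 90 degrees.
_, swapped = rbox_to_gaussian(0.0, 0.0, 2.0, 8.0, 0.3 + math.pi / 2)
# Same physical box with the angle shifted by 180 degrees.
_, flipped = rbox_to_gaussian(0.0, 0.0, 8.0, 2.0, 0.3 + math.pi)

print(same_covariance(base, swapped), same_covariance(base, flipped))
```

Because all three parameter tuples describe the same rectangle, a loss defined on the Gaussian sees no discontinuity at the angle boundary, which is the property the abstract attributes to covariance invariance.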

19 pages, 7047 KB

Open Access Article

A Real-Time Lightweight Behavior Recognition Model for Multiple Dairy Goats

by Xiaobo Wang, Yufan Hu, Meili Wang, Mei Li, Wenxiao Zhao and Rui Mao

Animals 2024, 14(24), 3667; https://doi.org/10.3390/ani14243667 - 19 Dec 2024

Cited by 4 | Viewed by 1773

Abstract

Livestock behavior serves as a crucial indicator of physiological health. Leveraging deep learning techniques to automatically recognize dairy goat behaviors, particularly abnormal ones, enables early detection of potential health and environmental issues. To address the challenges of recognizing small-target behaviors in complex environments, a multi-scale and lightweight behavior recognition model for dairy goats called GSCW-YOLO was proposed. The model integrates Gaussian Context Transformation (GCT) and the Content-Aware Reassembly of Features (CARAFE) upsampling operator, enhancing the YOLOv8n framework’s attention to behavioral features, reducing interferences from complex backgrounds, and improving the ability to distinguish subtle behavior differences. Additionally, GSCW-YOLO incorporates a small-target detection layer and optimizes the Wise-IoU loss function, increasing its effectiveness in detecting distant small-target behaviors and transient abnormal behaviors in surveillance videos. Data for this study were collected via video surveillance under varying lighting conditions and evaluated on a self-constructed dataset comprising 9213 images. Experimental results demonstrated that the GSCW-YOLO model achieved a precision of 93.5%, a recall of 94.1%, and a mean Average Precision (mAP) of 97.5%, representing improvements of 3, 3.1, and 2 percentage points, respectively, compared to the YOLOv8n model. Furthermore, GSCW-YOLO is highly efficient, with a model size of just 5.9 MB and a frame per second (FPS) of 175. It outperforms popular models such as CenterNet, EfficientDet, and other YOLO-series networks, providing significant technical support for the intelligent management and welfare-focused breeding of dairy goats, thus advancing the modernization of the dairy goat industry. Full article

(This article belongs to the Special Issue Mathematical Modeling and Computer Vision in Animal Activity or Behavior: 2nd Edition)

22 pages, 5240 KB

Open Access Article

MMPW-Net: Detection of Tiny Objects in Aerial Imagery Using Mixed Minimum Point-Wasserstein Distance

by Nan Su, Zilong Zhao, Yiming Yan, Jinpeng Wang, Wanxuan Lu, Hongbo Cui, Yunfei Qu, Shou Feng and Chunhui Zhao

Remote Sens. 2024, 16(23), 4485; https://doi.org/10.3390/rs16234485 - 29 Nov 2024

Cited by 5 | Viewed by 2320

Abstract

The detection of distant tiny objects in aerial imagery plays a pivotal role in early warning, localization, and recognition tasks. However, due to the scarcity of appearance information, minimal pixel representation, susceptibility to blending with the background, and the incompatibility of conventional metrics, the rapid and accurate detection of tiny objects poses significant challenges. To address these issues, a single-stage tiny object detector tailored for aerial imagery is proposed, comprising two primary components. Firstly, we introduce a light backbone-heavy neck architecture, named the Global Context Self-Attention and Dense Nested Connection Feature Extraction Network (GC-DN Network), which efficiently extracts and fuses multi-scale features of the target. Secondly, we propose a novel metric, MMPW, to replace the Intersection over Union (IoU) in label assignment strategies, Non-Maximum Suppression (NMS), and regression loss functions. Specifically, MMPW models bounding boxes as 2D Gaussian distributions and utilizes the Mixed Minimum Point-Wasserstein Distance to quantify the similarity between boxes. Experiments conducted on the latest aerial image tiny object datasets, AI-TOD and VisDrone-19, demonstrate that our method improves AP50 performance by 9.4% and 5%, respectively, and AP performance by 4.3% and 3.6%. This validates the efficacy of our approach for detecting tiny objects in aerial imagery. Full article

(This article belongs to the Section Remote Sensing Image Processing)

23 pages, 5300 KB

Open Access Article

An Automatic Detection and Statistical Method for Underwater Fish Based on Foreground Region Convolution Network (FR-CNN)

by Shenghong Li, Peiliang Li, Shuangyan He, Zhiyan Kuai, Yanzhen Gu, Haoyang Liu, Tao Liu and Yuan Lin

J. Mar. Sci. Eng. 2024, 12(8), 1343; https://doi.org/10.3390/jmse12081343 - 7 Aug 2024

Cited by 2 | Viewed by 2090

Abstract

Computer vision in marine ranching enables real-time monitoring of underwater resources. Detecting fish presents challenges due to varying water turbidity and lighting, affecting color consistency. We propose a Foreground Region Convolutional Neural Network (FR-CNN) that combines unsupervised and supervised methods. It introduces an adaptive multiscale regression Gaussian background model to distinguish fish from noise at different scales. Probability density functions integrate spatiotemporal information for object detection, addressing illumination and water quality shifts. FR-CNN achieves 95% mAP with IoU of 0.5, reducing errors from open-source datasets. It updates anchor boxes automatically on local datasets, enhancing object detection accuracy in long-term monitoring. The results analyze fish species behaviors in relation to environmental conditions, validating the method’s practicality. Full article

(This article belongs to the Section Ocean Engineering)

22 pages, 4810 KB

Open Access Article

Ship Target Detection in Optical Remote Sensing Images Based on E2YOLOX-VFL

by Qichang Zhao, Yiquan Wu and Yubin Yuan

Remote Sens. 2024, 16(2), 340; https://doi.org/10.3390/rs16020340 - 15 Jan 2024

Cited by 13 | Viewed by 3850

Abstract

In this research, E2YOLOX-VFL is proposed as a novel approach to address the challenges of optical image multi-scale ship detection and recognition in complex maritime and land backgrounds. Firstly, the typical anchor-free network YOLOX is utilized as the baseline network for ship detection. Secondly, the Efficient Channel Attention module is incorporated into the YOLOX Backbone network to enhance the model’s capability to extract information from objects of different scales, such as large, medium, and small, thus improving ship detection performance in complex backgrounds. Thirdly, we propose the Efficient Force-IoU (EFIoU) Loss function as a replacement for the Intersection over Union (IoU) Loss, addressing the issue whereby IoU Loss only considers the intersection and union between the ground truth boxes and the predicted boxes, without taking into account the size and position of targets. This also considers the disadvantageous effects of low-quality samples, resulting in inaccuracies in measuring target similarity, and improves the regression performance of the algorithm. Fourthly, the confidence loss function is improved. Specifically, Varifocal Loss is employed instead of CE Loss, effectively handling the positive and negative sample imbalance, challenging samples, and class imbalance, enhancing the overall detection performance of the model. Then, we propose Balanced Gaussian NMS (BG-NMS) to solve the problem of missed detection caused by the occlusion of dense targets. Finally, the E2YOLOX-VFL algorithm is tested on the HRSC2016 dataset, achieving a 9.28% improvement in mAP compared to the baseline YOLOX algorithm. Moreover, the detection performance using BG-NMS is also analyzed, and the experimental results validate the effectiveness of the E2YOLOX-VFL algorithm. Full article

(This article belongs to the Special Issue Artificial Intelligence-Driven Methods for Remote Sensing Target and Object Detection II)
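
The abstract above proposes a Balanced Gaussian NMS (BG-NMS) for dense, occluded targets but does not define it. As background only, the sketch below implements the standard Gaussian Soft-NMS that such methods build on: instead of deleting overlapping boxes, it decays their scores by exp(-IoU²/σ). This is not the authors' BG-NMS, and the function names, `sigma` and `score_threshold` are illustrative assumptions.

```python
import math


def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def gaussian_soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Return (box, score) pairs in selection order after Gaussian score decay."""
    candidates = list(zip(scores, boxes))
    kept = []
    while candidates:
        # Pick the highest-scoring remaining candidate.
        best = max(range(len(candidates)), key=lambda i: candidates[i][0])
        best_score, best_box = candidates.pop(best)
        kept.append((best_box, best_score))
        # Decay the others according to their overlap with the selected box.
        decayed = []
        for score, box in candidates:
            score *= math.exp(-(iou_xyxy(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                decayed.append((score, box))
        candidates = decayed
    return kept


ship_a = (0.0, 0.0, 10.0, 10.0)
ship_b = (1.0, 0.0, 11.0, 10.0)    # heavily overlaps ship_a, e.g. a moored neighbour
ship_c = (20.0, 20.0, 30.0, 30.0)  # isolated
result = gaussian_soft_nms([ship_a, ship_b, ship_c], [0.9, 0.8, 0.7])
for box, score in result:
    print(box, round(score, 3))
```

Hard NMS with a typical 0.5 IoU threshold would discard `ship_b` outright; here it survives with a reduced score, which is how soft suppression reduces missed detections among densely packed targets.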

22 pages, 26316 KB

Open Access Article

Semantic Segmentation of Remote Sensing Imagery Based on Multiscale Deformable CNN and DenseCRF

by Xiang Cheng and Hong Lei

Remote Sens. 2023, 15(5), 1229; https://doi.org/10.3390/rs15051229 - 23 Feb 2023

Cited by 12 | Viewed by 4902

Abstract

The semantic segmentation of remote sensing images is a significant research direction in digital image processing. The complex background environment, irregular size and shape of objects, and similar appearance of different categories of remote sensing images have brought great challenges to remote sensing image segmentation tasks. Traditional convolutional-neural-network-based models often ignore spatial information in the feature extraction stage and pay less attention to global context information. However, spatial context information is important in complex remote sensing images, which means that the segmentation effect of traditional models needs to be improved. In addition, neural networks with a superior segmentation performance often suffer from the problem of high computational resource consumption. To address the above issues, this paper proposes a combination model of a modified multiscale deformable convolutional neural network (mmsDCNN) and dense conditional random field (DenseCRF). Firstly, we designed a lightweight multiscale deformable convolutional network (mmsDCNN) with a large receptive field to generate a preliminary prediction probability map at each pixel. The output of the mmsDCNN model is a coarse segmentation result map, which has the same size as the input image. In addition, the preliminary segmentation result map contains rich multiscale features. Then, the multi-level DenseCRF model based on the superpixel level and the pixel level is proposed, which can make full use of the context information of the image at different levels and further optimize the rough segmentation result of mmsDCNN. To be specific, we converted the pixel-level preliminary probability map into a superpixel-level predicted probability map according to the simple linear iterative clustering (SLIC) algorithm and defined the potential function of the DenseCRF model based on this.
Furthermore, we added the pixel-level potential function constraint term to the superpixel-based Gaussian potential function to obtain a combined Gaussian potential function, which enabled our model to consider the features of various scales and prevent poor superpixel segmentation results from affecting the final result. To restore the contour of the object more clearly, we utilized the Sketch token edge detection algorithm to extract the edge contour features of the image and fused them into the potential function of the DenseCRF model. Finally, extensive experiments on the Potsdam and Vaihingen datasets demonstrated that the proposed model exhibited significant advantages compared to the current state-of-the-art models. Full article

(This article belongs to the Section AI Remote Sensing)

26 pages, 34880 KB

Open Access Article

AF-OSD: An Anchor-Free Oriented Ship Detector Based on Multi-Scale Dense-Point Rotation Gaussian Heatmap

by Zizheng Hua, Gaofeng Pan, Kun Gao, Hengchao Li and Su Chen

Remote Sens. 2023, 15(4), 1120; https://doi.org/10.3390/rs15041120 - 18 Feb 2023

Cited by 11 | Viewed by 2623

Abstract

Due to the complexity of airborne remote sensing scenes, strong background and noise interference, positive and negative sample imbalance, and multiple ship scales, ship detection is a critical and challenging task in remote sensing. This work proposes an end-to-end anchor-free oriented ship detector (AF-OSD) framework based on a multi-scale dense-point rotation Gaussian heatmap (MDP-RGH) to tackle these challenges. First, to solve the sample imbalance problem and suppress the interference of negative samples such as background and noise, the oriented ship is modeled via the proposed MDP-RGH according to its shape and direction to generate ship labels with more accurate information, while the imbalance between positive and negative samples is adaptively learned for ships of different scales. Then, the AF-OSD based on MDP-RGH is devised to detect multi-scale oriented ships, enabling accurate identification and information extraction for multi-scale vessels. Finally, a multi-task object size adaptive loss function is designed to guide the training process, improving detection quality and performance for multi-scale oriented ships. Extensive experiments on the HRSC2016 and DOTA ship datasets show that the proposed method significantly outperforms the compared state-of-the-art methods. Full article

(This article belongs to the Topic Deep Learning and Transformers’ Methods Applied to Remotely Captured Data)

21 pages, 6681 KB

Open Access Article

Arbitrary-Oriented Ship Detection Method Based on Long-Edge Decomposition Rotated Bounding Box Encoding in SAR Images

by Xinqiao Jiang, Hongtu Xie, Jiaxing Chen, Jian Zhang, Guoqian Wang and Kai Xie

Remote Sens. 2023, 15(3), 673; https://doi.org/10.3390/rs15030673 - 23 Jan 2023

Cited by 22 | Viewed by 3700

Abstract

Due to the limitations of the horizontal bounding boxes for locating the oriented ship targets in synthetic aperture radar (SAR) images, the rotated bounding box (RBB) has received wider attention in recent years. First, the existing RBB encodings suffer from boundary discontinuity problems, which interfere with the convergence of the model, and then lead to some problems, such as the inaccurate location of the ship targets in the boundary state. Thus, from the perspective that the long-edge features of the ships are more representative of their orientation, the long-edge decomposition RBB encoding has been proposed in this paper, which can avoid the boundary discontinuity problem. Second, the problem of the positive and negative samples imbalance is serious for the SAR ship images because only a few ship targets exist in the vast background of these images. Since the ship targets of different sizes are subject to varying degrees of interference caused by this problem, a multiscale elliptical Gaussian sample balancing strategy has been proposed in this paper, which can mitigate the impact of this problem by labeling the loss weights of the negative samples within the target foreground area with multiscale elliptical Gaussian kernels. Finally, experiments based on the CenterNet model were implemented on the benchmark SAR image dataset SSDD (SAR ship detection dataset). The experimental results demonstrate that our proposed long-edge decomposition RBB encoding outperforms other conventional RBB encodings in the task of oriented ship detection in SAR images. In addition, our proposed multiscale elliptical Gaussian sample balancing strategy is effective and can improve the model performance. Full article

(This article belongs to the Special Issue Radar Signal Processing and Imaging for Ocean Remote Sensing)

20 pages, 6770 KB

Open AccessArticle

Early Fault Diagnosis of Rolling Bearing Based on Threshold Acquisition U-Net

by Dongsheng Zhang, Laiquan Zhang, Naikang Zhang, Shuo Yang and Yuhao Zhang

Machines 2023, 11(1), 119; https://doi.org/10.3390/machines11010119 - 15 Jan 2023

Cited by 5 | Viewed by 2731

Abstract

Because the early fault signal of a rolling bearing is easily masked by background information such as noise, making fault features difficult to extract, a method for early fault diagnosis of rolling bearings based on the threshold acquisition U-Net (TA-UNet) is proposed. First, to improve the feature extraction ability of U-Net, the channel spatial threshold acquisition network (CS-TAN) and the dilated convolution module (DCM), based on different combinations of dilation rates, are introduced into the U-Net to construct the TA-UNet. The CS-TAN adaptively learns the threshold and reduces the interference of noise in the signal, while the DCM improves the multi-scale feature extraction ability of the network. Then, the TA-UNet is used for early fault diagnosis in two steps: the model training phase and the vibration signal fault feature extraction phase. In the first step, additive white Gaussian noise is added to the vibration signal to obtain a noise-added vibration signal, and the TA-UNet is trained to denoise it. In the second step, the trained TA-UNet is used to extract the fault features of vibration signals and diagnose the early fault types of the rolling bearing. Because U-Net is a supervised neural network that normally needs labeled data for training, this two-step method addresses that requirement and enables fault diagnosis of unlabeled data. The feature extraction capability of the TA-UNet is evaluated by denoising a simulated rolling bearing signal. The effectiveness of the proposed diagnostic method is demonstrated by early fault diagnosis on open-source datasets. Full article
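The first training step described above, adding white Gaussian noise to a vibration signal to build denoising pairs, might look like the following minimal sketch. The function name `add_awgn`, the SNR parameterization, and the sinusoidal stand-in signal are assumptions for illustration; the abstract does not state the noise levels or signals the authors used.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  solve for P_noise
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Hypothetical stand-in for a vibration signal: 50 Hz sinusoid sampled at 10 kHz
t = np.arange(0, 1.0, 1e-4)
clean = np.sin(2 * np.pi * 50 * t)
noisy = add_awgn(clean, snr_db=0.0, rng=np.random.default_rng(0))

# (noisy, clean) is one input/target pair for a denoising network
measured_snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

Because the target of each pair is the original signal itself, pairs of this kind can be generated without fault labels, which is consistent with the abstract's point about unlabeled data.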

(This article belongs to the Topic Artificial Intelligence in Smart Industrial Diagnostics and Manufacturing)

19 pages, 38957 KB

Open AccessArticle

Enhancement and Restoration of Scratched Murals Based on Hyperspectral Imaging—A Case Study of Murals in the Baoguang Hall of Qutan Temple, Qinghai, China

by Pengyu Sun, Miaole Hou, Shuqiang Lyu, Wanfu Wang, Shuyang Li, Jincheng Mao and Songnian Li

Sensors 2022, 22(24), 9780; https://doi.org/10.3390/s22249780 - 13 Dec 2022

Cited by 12 | Viewed by 3720

Abstract

Environmental changes and human activities have caused serious degradation of murals around the world. Scratches are one of the most common issues in these damaged murals. We propose a new method for virtually enhancing and removing scratches from murals, which can provide an auxiliary reference and support for actual restoration. First, principal component analysis (PCA) was performed on the hyperspectral data of a mural after reflectance correction, and high-pass filtering was performed on the selected first principal component image. Principal component fusion was used to replace the original first principal component with the high-pass filtered first principal component image, which was then inverse PCA transformed with the other original principal component images to obtain an enhanced hyperspectral image. The linear information in the mural was therefore enhanced, and the differences between the scratches and background improved. Second, the enhanced hyperspectral image of the mural was synthesized as a true colour image and converted to the HSV colour space. The brightness component of the image was estimated using a multi-scale Gaussian function and corrected with a 2D gamma function, thus solving the problem of localised darkness in the murals. Finally, the enhanced mural images were used as input to a pretrained triplet domain translation network. The local branches in the translation network perform overall noise smoothing and colour recovery of the mural, while the partial nonlocal block is used to extract the information from the scratches. The mapping process was learned in the hidden space for virtual removal of the scratches. In addition, we added a Butterworth high-pass filter at the end of the network to generate the final restoration result of the mural with a clearer visual effect and richer high-frequency information. We verified and validated these methods for murals in the Baoguang Hall of Qutan Temple. The results show that the proposed method outperforms the restoration results of the total variation (TV) model, curvature-driven diffusion (CDD) model, and Criminisi algorithm. Moreover, the proposed combined method produces better recovery results and improves the visual richness, readability, and artistic expression of the murals compared with direct recovery using a triplet domain translation network. Full article
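The principal component fusion step in this abstract (PCA, filtering of the first component, inverse PCA) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the high-pass filter is approximated by subtracting a box blur, the filtered component is formed as the first component plus its high-pass residual scaled by `gain`, and `box_blur`, `enhance_cube`, `radius`, and `gain` are hypothetical names.

```python
import numpy as np


def box_blur(image, radius=2):
    """Mean filter with edge padding; a simple low-pass stand-in."""
    padded = np.pad(image, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros_like(image, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / size ** 2


def enhance_cube(cube, radius=2, gain=1.0):
    """PCA -> sharpen PC1 with its high-pass residual -> inverse PCA.

    cube: (rows, cols, bands) reflectance array.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    mean = pixels.mean(axis=0)
    centered = pixels - mean

    # Eigen-decomposition of the band covariance, sorted by variance
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    scores = centered @ eigvecs  # principal component images, flattened

    # Assumed filter: high-pass residual = PC1 minus its blurred version
    pc1 = scores[:, 0].reshape(rows, cols)
    high_pass = pc1 - box_blur(pc1, radius)
    scores[:, 0] = (pc1 + gain * high_pass).ravel()

    # Inverse PCA with the modified first component and original others
    restored = scores @ eigvecs.T + mean
    return restored.reshape(rows, cols, bands)


# Synthetic stand-in for a small hyperspectral cube (8 x 8 pixels, 4 bands)
demo_cube = np.random.default_rng(0).random((8, 8, 4))
enhanced = enhance_cube(demo_cube, radius=1, gain=1.0)
print(enhanced.shape)
```

With `gain=0.0` the function reduces to a plain PCA round trip and returns the input cube, which is a convenient sanity check before tuning the filter.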

(This article belongs to the Special Issue Sensors for Imaging Cultural Heritage: Technologies, Methods and Data Processing)
