Search Results (24)

Search Parameters:
Keywords = weighted cascade feature fusion

28 pages, 19790 KiB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Viewed by 209
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments. Full article
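The abstract above mentions adaptive multi-scale fusion in HyperSFM but gives no formula. As a generic illustration only (the function name, shapes, and softmax weighting below are our assumptions, not the paper's design), adaptive weighted fusion of same-sized feature maps can be sketched as:

```python
import math

def adaptive_weighted_fusion(features, logits):
    """Fuse equally sized 2D feature maps with softmax-normalized weights.

    features: list of 2D feature maps (lists of lists, same shape)
    logits:   one learnable scalar per map (fixed here for illustration)
    """
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax: weights sum to 1
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[sum(w * f[i][j] for w, f in zip(weights, features))
              for j in range(cols)] for i in range(rows)]
    return fused, weights
```

With equal logits the fusion reduces to a plain average; in training, the logits would be learned so the network can emphasize the more informative scale.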
(This article belongs to the Section Sensing and Imaging)

20 pages, 3406 KiB  
Article
Single-Image Super-Resolution via Cascaded Non-Local Mean Network and Dual-Path Multi-Branch Fusion
by Yu Xu and Yi Wang
Sensors 2025, 25(13), 4044; https://doi.org/10.3390/s25134044 - 28 Jun 2025
Viewed by 482
Abstract
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs. It plays a crucial role in applications such as medical imaging, surveillance, and remote sensing. However, due to the ill-posed nature of the task and the inherent limitations of imaging sensors, obtaining accurate HR images remains challenging. While numerous methods have been proposed, the traditional approaches suffer from oversmoothing and limited generalization; CNN-based models lack the ability to capture long-range dependencies; and Transformer-based solutions, although effective in modeling global context, are computationally intensive and prone to texture loss. To address these issues, we propose a hybrid CNN–Transformer architecture that cascades a pixel-wise self-attention non-local means module (PSNLM) and an adaptive dual-path multi-scale fusion block (ADMFB). The PSNLM is inspired by the non-local means (NLM) algorithm. We use weighted patches to estimate the similarity between pixels centered at each patch while limiting the search region and constructing a communication mechanism across ranges. The ADMFB enhances texture reconstruction by adaptively aggregating multi-scale features through dual attention paths. The experimental results demonstrate that our method achieves superior performance on multiple benchmarks. For instance, in challenging ×4 super-resolution, our method outperforms the second-best method by 0.0201 regarding the Structural Similarity Index (SSIM) on the BSD100 dataset. On the texture-rich Urban100 dataset, our method achieves a 26.56 dB Peak Signal-to-Noise Ratio (PSNR) and 0.8133 SSIM. Full article
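The patch-weighted pixel similarity behind PSNLM follows classic non-local means. A minimal pure-Python sketch of NLM for one pixel with a limited search window (parameter values are illustrative, not the paper's):

```python
import math

def nlm_pixel(img, i, j, patch=1, search=2, h=10.0):
    """Non-local-means estimate for pixel (i, j).

    Weights each candidate pixel in a limited search window by the
    similarity of the patches centred on it and on (i, j).
    """
    H, W = len(img), len(img[0])

    def patch_at(ci, cj):
        # clamp indices at the border
        return [img[min(max(ci + di, 0), H - 1)][min(max(cj + dj, 0), W - 1)]
                for di in range(-patch, patch + 1)
                for dj in range(-patch, patch + 1)]

    ref = patch_at(i, j)
    num = den = 0.0
    for si in range(max(0, i - search), min(H, i + search + 1)):
        for sj in range(max(0, j - search), min(W, j + search + 1)):
            cand = patch_at(si, sj)
            d2 = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
            w = math.exp(-d2 / (h * h))  # similar patches get weight near 1
            num += w * img[si][sj]
            den += w
    return num / den
```

Restricting the search region, as the abstract describes, is what keeps the quadratic cost of full non-local attention manageable.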
(This article belongs to the Section Sensing and Imaging)

23 pages, 6358 KiB  
Article
Optimization of Sorghum Spike Recognition Algorithm and Yield Estimation
by Mengyao Han, Jian Gao, Cuiqing Wu, Qingliang Cui, Xiangyang Yuan and Shujin Qiu
Agronomy 2025, 15(7), 1526; https://doi.org/10.3390/agronomy15071526 - 23 Jun 2025
Viewed by 291
Abstract
In the natural field environment, the high planting density of sorghum and severe occlusion among spikes substantially increase the difficulty of sorghum spike recognition, resulting in frequent false positives and false negatives. Target detection models suited to this environment demand high computational power, making real-time detection of sorghum spikes on mobile devices difficult. This study proposes a detection-tracking scheme based on improved YOLOv8s-GOLD-LSKA with optimized DeepSort, aiming to enhance yield estimation accuracy in complex agricultural field scenarios. By integrating the GOLD module’s dual-branch multi-scale feature fusion and the LSKA attention mechanism, a lightweight detection model is developed. The improved DeepSort algorithm enhances tracking robustness in occlusion scenarios by optimizing the confidence threshold filtering (0.46), frame-skipping count, and cascading matching strategy (n = 3, max_age = 40). Combined with the five-point sampling method, the average dry weight of sorghum spikes (0.12 kg) was used to enable rapid yield estimation. The results demonstrate that the improved model achieved an mAP of 85.86% (a 6.63% increase over the original YOLOv8), an F1 score of 81.19%, and a model size reduced to 7.48 MB, with a detection speed of 0.0168 s per frame. The optimized tracking system attained a MOTA of 67.96% and ran at 42 FPS. Image- and video-based yield estimation accuracies reached 89–96% and 75–93%, respectively, with single-frame latency as low as 0.047 s. By optimizing the full detection–tracking–yield pipeline, this solution overcomes challenges in small-object missed detections, ID switches under occlusion, and real-time processing in complex scenarios. Its lightweight, high-efficiency design is well suited for deployment on UAVs and mobile terminals, providing robust technical support for intelligent sorghum monitoring and precision agriculture management, and thereby playing a crucial role in driving agricultural digital transformation. Full article
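The count-to-yield step above multiplies tracked spike counts by the 0.12 kg average dry weight. A minimal sketch; the per-area extrapolation from the five sampled plots to a whole field is our assumption (the abstract states only the average weight and the five-point sampling method):

```python
def estimate_yield(spike_counts, avg_spike_weight_kg=0.12,
                   plot_area_m2=1.0, field_area_m2=10_000):
    """Extrapolate field yield (kg) from five-point sample counts.

    spike_counts: tracked spike counts from the five sampled plots.
    The plot/field areas are hypothetical parameters for illustration.
    """
    mean_count = sum(spike_counts) / len(spike_counts)
    # mean spikes per plot -> kg per square metre -> kg per field
    yield_per_m2 = mean_count * avg_spike_weight_kg / plot_area_m2
    return yield_per_m2 * field_area_m2
```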

24 pages, 5461 KiB  
Article
Classification and Prediction of Unknown Thermal Barrier Coating Thickness Based on Hybrid Machine Learning and Terahertz Nondestructive Characterization
by Zhou Xu, Jianfei Xu, Yiwen Wu, Changdong Yin, Suqin Chen, Qiang Liu, Xin Ge, Luanfei Wan and Dongdong Ye
Coatings 2025, 15(6), 725; https://doi.org/10.3390/coatings15060725 - 17 Jun 2025
Viewed by 384
Abstract
Thickness inspection of thermal barrier coatings is crucial to safeguarding the reliability of high-temperature components of aero-engines, but traditional destructive inspection methods struggle to meet the demand for rapid assessment in the field. In this study, a new non-destructive testing method integrating terahertz time-domain spectroscopy and machine learning algorithms is proposed to systematically study the thickness inspection of 8YSZ coatings prepared by two processes, namely atmospheric plasma spraying (APS) and electron beam physical vapor deposition (EB-PVD). By optimizing the preparation process parameters, 620 sets of specimens with thicknesses of 100–400 μm are prepared, and three types of characteristic parameters, namely time delay Δt, frequency shift Δf, and energy decay η, are extracted by combining wavelet threshold denoising and time-frequency joint analysis. A CNN-RF cascade model is constructed to realize coating process classification, and an attention-LSTM and SVR weighted fusion model is developed for thickness regression prediction. The results show that the multimodal feature fusion reduces the root-mean-square error of thickness prediction to 8.9 μm, which further improves the accuracy over the single-feature model. The classification accuracy reaches 96.8%, of which the feature importance of time delay Δt accounts for 62%. The hierarchical modeling strategy reduces the detection mean absolute error from 6.2 μm to 4.1 μm. The method provides a high-precision solution for intelligent quality assessment of thermal barrier coatings, which is of great significance in promoting the progress of intelligent manufacturing technology for high-end equipment. Full article
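The abstract combines attention-LSTM and SVR thickness predictions by weighted fusion but does not give the weighting scheme. One plausible sketch (inverse-error weighting; entirely our assumption) is:

```python
def fuse_predictions(pred_a, pred_b, err_a, err_b):
    """Weighted fusion of two regressors' thickness predictions.

    err_a, err_b: each model's validation RMSE; weights are set
    inversely proportional to error so the better model dominates.
    (A hypothetical scheme; the paper does not spell out its weights.)
    """
    wa, wb = 1.0 / err_a, 1.0 / err_b
    total = wa + wb
    wa, wb = wa / total, wb / total  # normalize to sum to 1
    return [wa * a + wb * b for a, b in zip(pred_a, pred_b)]
```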

25 pages, 9781 KiB  
Article
Building Segmentation in Urban and Rural Areas with MFA-Net: A Multidimensional Feature Adjustment Approach
by Zijie Han, Xue Li, Xianteng Wang, Zihao Wu and Jian Liu
Sensors 2025, 25(8), 2589; https://doi.org/10.3390/s25082589 - 19 Apr 2025
Viewed by 425
Abstract
Deep-learning-based methods are crucial for building extraction from high-resolution remote sensing images, playing a key role in applications like natural disaster response, land resource management, and smart city development. However, extracting precise buildings from complex urban and rural environments remains challenging due to spectral variability and intricate background interference, particularly for densely packed and small buildings. To address these issues, we propose an enhanced U2-Net architecture, MFA-Net, which incorporates two key innovations: a Multidimensional Feature Adjustment (MFA) module that refines feature representations through Cascaded Channel, Spatial, and Multiscale Weighting Mechanisms and a Dynamic Fusion Loss function that enhances edge geometric fidelity. Evaluation on three datasets (Urban, Rural, and WHU) reveals that MFA-Net outperforms existing methods, with average improvements of 6% in F1-score and 7.3% in IoU, at the cost of an average 9.9% increase in training time. These advancements significantly improve edge delineation and the segmentation of dense building clusters, making MFA-Net especially beneficial for urban planning and land resource management. Full article

19 pages, 14722 KiB  
Article
Log Volume Measurement and Counting Based on Improved Cascade Mask R-CNN and Deep SORT
by Chunjiang Yu, Yongke Sun, Yong Cao, Lei Liu and Xiaotao Zhou
Forests 2024, 15(11), 1884; https://doi.org/10.3390/f15111884 - 26 Oct 2024
Viewed by 1335
Abstract
Logs require multiple verifications to ensure accurate volume and quantity measurements. Log end detection is a crucial step in measuring log volume and counting logs. Currently, this task primarily relies on the Mask R-CNN instance segmentation model. However, the Feature Pyramid Network (FPN) in Mask R-CNN may compromise accuracy due to feature redundancy during multi-scale fusion, particularly with small objects. Moreover, counting logs in a single image is challenging due to their large size and stacking. To address the above issues, we propose an improved log segmentation model based on Cascade Mask R-CNN. This method uses ResNet for multi-scale feature extraction and integrates a hierarchical Convolutional Block Attention Module (CBAM) to refine feature weights and enhance object emphasis. Then, a Region Proposal Network (RPN) is employed to generate log segmentation proposals. Finally, combined with Deep SORT, the model tracks log ends in video streams and counts the number of logs in the stack. Experiments demonstrate the effectiveness of our method, achieving an average precision (AP) of 82.3, APs of 75.3 for small, APm of 70.9 for medium, and APl of 86.2 for large objects. These results represent improvements of 1.8%, 3.7%, 2.6%, and 1.4% over Mask R-CNN, respectively. The detection rate reached 98.6%, with a counting accuracy of 95%. Compared to manually measured volumes, our method shows a low error rate of 4.07%. Full article
(This article belongs to the Section Wood Science and Forest Products)

18 pages, 18674 KiB  
Article
An Improved Instance Segmentation Method for Complex Elements of Farm UAV Aerial Survey Images
by Feixiang Lv, Taihong Zhang, Yunjie Zhao, Zhixin Yao and Xinyu Cao
Sensors 2024, 24(18), 5990; https://doi.org/10.3390/s24185990 - 15 Sep 2024
Cited by 2 | Viewed by 1328
Abstract
Farm aerial survey layers can assist in unmanned farm operations, such as planning paths and early warnings. To address the inefficiencies and high costs associated with traditional layer construction, this study proposes a high-precision instance segmentation algorithm based on SparseInst. Considering the structural characteristics of farm elements, this study introduces a multi-scale attention module (MSA) that leverages the properties of atrous convolution to expand the receptive field. It enhances spatial and channel feature weights, effectively improving segmentation accuracy for large-scale and complex targets in the farm through three parallel dense connections. A bottom-up aggregation path is added to the feature pyramid fusion network, enhancing the model’s ability to perceive complex targets such as mechanized trails in farms. Coordinate attention blocks (CAs) are incorporated into the neck to capture richer contextual semantic information, enhancing farm aerial imagery scene recognition accuracy. To assess the proposed method, we compare it against existing mainstream object segmentation models, including the Mask R-CNN, Cascade–Mask, SOLOv2, and Condinst algorithms. The experimental results show that the improved model proposed in this study can be adapted to segment various complex targets in farms. The accuracy of the improved SparseInst model greatly exceeds that of Mask R-CNN and Cascade–Mask and is 10.8 and 12.8 percentage points better than the average accuracy of SOLOv2 and Condinst, respectively, with the smallest number of model parameters. The results show that the model can be used for real-time segmentation of targets under complex farm conditions. Full article
(This article belongs to the Section Intelligent Sensors)

40 pages, 27981 KiB  
Article
Pyramid Cascaded Convolutional Neural Network with Graph Convolution for Hyperspectral Image Classification
by Haizhu Pan, Hui Yan, Haimiao Ge, Liguo Wang and Cuiping Shi
Remote Sens. 2024, 16(16), 2942; https://doi.org/10.3390/rs16162942 - 11 Aug 2024
Cited by 4 | Viewed by 1737
Abstract
Convolutional neural networks (CNNs) and graph convolutional networks (GCNs) have made considerable advances in hyperspectral image (HSI) classification. However, most CNN-based methods learn features at a single scale in HSI data, which may be insufficient for multi-scale feature extraction in complex data scenes. To learn the relations among samples in non-grid data, GCNs are employed and combined with CNNs to process HSIs. Nevertheless, most methods based on CNN-GCN may overlook the integration of pixel-wise spectral signatures. In this paper, we propose a pyramid cascaded convolutional neural network with graph convolution (PCCGC) for hyperspectral image classification. It mainly comprises CNN-based and GCN-based subnetworks. Specifically, in the CNN-based subnetwork, a pyramid residual cascaded module and a pyramid convolution cascaded module are employed to extract multiscale spectral and spatial features separately, which can enhance the robustness of the proposed model. Furthermore, an adaptive feature-weighted fusion strategy is utilized to adaptively fuse multiscale spectral and spatial features. In the GCN-based subnetwork, a band selection network (BSNet) is used to learn the spectral signatures in the HSI using nonlinear inter-band dependencies. Then, the spectral-enhanced GCN module is utilized to extract and enhance the important features in the spectral matrix. Subsequently, a mutual-cooperative attention mechanism is constructed to align the spectral signatures of the BSNet-based matrix with those of the spectral-enhanced GCN-based matrix for spectral signature integration. Abundant experiments performed on four widely used real HSI datasets show that our model achieves higher classification accuracy than fourteen comparative methods, demonstrating the superior classification performance of PCCGC over state-of-the-art methods. Full article

17 pages, 3557 KiB  
Article
EDUNet++: An Enhanced Denoising Unet++ for Ice-Covered Transmission Line Images
by Yu Zhang, Yinke Dou, Liangliang Zhao, Yangyang Jiao and Dongliang Guo
Electronics 2024, 13(11), 2085; https://doi.org/10.3390/electronics13112085 - 27 May 2024
Cited by 1 | Viewed by 4630
Abstract
New technology has made it possible to monitor and analyze the condition of ice-covered transmission lines based on images. However, the collected images are frequently accompanied by noise, which results in inaccurate monitoring. Therefore, this paper proposes an enhanced denoising Unet++ for ice-covered transmission line images (EDUNet++). This algorithm mainly comprises three modules: a feature encoding and decoding module (FEADM), a shared source feature fusion module (SSFFM), and an error correction module (ECM). In the FEADM, a residual attention module (RAM) and a multilevel feature attention module (MFAM) are proposed. The RAM incorporates a cascaded residual structure and a hybrid attention mechanism that effectively preserve the mapping of feature information. The MFAM uses dilated convolution to obtain features at different levels and then uses feature attention for weighting. This module effectively combines local and global features, which can better capture the details and texture information in the image. In the SSFFM, the source features are fused to preserve low-frequency information like texture and edges in the image, hence enhancing the realism and clarity of the image. The ECM utilizes the discrepancy between the generated image and the original image to effectively capture all the potential information in the image, hence enhancing the realism of the generated image. We employ a novel piecewise joint loss. On the dataset of ice-covered transmission lines, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) reached 29.765 dB and 0.968, respectively, and the visual results exhibited more distinct detailed features. The proposed method exhibits superior noise suppression capabilities and robustness compared to alternative approaches. Full article
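The 29.765 dB figure quoted above is a standard PSNR computation from the mean squared error between the denoised image and its reference; a minimal sketch:

```python
import math

def psnr(original, denoised, max_val=255.0):
    """Peak signal-to-noise ratio between two same-sized images.

    original, denoised: 2D lists of pixel intensities;
    max_val: maximum possible pixel value (255 for 8-bit images).
    """
    flat_o = [p for row in original for p in row]
    flat_d = [p for row in denoised for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_d)) / len(flat_o)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```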
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

16 pages, 5241 KiB  
Article
YOLOv8-CB: Dense Pedestrian Detection Algorithm Based on In-Vehicle Camera
by Qiuli Liu, Haixiong Ye, Shiming Wang and Zhe Xu
Electronics 2024, 13(1), 236; https://doi.org/10.3390/electronics13010236 - 4 Jan 2024
Cited by 35 | Viewed by 7828
Abstract
Recently, the field of vehicle-mounted visual intelligence technology has witnessed a surge of interest in pedestrian detection. Existing algorithms for dense pedestrian detection at intersections face challenges such as high computational cost, complex models that are difficult to deploy, and suboptimal detection accuracy for small targets and highly occluded pedestrians. To address these issues, this paper proposes an improved lightweight multi-scale pedestrian detection algorithm, YOLOv8-CB. The algorithm introduces a lightweight cascade fusion network (CFNet) and a CBAM attention module to improve the characterization of multi-scale feature semantics and location information, and it superimposes a bidirectional weighted feature fusion path (BiFPN) structure to fuse more effective features and improve pedestrian detection performance. It is experimentally verified that compared with the YOLOv8n algorithm, the accuracy of the improved model is increased by 2.4%, the number of model parameters is reduced by 6.45%, and the computational load is reduced by 6.74%. The inference time for a single image is 10.8 ms. The cascade fusion algorithm YOLOv8-CB has higher detection accuracy and is a lighter model for multi-scale pedestrian detection in complex scenes such as streets or intersections. This proposed algorithm presents a valuable approach for device-side pedestrian detection with limited computational resources. Full article
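The BiFPN structure this abstract builds on uses "fast normalized fusion": learnable non-negative weights normalized by their sum rather than a softmax. A minimal scalar sketch of that weighting rule (from the original BiFPN formulation, not this paper's code):

```python
def bifpn_fast_fusion(inputs, raw_weights, eps=1e-4):
    """BiFPN-style fast normalized fusion of feature values.

    Clamps each learnable weight to be non-negative (ReLU), then
    normalizes by their sum plus eps instead of a softmax, which is
    cheaper while keeping the output a convex-like combination.
    """
    w = [max(0.0, rw) for rw in raw_weights]  # ReLU keeps weights >= 0
    total = sum(w) + eps
    return sum(wi * xi for wi, xi in zip(w, inputs)) / total
```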

16 pages, 6334 KiB  
Article
UAV Image Small Object Detection Based on RSAD Algorithm
by Jian Song, Zhihong Yu, Guimei Qi, Qiang Su, Jingjing Xie and Wenhang Liu
Appl. Sci. 2023, 13(20), 11524; https://doi.org/10.3390/app132011524 - 20 Oct 2023
Cited by 3 | Viewed by 1655
Abstract
There are many small objects in UAV images, and object scale varies greatly. When the SSD algorithm detects them, the backbone network’s feature extraction capability is poor; the algorithm does not fully utilize the semantic information in the deeper feature layers, and it does not give enough consideration to small objects in the loss function, which results in serious missed detections and low detection accuracy. To tackle these issues, a new algorithm called RSAD (ResNet Self-Attention Detector) that takes advantage of the self-attention mechanism is proposed. The proposed RSAD algorithm utilises the residual structure of the ResNet-50 backbone network, which is more capable of feature extraction, in order to extract deeper features from UAV image information. It then utilises the SAFM (Self-Attention Fusion Module) to reshape and concatenate the shallow and deep features of the backbone network, selectively weighted by attention units, ensuring the efficient fusion of features to provide rich semantic features for small object detection. Lastly, it introduces the Focal Loss function, adjusting the corresponding parameters to enhance the contribution of small objects to the detection model. The ablation experiments show that the mAP of RSAD is 10.6% higher than that of the SSD model, with SAFM providing the highest mAP enhancement of 7.4% and ResNet-50 and Focal Loss providing 1.3% and 1.9% enhancements, respectively. The detection speed is reduced by only 3 FPS but still meets the real-time requirement. Comparison experiments show that in terms of mAP it is far ahead of the mainstream object detection models Faster R-CNN, Cascade R-CNN, RetinaNet, CenterNet, YOLOv5s, and YOLOv8n; in terms of FPS, it is slightly inferior to YOLOv5s and YOLOv8n. Thus, RSAD strikes a good balance between detection speed and accuracy and can facilitate UAV object detection tasks in different scenarios. Full article
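The Focal Loss introduced above down-weights easy examples so hard (often small) objects dominate the gradient. Its standard binary form, with the usual alpha/gamma parameters (the specific values RSAD tunes are not given here):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class; y: label (0 or 1).
    The (1 - pt)**gamma factor shrinks the loss of well-classified
    examples, letting hard examples contribute relatively more.
    """
    pt = p if y == 1 else 1.0 - p
    at = alpha if y == 1 else 1.0 - alpha
    return -at * (1.0 - pt) ** gamma * math.log(pt)
```

A confident correct prediction (p = 0.9, y = 1) yields a far smaller loss than a poor one (p = 0.1, y = 1), which is exactly the re-weighting effect the abstract relies on.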

19 pages, 6038 KiB  
Article
Remote Sensing Small Object Detection Network Based on Attention Mechanism and Multi-Scale Feature Fusion
by Junsuo Qu, Zongbing Tang, Le Zhang, Yanghai Zhang and Zhenguo Zhang
Remote Sens. 2023, 15(11), 2728; https://doi.org/10.3390/rs15112728 - 24 May 2023
Cited by 27 | Viewed by 4627
Abstract
In remote sensing images, small objects have too few discriminative features, are easily confused with background information, and are difficult to locate, leading to a degradation in detection accuracy when using general object detection networks for aerial images. To solve the above problems, we propose a remote sensing small object detection network based on the attention mechanism and multi-scale feature fusion, and name it AMMFN. Firstly, a detection head enhancement module (DHEM) was designed to strengthen the characterization of small object features through a combination of multi-scale feature fusion and attention mechanisms. Secondly, an attention mechanism based channel cascade (AMCC) module was designed to reduce the redundant information in the feature layer and protect small objects from information loss during feature fusion. Then, the Normalized Wasserstein Distance (NWD) was introduced and combined with Generalized Intersection over Union (GIoU) as the location regression loss function to improve the optimization weight of the model for small objects and the accuracy of the regression boxes. Finally, an object detection layer was added to improve the object feature extraction ability at different scales. Experimental results from the Unmanned Aerial Vehicles (UAV) dataset VisDrone2021 and the homemade dataset show that the AMMFN improves the APs values by 2.4% and 3.2%, respectively, compared with YOLOv5s, which represents an effective improvement in the detection accuracy of small objects. Full article
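The Normalized Wasserstein Distance (NWD) used in the loss above models each box as a 2D Gaussian and maps the Wasserstein distance between the two Gaussians into (0, 1]. A minimal sketch of the standard formulation (the normalizing constant `c` is dataset-dependent; the value below is illustrative, not this paper's):

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes (cx, cy, w, h).

    The squared 2-Wasserstein distance between the boxes' Gaussian
    models reduces to a squared Euclidean distance between the
    vectors (cx, cy, w/2, h/2); NWD = exp(-sqrt(W2) / c).
    """
    w2_sq = ((box_a[0] - box_b[0]) ** 2 + (box_a[1] - box_b[1]) ** 2
             + ((box_a[2] - box_b[2]) / 2) ** 2
             + ((box_a[3] - box_b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

Unlike IoU, NWD stays smooth for tiny or non-overlapping boxes, which is why it helps small-object regression.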
(This article belongs to the Special Issue Advances in Radar Systems for Target Detection and Tracking)

14 pages, 8575 KiB  
Article
Mask Detection Method Based on YOLO-GBC Network
by Changqing Wang, Bei Zhang, Yuan Cao, Maoxuan Sun, Kunyu He, Zhonghao Cao and Meng Wang
Electronics 2023, 12(2), 408; https://doi.org/10.3390/electronics12020408 - 13 Jan 2023
Cited by 15 | Viewed by 2647
Abstract
To address the inaccurate recognition and high missed detection rate of existing mask detection algorithms in real scenes, a novel mask detection algorithm based on the YOLO-GBC network is proposed. Specifically, in the backbone network, the global attention mechanism (GAM) is integrated to improve the ability to extract key information through cross-dimension information interaction. A cross-layer cascade method is adopted to improve the feature pyramid structure, achieving effective bidirectional cross-scale connection and weighted feature fusion. The content-aware reassembly of features (CARAFE) sampling method is integrated into the feature pyramid network to fully retain the semantic information and global features of the feature map. NMS is replaced with Soft-NMS to improve model prediction frame accuracy through a confidence decay method. The experimental results show that the mean average precision (mAP) of YOLO-GBC reached 91.2% on the mask detection dataset, 2.3% higher than the baseline YOLOv5, with a detection speed of 64 FPS. Accuracy and recall have also improved to varying degrees, benefiting the task of detecting correctly worn masks. Full article
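Soft-NMS, as used above, decays the scores of overlapping boxes rather than deleting them outright. A minimal sketch with a constant-factor decay (the exact decay schedule, Gaussian or linear, is not specified in this abstract):

```python
def soft_nms(boxes, scores, iou_thresh=0.3, decay=0.5, score_min=0.001):
    """Soft-NMS: decay overlapping candidates' scores instead of
    discarding them. boxes: (x1, y1, x2, y2); returns kept indices
    in selection order.
    """
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    remaining = list(range(len(boxes)))
    scores = list(scores)  # copy: scores are mutated by the decay
    kept = []
    while remaining:
        i = max(remaining, key=lambda k: scores[k])  # current best box
        remaining.remove(i)
        if scores[i] < score_min:
            break  # everything left has decayed below the floor
        kept.append(i)
        for j in remaining:
            if iou(boxes[i], boxes[j]) > iou_thresh:
                scores[j] *= decay  # decay instead of hard suppression
    return kept
```

Because a heavily overlapped box survives with a reduced score, a genuinely distinct but occluded object is less likely to be suppressed than under hard NMS.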
(This article belongs to the Special Issue Deep Learning Based Object Detection II)

23 pages, 7051 KiB  
Article
LRFFNet: Large Receptive Field Feature Fusion Network for Semantic Segmentation of SAR Images in Building Areas
by Bo Peng, Wenyi Zhang, Yuxin Hu, Qingwei Chu and Qianqian Li
Remote Sens. 2022, 14(24), 6291; https://doi.org/10.3390/rs14246291 - 12 Dec 2022
Cited by 4 | Viewed by 2198
Abstract
There are limited studies on the semantic segmentation of high-resolution synthetic aperture radar (SAR) images in building areas due to speckle noise and geometric distortion. To address this challenge, we propose the large receptive field feature fusion network (LRFFNet), which contains a feature extractor, a cascade feature pyramid module (CFP), a large receptive field channel attention module (LFCA), and an auxiliary branch. SAR images contain only single-channel information and have a low signal-to-noise ratio, so using only one level of features extracted by the feature extractor results in poor segmentation. We therefore design the CFP module, which integrates different levels of features through multi-path connections. Because of geometric distortion in SAR images, structural and semantic information is not obvious. To pick out feature channels that are useful for segmentation, we design the LFCA module, which reassigns channel weights through a channel attention mechanism with a large receptive field, helping the network focus on more effective channels. SAR images do not include color information, and the identification of ground object categories is prone to errors, so we design the auxiliary branch. The branch uses a fully convolutional structure to optimize training results and reduces the phenomenon of recognizing objects outside the building area as buildings. Compared with state-of-the-art (SOTA) methods, our proposed network achieves higher scores on evaluation indicators and shows excellent competitiveness. Full article
(This article belongs to the Special Issue SAR Images Processing and Analysis)
21 pages, 5333 KiB  
Article
An Approach to Accurate Ship Image Recognition in a Complex Maritime Transportation Environment
by Meng Yu, Shaojie Han, Tengfei Wang and Haiyan Wang
J. Mar. Sci. Eng. 2022, 10(12), 1903; https://doi.org/10.3390/jmse10121903 - 5 Dec 2022
Cited by 11 | Viewed by 2704
Abstract
To monitor traffic in congested waters, permanent video stations are now commonly mounted on inland riverbanks. Ships are often difficult to identify accurately and efficiently in such images because of the intricate background scenery and the overlap between ships caused by the fixed camera position. This work proposes Ship R-CNN (SR-CNN), a Faster R-CNN-based ship detection algorithm with improved feature fusion and non-maximum suppression (NMS). In NMS, SR-CNN replaces the intersection over union (IOU) filtering criterion, which considers only the overlap between prediction boxes, with distance intersection over union (DIOU), which also accounts for the distance between box centroids. SR-CNN further improves the greedy screening procedure in NMS by introducing a confidence decay function. To generate more precise prediction boxes and enhance detection in ship-dense scenes, prediction boxes of the same class whose DIOU exceeds a predetermined threshold are weighted by their confidence scores. Additionally, SR-CNN applies two feature weighting methods, based on a channel-domain attention mechanism and regularized weights, to achieve a feature fusion better suited to separating ships from the background in busy waters. A ship dataset built from collected monitoring images is used for comparative testing.
The experimental results demonstrate that, compared with three classical two-stage detection algorithms (Faster R-CNN, Cascade R-CNN, and Libra R-CNN), the proposed Ship R-CNN more effectively identifies ship targets in the complex backgrounds of far-shore scenes, where the contrast between background and ship targets is low. The suggested approach improves detection and reduces misses for small ship targets that are hard to distinguish from complex background objects in a far-shore setting. Full article
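The DIOU-based NMS with a confidence decay function described above can be sketched as follows. This is a minimal NumPy illustration under assumptions, not the paper's exact procedure: the Gaussian decay `exp(-d**2 / sigma)`, the threshold values, and the function names are placeholders, and the confidence-weighted averaging of overlapping boxes is omitted for brevity.

```python
import numpy as np

def diou(box, boxes):
    """DIOU between one box and an array of boxes, format [x1, y1, x2, y2]:
    IOU minus the squared centre distance over the squared diagonal of the
    smallest enclosing box."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter)
    cx_a, cy_a = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cx_b = (boxes[:, 0] + boxes[:, 2]) / 2
    cy_b = (boxes[:, 1] + boxes[:, 3]) / 2
    rho2 = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2          # centre distance^2
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2                # enclosing diagonal^2
    return iou - rho2 / np.maximum(c2, 1e-9)

def diou_soft_nms(boxes, scores, thresh=0.5, sigma=0.5, score_min=0.05):
    """Greedy NMS where boxes overlapping the current best (DIOU > thresh)
    have their confidence decayed rather than being removed outright; boxes
    whose score falls below score_min are dropped."""
    scores = scores.astype(float).copy()
    keep = []
    idx = np.arange(len(scores))
    while idx.size:
        best = idx[np.argmax(scores[idx])]   # highest remaining confidence
        keep.append(best)
        idx = idx[idx != best]
        if idx.size == 0:
            break
        d = diou(boxes[best].astype(float), boxes[idx].astype(float))
        decay = np.where(d > thresh, np.exp(-(d ** 2) / sigma), 1.0)
        scores[idx] *= decay                 # soften instead of suppressing
        idx = idx[scores[idx] > score_min]
    return keep
```

Compared with hard IOU-based NMS, the decay keeps genuinely distinct but nearby ships alive with reduced confidence, which is the behaviour the abstract targets in ship-dense scenes.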
(This article belongs to the Special Issue Advances in Maritime Economics and Logistics)
