Search Results (18)

Search Parameters:
Keywords = Pyramid Split Attention

27 pages, 4122 KB  
Article
Development of a Tool to Detect Open-Mouthed Respiration in Caged Broilers
by Yali Ma, Yongmin Guo, Bin Gao, Pengshen Zheng and Changxi Chen
Animals 2025, 15(18), 2732; https://doi.org/10.3390/ani15182732 - 18 Sep 2025
Viewed by 413
Abstract
Open-mouth panting in broiler chickens is a visible and critical indicator of heat stress and compromised welfare. However, detecting this behavior in densely populated cages is challenging due to the small size of the target, frequent occlusions, and cluttered backgrounds. To overcome these issues, we propose an enhanced object detection method based on the lightweight YOLOv8n framework, incorporating four key improvements. First, we add a dedicated P2 detection head to improve the recognition of small targets. Second, a space-to-depth grouped convolution module (SGConv) is introduced to capture the fine-grained texture and edge features crucial for panting identification. Third, a bidirectional feature pyramid network (BiFPN) merges multi-scale feature maps for richer representations. Finally, a squeeze-and-excitation (SE) channel attention mechanism emphasizes mouth-related cues while suppressing irrelevant background noise. We trained and evaluated the method on a comprehensive, full-cycle broiler panting dataset covering all growth stages. Experimental results show that our method significantly outperforms baseline YOLO models, achieving 0.92 mAP@50 on an independent test set and 0.927 mAP@50 after leakage-free retraining, while maintaining real-time performance. Because the initial evaluation had data-partitioning limitations, generalizability was validated twice: through independent testing and through rigorous split-then-augment retraining. This approach provides a practical tool for intelligent broiler welfare monitoring and heat stress management, contributing to improved environmental control and animal well-being.
(This article belongs to the Section Poultry)
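The squeeze-and-excitation (SE) channel attention named in this abstract can be sketched in plain NumPy. This is a generic illustration of the SE mechanism, not the authors' code; the channel count, reduction ratio r, and random excitation weights are illustrative assumptions.

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-excitation channel attention on a (C, H, W) feature map.

    Squeeze: global average pooling per channel.
    Excitation: two fully connected layers (ReLU, then sigmoid).
    Scale: reweight each channel of x by its excitation score.
    """
    z = x.mean(axis=(1, 2))                    # squeeze -> (C,)
    h = np.maximum(w1 @ z, 0.0)                # FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))        # FC + sigmoid -> (C,)
    return x * s[:, None, None]                # channel-wise rescaling

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4                       # toy sizes, reduction ratio r=4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1    # toy excitation weights
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_attention(x, w1, w2)
```

Because the sigmoid keeps every score in (0, 1), channels the excitation deems irrelevant are attenuated rather than zeroed, which is what lets the mechanism suppress background noise without discarding features outright.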

26 pages, 23082 KB  
Article
SPyramidLightNet: A Lightweight Shared Pyramid Network for Efficient Underwater Debris Detection
by Yi Luo and Osama Eljamal
Appl. Sci. 2025, 15(17), 9404; https://doi.org/10.3390/app15179404 - 27 Aug 2025
Viewed by 569
Abstract
Underwater debris detection plays a crucial role in marine environmental protection. However, existing object detection algorithms generally suffer from excessive model complexity and insufficient detection accuracy, making it difficult to meet the real-time detection requirements in resource-constrained underwater environments. To address this challenge, this paper proposes a novel lightweight object detection network named the Shared Pyramid Lightweight Network (SPyramidLightNet). The network adopts an improved architecture based on YOLOv11 and achieves an optimal balance between detection performance and computational efficiency by integrating three core innovative modules. First, the Split–Merge Attention Block (SMAB) employs a dynamic kernel selection mechanism and split–merge strategy, significantly enhancing feature representation capability through adaptive multi-scale feature fusion. Second, the C3 GroupNorm Detection Head (C3GNHead) introduces a shared convolution mechanism and GroupNorm normalization strategy, substantially reducing the computational complexity of the detection head while maintaining detection accuracy. Finally, the Shared Pyramid Convolution (SPyramidConv) replaces traditional pooling operations with a parameter-sharing multi-dilation-rate convolution architecture, achieving more refined and efficient multi-scale feature aggregation. Extensive experiments on underwater debris datasets demonstrate that SPyramidLightNet achieves 0.416 on the mAP@0.5:0.95 metric, significantly outperforming mainstream algorithms including Faster-RCNN, SSD, RT-DETR, and the YOLO series. Meanwhile, compared to the baseline YOLOv11, the proposed algorithm achieves an 11.8% parameter compression and a 17.5% computational complexity reduction, with an inference speed reaching 384 FPS, meeting the stringent requirements for real-time detection. Ablation experiments and visualization analyses further validate the effectiveness and synergistic effects of each core module. This research provides important theoretical guidance for the design of lightweight object detection algorithms and lays a solid foundation for the development of automated underwater debris recognition and removal technologies.
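The idea behind SPyramidConv, reusing one kernel at several dilation rates and aggregating the responses, can be illustrated in one dimension. The kernel, the dilation rates (1, 2, 3), and the mean aggregation below are assumptions for illustration, not the paper's exact operator.

```python
import numpy as np

def dilated_conv1d(x, k, d):
    """'Same'-padded 1D convolution of x with kernel k at dilation rate d."""
    span = d * (len(k) - 1)                  # receptive-field span of the kernel
    xp = np.pad(x, (span // 2, span - span // 2))
    return np.array([sum(k[j] * xp[i + j * d] for j in range(len(k)))
                     for i in range(len(x))])

def shared_pyramid_conv(x, k, rates=(1, 2, 3)):
    """Apply the SAME kernel k at several dilation rates and average the
    responses: multi-scale aggregation with no extra parameters."""
    return np.mean([dilated_conv1d(x, k, d) for d in rates], axis=0)

x = np.arange(10, dtype=float)
k = np.array([1.0, 0.0, -1.0])               # a simple edge-like kernel
y = shared_pyramid_conv(x, k)
```

Because every dilation rate reuses the same kernel, the receptive field grows without any growth in parameter count, which is the lightweight property the abstract emphasizes.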

34 pages, 18851 KB  
Article
Dual-Branch Multi-Dimensional Attention Mechanism for Joint Facial Expression Detection and Classification
by Cheng Peng, Bohao Li, Kun Zou, Bowen Zhang, Genan Dai and Ah Chung Tsoi
Sensors 2025, 25(12), 3815; https://doi.org/10.3390/s25123815 - 18 Jun 2025
Viewed by 695
Abstract
This paper addresses the central issue arising in the simultaneous detection and classification (SDAC) of facial expressions: balancing the competing demands of good global features for detection and fine features for accurate expression classification. It does so by replacing the feature extraction part of the "neck" network in the feature pyramid network of the You Only Look Once X (YOLOX) framework with a novel architecture involving three attention mechanisms (batch, channel, and neighborhood) that respectively explore the three input dimensions (batch, channel, and spatial). Correlations across a batch of images in each of the dual incoming paths are first extracted by a self-attention mechanism in the batch dimension; the two paths are then fused to consolidate their information and split again into two separate paths. Information along the channel dimension is extracted using a generalized form of channel attention, an adaptive graph channel attention, which assigns each element of the incoming signal a weight adapted to that signal. The combination of these two paths, together with two skip connections from the input of the batch attention to the output of the adaptive channel attention, then passes into a residual network with neighborhood attention to extract fine features in the spatial dimension. Experiments show that this novel dual-path architecture achieves a better balance between the competing demands of the SDAC problem than competing approaches, and ablation studies determine the relative importance of the three attention mechanisms. Competitive results are obtained on two non-aligned facial expression recognition datasets, RAF-DB and SFEW, when compared with other state-of-the-art methods.

23 pages, 2354 KB  
Article
A Generic Image Steganography Recognition Scheme with Big Data Matching and an Improved ResNet50 Deep Learning Network
by Xuefeng Gao, Junkai Yi, Lin Liu and Lingling Tan
Electronics 2025, 14(8), 1610; https://doi.org/10.3390/electronics14081610 - 16 Apr 2025
Cited by 1 | Viewed by 1148
Abstract
Image steganalysis has been a key technology in information security in recent years. However, existing methods are mostly limited to binary classification: deciding whether an image carries hidden data (as in digital watermarking, privacy protection, or illicit data concealment) or is clean, such as an unaltered cover or surveillance image. They cannot identify which steganography algorithm a stego image uses, which restricts their practicality. To solve this problem, this paper proposes a general steganography algorithm recognition scheme based on image big data matching and an improved ResNet50. The scheme first crops the image region with the highest complexity, focusing on key features to improve analysis efficiency; it then locates the original cover image corresponding to the image under test via big data image matching and generates a steganographic difference feature image; finally, ResNet50 is improved with a pyramid attention mechanism and a joint loss function, enabling efficient recognition of the steganography algorithm. To verify the feasibility and effectiveness of the scheme, three experiments are designed: selection of the core analysis region, image similarity evaluation based on Peak Signal-to-Noise Ratio (PSNR), and performance of the improved ResNet50 model. The experimental results show that the proposed scheme outperforms existing mainstream steganalysis models such as ZhuNet and YeNet, with a detection accuracy of 96.11%; it supports the recognition of six adaptive steganography algorithms and adapts to multiple image sizes and formats, demonstrating excellent versatility and application value.
(This article belongs to the Special Issue AI-Based Solutions for Cybersecurity)
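The PSNR criterion used in the second experiment follows the standard definition; a minimal sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.full((32, 32), 100, dtype=np.uint8)
noisy = clean + np.uint8(16)                 # uniform error of 16 -> MSE = 256
score = psnr(clean, noisy)                   # about 24.05 dB
```

Higher PSNR means the candidate is closer to the query, so a big-data match can rank candidate cover images by this score; a stego image and its true cover typically score far above unrelated pairs.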

31 pages, 25698 KB  
Article
Detection of Cotter Pin Defects in Transmission Lines Based on Improved YOLOv8
by Peng Wang, Guowu Yuan, Zhiqin Zhang, Junlin Rao, Yi Ma and Hao Zhou
Electronics 2025, 14(7), 1360; https://doi.org/10.3390/electronics14071360 - 28 Mar 2025
Viewed by 729
Abstract
The cotter pin is a critical component in power transmission lines, as it prevents the loosening or detachment of nuts at essential locations. Detecting defects in cotter pins is therefore vital for monitoring and diagnosing faults in power transmission systems. Due to environmental factors and human error, cotter pins are susceptible to loosening or going missing. The primary challenges in cotter pin detection are the small size of the target features and the fine-grained issue of "small inter-class differences and large intra-class variations". This paper aims to enhance detection of fine-grained small targets by adding a detection head specifically designed for small objects and embedding attention mechanisms. It addresses the detection of loosened and missing cotter pins with a target detection model called PMW-YOLOv8 (P-C2f + MCA + WIOU), based on the YOLOv8 framework. The model introduces a specialized small-target detection head (160 × 160), which forms a four-scale pyramid (P2–P5) through cross-layer aggregation, effectively exploiting shallow features. It also incorporates a multidimensional collaborative attention (MCA) module to enhance the features fed to the detection heads. To further address fine-grained feature extraction, a polarized self-attention mechanism is integrated into C2f, yielding the proposed P-C2f module. Finally, the WIOU loss function mitigates the impact of sample-quality fluctuations on training. Experiments on a cotter pin defect dataset validate the model's effectiveness, achieving a detection accuracy of 66.3%, an improvement of 3% over YOLOv8. The experimental results demonstrate that the model is robust, generalizes well, and extracts deeper and more comprehensive features.

23 pages, 3884 KB  
Article
Cascaded Feature Fusion Grasping Network for Real-Time Robotic Systems
by Hao Li and Lixin Zheng
Sensors 2024, 24(24), 7958; https://doi.org/10.3390/s24247958 - 13 Dec 2024
Cited by 2 | Viewed by 1311
Abstract
Grasping objects of irregular shapes and various sizes remains a key challenge in the field of robotic grasping. This paper proposes a novel RGB-D data-based grasping pose prediction network, termed Cascaded Feature Fusion Grasping Network (CFFGN), designed for high-efficiency, lightweight, and rapid grasping pose estimation. The network employs innovative structural designs, including depth-wise separable convolutions to reduce parameters and enhance computational efficiency; convolutional block attention modules to augment the model’s ability to focus on key features; multi-scale dilated convolution to expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to achieve effective fusion and information flow of features at different levels. In tests on the Cornell dataset, our network achieved grasping pose prediction at a speed of 66.7 frames per second, with accuracy rates of 98.6% and 96.9% for image-wise and object-wise splits, respectively. The experimental results show that our method achieves high-speed processing while maintaining high accuracy. In real-world robotic grasping experiments, our method also proved to be effective, achieving an average grasping success rate of 95.6% on a robot equipped with parallel grippers.
(This article belongs to the Section Sensors and Robotics)
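The weighted fusion step inside a bidirectional feature pyramid module is commonly implemented as "fast normalized fusion"; the sketch below shows that general technique with illustrative weights and feature maps, and is not claimed to be the CFFGN implementation.

```python
import numpy as np

def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-shaped feature maps with learnable non-negative weights.

    Each weight passes through ReLU, then is normalized so the fused map is
    approximately a convex combination: out = sum_i w_i * f_i / (sum_i w_i + eps).
    """
    w = np.maximum(np.asarray(raw_weights, dtype=float), 0.0)  # ReLU
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))

f_top_down = np.ones((8, 8))           # e.g. an upsampled deeper feature map
f_lateral = 3.0 * np.ones((8, 8))      # a same-resolution lateral feature map
fused = fast_normalized_fusion([f_top_down, f_lateral], [1.0, 1.0])
```

The normalization keeps the fused activations in the same range as the inputs regardless of how the learned weights drift, which is what makes this cheaper and more stable than a softmax-weighted sum.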

13 pages, 3626 KB  
Article
A Lightweight Deep Learning Network with an Optimized Attention Module for Aluminum Surface Defect Detection
by Yizhe Li, Yidong Xie and Hu He
Sensors 2024, 24(23), 7691; https://doi.org/10.3390/s24237691 - 30 Nov 2024
Cited by 2 | Viewed by 1569
Abstract
Aluminum is extensively utilized in the aerospace, aviation, automotive, and other industries. Surface defects on aluminum have a significant impact on product quality, yet traditional detection methods fail to meet the efficiency and accuracy requirements of industrial practice. In this study, we propose an innovative aluminum surface defect detection method based on an optimized two-stage Faster R-CNN network. A 2D camera serves as the image sensor, capturing high-resolution images in real time; optimized lighting and focus ensure that defect features are clearly visible. After preprocessing, the images are fed into a deep learning network incorporating a multi-scale feature pyramid structure, which enhances defect recognition accuracy by integrating high-level semantic information with location details. We also introduce an optimized Convolutional Block Attention Module (CBAM) to further improve network efficiency, employ the genetic K-means algorithm to optimize prior region selection, and use a lightweight Ghost model to reduce network complexity by 14.3%; the Ghost model performs well in loss optimization during training and validation as well as in detection accuracy, speed, and stability. The network was trained on a dataset of 3200 images captured by the image sensor, split 8:1:1 into training, validation, and test sets. The experimental results show a mean Average Precision (mAP) of 94.25%, with individual Average Precision (AP) values exceeding 80%, meeting industrial standards for defect detection.
(This article belongs to the Special Issue Sensing and Imaging for Defect Detection: 2nd Edition)

16 pages, 22655 KB  
Article
LightCF-Net: A Lightweight Long-Range Context Fusion Network for Real-Time Polyp Segmentation
by Zhanlin Ji, Xiaoyu Li, Jianuo Liu, Rui Chen, Qinping Liao, Tao Lyu and Li Zhao
Bioengineering 2024, 11(6), 545; https://doi.org/10.3390/bioengineering11060545 - 27 May 2024
Cited by 13 | Viewed by 2498
Abstract
Automatically segmenting polyps from colonoscopy videos is crucial for developing computer-assisted diagnostic systems for colorectal cancer. Existing automatic polyp segmentation methods often struggle to fulfill the real-time demands of clinical applications due to their substantial parameter count and computational load, especially those based on Transformer architectures. To tackle these challenges, a novel lightweight long-range context fusion network, named LightCF-Net, is proposed in this paper. This network attempts to model long-range spatial dependencies while maintaining real-time performance, to better distinguish polyps from background noise and thus improve segmentation accuracy. A novel Fusion Attention Encoder (FAEncoder) is designed in the proposed network, which integrates Large Kernel Attention (LKA) and channel attention mechanisms to extract deep representational features of polyps and unearth long-range dependencies. Furthermore, a newly designed Visual Attention Mamba module (VAM) is added to the skip connections, modeling long-range context dependencies in the encoder-extracted features and reducing background noise interference through the attention mechanism. Finally, a Pyramid Split Attention module (PSA) is used in the bottleneck layer to extract richer multi-scale contextual features. The proposed method was thoroughly evaluated on four renowned polyp segmentation datasets: Kvasir-SEG, CVC-ClinicDB, BKAI-IGH, and ETIS. Experimental findings demonstrate that the proposed method delivers higher segmentation accuracy in less time, consistently outperforming the most advanced lightweight polyp segmentation networks.
(This article belongs to the Section Biosignal Processing)
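A Pyramid Split Attention module, as used in the bottleneck layer above, splits the channels into branches, computes SE-style weights per branch, and softmax-normalizes those weights across branches before rescaling. The simplified sketch below omits PSA's multi-scale group convolutions (plain channel splits stand in for them) and uses random toy excitation weights; it illustrates the mechanism, not the LightCF-Net code.

```python
import numpy as np

def se_weights(x, w1, w2):
    """SE-style channel weights for a (c, H, W) branch: squeeze + excitation."""
    z = x.mean(axis=(1, 2))                      # global average pooling
    h = np.maximum(w1 @ z, 0.0)                  # FC + ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ h)))       # FC + sigmoid

def pyramid_split_attention(x, groups=4, seed=0):
    """Simplified PSA on a (C, H, W) map: split channels into branches,
    get SE weights per branch, softmax them across branches, rescale."""
    rng = np.random.default_rng(seed)
    C = x.shape[0]
    g = C // groups
    branches = [x[i * g:(i + 1) * g] for i in range(groups)]
    w = []
    for b in branches:
        w1 = rng.standard_normal((g // 2, g)) * 0.1   # toy excitation weights
        w2 = rng.standard_normal((g, g // 2)) * 0.1
        w.append(se_weights(b, w1, w2))
    w = np.stack(w)                              # (groups, g)
    w = np.exp(w) / np.exp(w).sum(axis=0)        # softmax across branches
    return np.concatenate([b * w[i][:, None, None]
                           for i, b in enumerate(branches)])

x = np.random.default_rng(1).standard_normal((16, 6, 6))
y = pyramid_split_attention(x)
```

The cross-branch softmax is what distinguishes PSA from applying SE independently per branch: the branches compete, so scales that respond strongly at a given channel index suppress the others.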

15 pages, 6287 KB  
Article
Research on Improved Road Visual Navigation Recognition Method Based on DeepLabV3+ in Pitaya Orchard
by Lixue Zhu, Wenqian Deng, Yingjie Lai, Xiaogeng Guo and Shiang Zhang
Agronomy 2024, 14(6), 1119; https://doi.org/10.3390/agronomy14061119 - 24 May 2024
Cited by 5 | Viewed by 1624
Abstract
Traditional DeepLabV3+ image semantic segmentation methods face challenges in pitaya orchard environments characterized by multiple interference factors, complex image backgrounds, high computational complexity, and extensive memory consumption. This paper introduces an improved visual navigation path recognition method for pitaya orchards. Initially, DeepLabV3+ utilizes a lightweight MobileNetV2 as its primary feature extraction backbone, which is augmented with a Pyramid Split Attention (PSA) module placed after the Atrous Spatial Pyramid Pooling (ASPP) module. This improvement enhances the spatial feature representation of feature maps, thereby sharpening the segmentation boundaries. Additionally, an Efficient Channel Attention Network (ECANet) mechanism is integrated with the lower-level features of MobileNetV2 to reduce computational complexity and refine the clarity of target boundaries. The paper also designs a navigation path extraction algorithm, which fits the road mask regions segmented by the model to achieve precise navigation path recognition. Experimental findings show that the enhanced DeepLabV3+ model achieved a Mean Intersection over Union (MIoU) and average pixel accuracy of 95.79% and 97.81%, respectively. These figures represent increases of 0.59 and 0.41 percentage points when contrasted with the original model. Furthermore, the model’s memory consumption is reduced by 85.64%, 84.70%, and 85.06% when contrasted with the Pyramid Scene Parsing Network (PSPNet), U-Net, and Fully Convolutional Network (FCN) models, respectively. This reduction makes the proposed model more efficient while maintaining high segmentation accuracy, thus supporting enhanced operational efficiency in practical applications. The test results for navigation path recognition accuracy reveal that the angle error between the navigation centerline extracted using the least squares method and the manually fitted centerline is less than 5°. Additionally, the average deviation between the road centerlines extracted under three different lighting conditions and the actual road centerline is only 2.66 pixels, with an average image recognition time of 0.10 s. This performance suggests that the study can provide an effective reference for visual navigation in smart agriculture.
(This article belongs to the Special Issue The Applications of Deep Learning in Smart Agriculture)
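The least-squares centerline extraction can be sketched as: take the horizontal centroid of the segmented road mask in each row, then fit a line through those centroids and read off its angle. The synthetic vertical-road mask below is an illustrative assumption, not the paper's data.

```python
import numpy as np

def centerline_angle(mask):
    """Fit x = a*y + b through the row-wise centroids of a binary road mask.

    Returns the slope a, intercept b, and the line's angle to the vertical
    image axis in degrees (0 degrees = perfectly straight-ahead road).
    """
    rows, centers = [], []
    for y in range(mask.shape[0]):
        xs = np.flatnonzero(mask[y])
        if xs.size:                           # skip rows with no road pixels
            rows.append(y)
            centers.append(xs.mean())
    a, b = np.polyfit(rows, centers, deg=1)   # least-squares line fit
    return a, b, np.degrees(np.arctan(a))

# Synthetic mask: a straight vertical road of width 5 centered at column 10.
mask = np.zeros((20, 21), dtype=bool)
mask[:, 8:13] = True
slope, intercept, angle = centerline_angle(mask)
```

Comparing this fitted angle against a manually fitted reference line gives the kind of angle-error measurement the abstract reports (under 5°).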

12 pages, 2615 KB  
Article
Automatic Recognition of Blood Cell Images with Dense Distributions Based on a Faster Region-Based Convolutional Neural Network
by Yun Liu, Yumeng Liu, Menglu Chen, Haoxing Xue, Xiaoqiang Wu, Linqi Shui, Junhong Xing, Xian Wang, Hequn Li and Mingxing Jiao
Appl. Sci. 2023, 13(22), 12412; https://doi.org/10.3390/app132212412 - 16 Nov 2023
Viewed by 1418
Abstract
In modern clinical medicine, important information about red blood cells, such as shape and number, is used to detect blood diseases. However, automatically recognizing single cells and adherent cells in densely distributed medical scenes remains difficult, both for traditional detection algorithms, which have low recognition rates, and for conventional networks, which have weak feature extraction capabilities. In this paper, an automatic recognition method for adherent blood cells with dense distribution is proposed. Based on Faster R-CNN, a balanced feature pyramid structure, deformable convolution network, and efficient pyramid split attention mechanism are adopted to automatically recognize blood cells under conditions of dense distribution, extrusion deformation, adhesion, and overlap. In addition, the region-of-interest Align algorithm contributes to improving the accuracy of recognition results. The experimental results show that the mean average precision of cell detection is 0.895, which is 24.5% higher than that of the original network model. Compared with one-stage mainstream networks, the presented network has a stronger feature extraction capability. The proposed method is suitable for identifying single cells and adherent cells with dense distribution in actual medical scenes.
(This article belongs to the Special Issue Application of Machine Vision and Deep Learning Technology)

35 pages, 8889 KB  
Article
JO-TADP: Learning-Based Cooperative Dynamic Resource Allocation for MEC–UAV-Enabled Wireless Network
by Shabeer Ahmad, Jinling Zhang, Adil Khan, Umar Ajaib Khan and Babar Hayat
Drones 2023, 7(5), 303; https://doi.org/10.3390/drones7050303 - 4 May 2023
Cited by 16 | Viewed by 3367
Abstract
Providing robust communication services to mobile users (MUs) is a challenging task due to the dynamicity of MUs. Unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) are used to improve connectivity by allocating resources to MUs more efficiently in a dynamic environment. However, energy consumption and lifetime issues in UAVs severely limit the resources and communication services. In this paper, we propose a dynamic cooperative resource allocation scheme for MEC–UAV-enabled wireless networks called joint optimization of trajectory, altitude, delay, and power (JO-TADP) using anarchic federated learning (AFL) and other learning algorithms to enhance data rate, use rate, and resource allocation efficiency. Initially, the MEC–UAVs are optimally positioned based on the MU density using the beluga whale optimization (BLWO) algorithm. Optimal clustering is performed in terms of splitting and merging using the triple-mode density peak clustering (TM-DPC) algorithm based on user mobility. Moreover, the trajectory, altitude, and hovering time of MEC–UAVs are predicted and optimized using the self-simulated inner attention long short-term memory (SSIA-LSTM) algorithm. Finally, the MUs and MEC–UAVs play auction games based on the classified requests, using an AFL-based cross-scale attention feature pyramid network (CSAFPN) and enhanced deep Q-learning (EDQN) algorithms for dynamic resource allocation. To validate the proposed approach, our system model has been simulated in Network Simulator 3.26 (NS-3.26). The results demonstrate that the proposed work outperforms the existing works in terms of connectivity, energy efficiency, resource allocation, and data rate.

18 pages, 4610 KB  
Article
Lightweight Tennis Ball Detection Algorithm Based on Robomaster EP
by Yuan Zhao, Ling Lu, Wu Yang, Qizheng Li and Xiujie Zhang
Appl. Sci. 2023, 13(6), 3461; https://doi.org/10.3390/app13063461 - 8 Mar 2023
Cited by 7 | Viewed by 4577
Abstract
To address the problems of poor recognition, low detection accuracy, large parameter and computation counts, complex network structure, and poor portability to embedded devices in traditional tennis ball detection algorithms, this study proposes a lightweight tennis ball detection algorithm, YOLOv5s-Z, based on the YOLOv5s algorithm and the Robomaster EP platform. The main work is as follows. First, lightweight G-Backbone and G-Neck network layers are constructed to reduce the number of parameters and the computation of the network structure. Second, convolutional coordinate attention is incorporated into G-Backbone to embed location information into channel attention, which enables the network to obtain location information over a larger area through multiple convolutions and enhances the expressive ability of the learned features. In addition, the Concat module in the original feature fusion is replaced by W-BiFPN, a weighted bidirectional feature pyramid with settable learning weights, to improve feature fusion capability and achieve efficient weighted fusion with bidirectional cross-scale connectivity. Finally, the EIOU loss function is introduced to split the aspect-ratio influence factor and compute the length and width of the target and anchor boxes separately, combined with Focal-EIOU Loss to address the imbalance between hard and easy samples; the Meta-ACON activation function is introduced to adaptively select whether to activate neurons, improving detection accuracy. The experimental results show that, compared with the YOLOv5s algorithm, YOLOv5s-Z reduces the number of parameters and the computation by 42% and 44%, respectively, reduces the model size by 39%, and improves mean accuracy by 2%, verifying the effectiveness of the improved algorithm and the lightweight design, suiting the Robomaster EP, and meeting the deployment requirements of embedded devices for tennis ball detection and identification.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Deep Learning)
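The EIOU loss referenced above augments IoU with separate penalties for center distance, width difference, and height difference, each normalized by the smallest enclosing box; the following is a sketch of that published formulation, not the YOLOv5s-Z code.

```python
def eiou_loss(box_a, box_b, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2).

    L = 1 - IoU + center_dist^2 / diag^2
          + (w_a - w_b)^2 / cw^2 + (h_a - h_b)^2 / ch^2,
    with diag, cw, ch taken from the smallest enclosing box.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union for the IoU term.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # Smallest enclosing box terms.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    diag2 = cw ** 2 + ch ** 2 + eps
    center2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    dw2 = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh2 = ((ay2 - ay1) - (by2 - by1)) ** 2
    return 1.0 - iou + center2 / diag2 + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps)

perfect = eiou_loss((0, 0, 10, 10), (0, 0, 10, 10))   # identical boxes
shifted = eiou_loss((0, 0, 10, 10), (2, 2, 12, 12))   # same size, offset
```

Penalizing width and height separately, rather than through a coupled aspect-ratio term as in CIoU, is exactly the "split the influence factor of the aspect ratio" improvement the abstract describes.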

16 pages, 3521 KB  
Article
GRPAFusion: A Gradient Residual and Pyramid Attention-Based Multiscale Network for Multimodal Image Fusion
by Jinxin Wang, Xiaoli Xi, Dongmei Li, Fang Li and Guanxin Zhang
Entropy 2023, 25(1), 169; https://doi.org/10.3390/e25010169 - 14 Jan 2023
Cited by 10 | Viewed by 2782
Abstract
Multimodal image fusion aims to retain valid information from different modalities, remove redundant information to highlight critical targets, and maintain rich texture details in the fused image. However, current image fusion networks only use simple convolutional layers to extract features, ignoring global dependencies and channel contexts. This paper proposes GRPAFusion, a multimodal image fusion framework based on gradient residuals and pyramid attention. The framework uses multiscale gradient residual blocks to extract multiscale structural features and multigranularity detail features from the source images. The depth features from different modalities are adaptively corrected for inter-channel responses using a pyramid split attention module to generate high-quality fused images. Experimental results on public datasets indicate that GRPAFusion outperforms current fusion methods in subjective and objective evaluations.
(This article belongs to the Special Issue Advances in Image Fusion)
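The pyramid split attention idea described in the abstract, splitting the channel axis into branches and adaptively correcting each branch's inter-channel responses, can be illustrated with a schematic NumPy sketch. This is not the authors' implementation: the branch count, the global-average-pool squeeze, and the softmax across branches are illustrative assumptions standing in for the learned module.

```python
import numpy as np

def pyramid_split_attention(x, groups=4):
    """Toy PSA-style reweighting: split a (C, H, W) feature map into
    `groups` channel branches, squeeze each branch to a channel descriptor
    via global average pooling, softmax the descriptors across branches,
    and rescale each branch's channels by the resulting weights."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    branches = x.reshape(groups, c // groups, h, w)
    desc = branches.mean(axis=(2, 3))                  # (groups, c//groups)
    weights = np.exp(desc) / np.exp(desc).sum(axis=0)  # softmax over branches
    out = branches * weights[:, :, None, None]
    return out.reshape(c, h, w)

feat = np.random.rand(8, 4, 4)
fused = pyramid_split_attention(feat, groups=2)
```

Because the weights for each channel slot sum to one across branches, the module redistributes emphasis between branches rather than amplifying the feature map overall.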

15 pages, 9839 KB  
Article
Low-Light Image Enhancement Using Photometric Alignment with Hierarchy Pyramid Network
by Jing Ye, Xintao Chen, Changzhen Qiu and Zhiyong Zhang
Sensors 2022, 22(18), 6799; https://doi.org/10.3390/s22186799 - 8 Sep 2022
Cited by 2 | Viewed by 3661
Abstract
Low-light image enhancement can effectively assist high-level vision tasks that often fail under poor illumination. Most previous data-driven methods, however, perform enhancement directly on severely degraded low-light images, which can yield undesirable results, including blurred detail, intense noise, and distorted color. In this paper, inspired by a coarse-to-fine strategy, we propose an end-to-end pipeline for low-light image enhancement that combines image-level alignment with pixel-wise perceptual information enhancement. A coarse adaptive global photometric alignment sub-network is constructed to reduce style differences, which helps improve illumination and reveal information in under-exposed areas. Given the aligned image, a hierarchy pyramid enhancement sub-network then optimizes image quality, removing amplified noise and enhancing the local detail of low-light images. We also propose a multi-residual cascade attention block (MRCAB) that combines a channel split-and-concatenation strategy with a polarized self-attention mechanism, leading to reconstructed images of high perceptual quality. Extensive experiments demonstrate the effectiveness of our method on various datasets, where it significantly outperforms other state-of-the-art methods in detail and color reproduction. Full article
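The coarse global photometric alignment stage can be approximated at the image level by matching channel statistics to a reference. The NumPy sketch below is a simplification under our own assumptions: the paper's sub-network is learned end-to-end, whereas this fixed mean/standard-deviation transfer only illustrates what "reducing style differences" means.

```python
import numpy as np

def photometric_align(low, ref, eps=1e-6):
    """Coarse image-level alignment: shift and scale each channel of the
    low-light image `low` so its mean and standard deviation match the
    reference `ref`. Both are float arrays in [0, 1] with shape (H, W, 3)."""
    out = np.empty_like(low)
    for ch in range(low.shape[-1]):
        l, r = low[..., ch], ref[..., ch]
        out[..., ch] = (l - l.mean()) / (l.std() + eps) * r.std() + r.mean()
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
dark = rng.uniform(0.0, 0.2, (16, 16, 3))    # under-exposed input
bright = rng.uniform(0.4, 0.6, (16, 16, 3))  # well-lit reference
aligned = photometric_align(dark, bright)
```

After alignment the dark input's brightness statistics match the reference, leaving finer detail and noise for a subsequent refinement stage.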

18 pages, 1334 KB  
Article
Residual-Shuffle Network with Spatial Pyramid Pooling Module for COVID-19 Screening
by Mohd Asyraf Zulkifley, Siti Raihanah Abdani, Nuraisyah Hani Zulkifley and Mohamad Ibrani Shahrimin
Diagnostics 2021, 11(8), 1497; https://doi.org/10.3390/diagnostics11081497 - 19 Aug 2021
Cited by 5 | Viewed by 3084
Abstract
Since the start of the COVID-19 pandemic at the end of 2019, more than 170 million patients have been infected with the virus, resulting in more than 3.8 million deaths worldwide. The disease spreads easily from one person to another even with minimal contact, and the latest mutations are deadlier than their predecessors. Hence, COVID-19 needs to be diagnosed as early as possible to minimize the risk of spreading in the community. However, laboratory results for the diagnosis method approved by the World Health Organization, the reverse transcription-polymerase chain reaction test, take around a day to process, and even longer in developing countries. Therefore, a fast screening method based on existing facilities should be developed to complement this diagnostic test, so that a suspected patient can be isolated in a quarantine center. In line with this motivation, deep learning techniques were explored to provide an automated COVID-19 screening system based on X-ray imaging. This modality was chosen because of its low-cost procedures, which are widely available even in many small clinics. A new convolutional neural network (CNN) model is proposed instead of utilizing the pre-trained networks of existing models. The proposed network, Residual-Shuffle-Net, comprises four stacks of the residual-shuffle unit followed by a spatial pyramid pooling (SPP) unit. The residual-shuffle unit follows an hourglass design with a reduced convolution filter size in the middle layer, where a shuffle operation is performed right after the split branches have been concatenated. The shuffle operation forces the network to learn multiple sets of feature relationships across channels instead of a single set of global features. The SPP unit, placed at the end of the network, allows the model to learn multi-scale features that are crucial for distinguishing COVID-19 from other types of pneumonia. The proposed network is benchmarked against 12 other state-of-the-art CNN models designed and tuned specifically for COVID-19 detection. The experimental results show that Residual-Shuffle-Net produces the best performance in terms of accuracy and specificity, at 0.97390 and 0.98695, respectively. With slightly more than 2 million parameters, the model is also lightweight, making it suitable for mobile-based applications. For future work, an attention mechanism could be integrated to target regions of interest in the X-ray images that are deemed more informative for COVID-19 diagnosis. Full article
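The channel shuffle operation at the heart of the residual-shuffle unit is easy to sketch in pure Python. Treating the channel axis as a list, it amounts to reshaping the channels into (groups, channels_per_group) and reading them out column by column, so channels from different groups become adjacent; the group count here is an illustrative assumption, not taken from the paper.

```python
def channel_shuffle(channels, groups):
    """Interleave `channels` (a list of per-channel feature maps) across
    `groups`: conceptually reshape to (groups, n) and transpose, so that
    channels originating from different groups end up adjacent."""
    n = len(channels) // groups
    assert n * groups == len(channels), "channels must split evenly"
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

# Six channels in two groups [0,1,2 | 3,4,5] are interleaved by the shuffle:
print(channel_shuffle(list(range(6)), 2))  # → [0, 3, 1, 4, 2, 5]
```

After the shuffle, a subsequent grouped convolution sees channels from every branch, which is what lets the network mix information across groups without a full (expensive) pointwise convolution.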
