Search Results (7)

Search Parameters:
Keywords = dual-gate context blocks

31 pages, 10745 KB  
Article
CNN-GCN Coordinated Multimodal Frequency Network for Hyperspectral Image and LiDAR Classification
by Haibin Wu, Haoran Lv, Aili Wang, Siqi Yan, Gabor Molnar, Liang Yu and Minhui Wang
Remote Sens. 2026, 18(2), 216; https://doi.org/10.3390/rs18020216 - 9 Jan 2026
Viewed by 221
Abstract
Existing multimodal image classification methods often suffer from several key limitations: difficulty in effectively balancing local detail and global topological relationships in hyperspectral image (HSI) feature extraction; insufficient multi-scale characterization of terrain features from light detection and ranging (LiDAR) elevation data; and neglect of deep inter-modal interactions in traditional fusion methods, often accompanied by high computational complexity. To address these issues, this paper proposes a comprehensive deep learning framework combining a convolutional neural network (CNN), a graph convolutional network (GCN), and a wavelet transform for the joint classification of HSI and LiDAR data. The framework includes several novel components: a Spectral Graph Mixer Block (SGMB), in which a CNN branch captures fine-grained spectral–spatial features via multi-scale convolutions while a parallel GCN branch models long-range contextual features through an enhanced gated graph network; this dual-path design enables simultaneous extraction of local detail and global topological features from HSI data. A Spatial Coordinate Block (SCB) enhances spatial awareness and improves the perception of object contours and distribution patterns; a Multi-Scale Elevation Feature Extraction Block (MSFE) captures terrain representations across varying scales; and a Bidirectional Frequency Attention Encoder (BiFAE) enables efficient and deep interaction between multimodal features. These modules work in concert as a cohesive end-to-end framework that not only balances local details and global contexts more effectively but also enables deep yet computationally efficient interaction across features, significantly strengthening the discriminability and robustness of the learned representation. To evaluate the proposed method, we conducted experiments on three multimodal remote sensing datasets: Houston2013, Augsburg, and Trento. Quantitative results demonstrate that our framework outperforms state-of-the-art methods, achieving overall accuracy (OA) values of 98.93%, 88.05%, and 99.59% on the respective datasets.
(This article belongs to the Section AI Remote Sensing)
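
The dual-path SGMB idea lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the authors' code: a multi-scale CNN branch runs alongside a gated graph branch over the same patch, and the two are fused by a 1×1 convolution. All layer sizes, the gating form, and the adjacency construction are assumptions.

```python
# Hypothetical sketch of a dual-path block in the spirit of the SGMB: a CNN
# branch for local spectral-spatial detail, a gated graph branch for long-range
# context, fused by a 1x1 convolution. Not the authors' code.
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # CNN branch: two kernel sizes stand in for "multi-scale convolutions".
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)
        # Graph branch: one graph-convolution step plus a learned gate.
        self.gcn = nn.Linear(channels, channels)
        self.gate = nn.Linear(channels, channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) patch features; adj: (N, N) row-normalized adjacency
        # over the N = H*W pixels (how adj is built is out of scope here).
        b, c, h, w = x.shape
        local = torch.relu(self.conv3(x)) + torch.relu(self.conv5(x))
        nodes = x.flatten(2).transpose(1, 2)            # (B, N, C)
        msg = adj @ self.gcn(nodes)                     # neighborhood aggregation
        gated = torch.sigmoid(self.gate(nodes)) * msg   # gated graph update
        glob = gated.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))
```
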
24 pages, 4538 KB  
Article
CNN–Transformer-Based Model for Maritime Blurred Target Recognition
by Tianyu Huang, Chao Pan, Jin Liu and Zhiwei Kang
Electronics 2025, 14(17), 3354; https://doi.org/10.3390/electronics14173354 - 23 Aug 2025
Viewed by 861
Abstract
In maritime blurred image recognition, ship collision accidents frequently result from three primary blur types: (1) motion blur from vessel movement in complex sea conditions, (2) defocus blur due to water vapor refraction, and (3) scattering blur caused by sea fog interference. This paper proposes a dual-branch recognition method specifically designed for motion blur, the most prevalent blur type in maritime scenarios. Conventional approaches exhibit constrained computational efficiency and limited adaptability across modalities. To overcome these limitations, we propose a hybrid CNN–Transformer architecture: the CNN branch captures local blur characteristics, while the enhanced Transformer module models long-range dependencies via attention mechanisms. The CNN branch employs a lightweight ResNet variant in which conventional residual blocks are replaced with Multi-Scale Gradient-Aware Residual Blocks (MSG-ARBs). This architecture employs learnable gradient convolution for explicit local gradient feature extraction and uses gradient content gating to strengthen the representation of blur-sensitive regions, significantly improving computational efficiency compared with conventional CNNs. The Transformer branch incorporates a Hierarchical Swin Transformer (HST) framework with Shifted Window-based Multi-head Self-Attention for global context modeling. The proposed method incorporates blur-invariant Positional Encoding (PE) to enhance blur spectrum modeling, and employs a DyT (Dynamic Tanh) module with learnable α parameters to replace traditional normalization layers. This design significantly reduces computational costs while preserving feature representation quality, and it efficiently computes long-range image dependencies using a compact 16 × 16 window configuration. The proposed feature fusion module integrates CNN-based local feature extraction with Transformer-enabled global representation learning, achieving comprehensive feature modeling across scales. To evaluate the model's performance and generalization ability, we conducted comprehensive experiments on four benchmark datasets: VAIS, GoPro, Mini-ImageNet, and Open Images V4. Experimental results show that our method achieves superior classification accuracy compared with state-of-the-art approaches, while simultaneously improving inference speed and reducing GPU memory consumption. Ablation studies confirm that the DyT module effectively suppresses outliers and improves computational efficiency, particularly when processing low-quality input data.
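
The DyT module mentioned above has a simple functional form: a learnable scalar α inside a tanh, with a per-channel scale and shift, used in place of a normalization layer. A minimal sketch follows; the initialization value and the channels-last layout are assumptions.

```python
# Minimal sketch of a DyT (Dynamic Tanh) layer: a learnable scalar alpha inside
# tanh, plus per-channel scale and shift, as a drop-in replacement for a
# normalization layer. Initialization is an assumption.
import torch
import torch.nn as nn

class DyT(nn.Module):
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))            # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))            # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes channels-last input (..., dim). tanh squashes extreme
        # activations, consistent with the reported outlier suppression.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

# Usage: replace nn.LayerNorm(dim) with DyT(dim) inside a Transformer block.
```
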
24 pages, 2440 KB  
Article
A Novel Dynamic Context Branch Attention Network for Detecting Small Objects in Remote Sensing Images
by Huazhong Jin, Yizhuo Song, Ting Bai, Kaimin Sun and Yepei Chen
Remote Sens. 2025, 17(14), 2415; https://doi.org/10.3390/rs17142415 - 12 Jul 2025
Viewed by 947
Abstract
Detecting small objects in remote sensing images is challenging because their small size yields few distinctive features. This limitation necessitates the effective use of contextual information for accurate identification. Many existing methods struggle because they do not dynamically adjust the contextual scope to the specific characteristics of each target. To address this issue and improve the detection of small objects (typically defined as objects with a bounding box area of less than 1024 pixels), we propose a novel backbone network called the Dynamic Context Branch Attention Network (DCBANet). We present the Dynamic Context Scale-Aware (DCSA) Block, which uses a multi-branch architecture to generate features with diverse receptive fields. Within each branch, a Context Adaptive Selection Module (CASM) dynamically weights information, allowing the model to focus on the most relevant context. To further enhance performance, we introduce an Efficient Branch Attention (EBA) module that adaptively reweights the parallel branches, prioritizing the most discriminative ones. Finally, to ensure computational efficiency, we design a Dual-Gated Feedforward Network (DGFFN), a lightweight yet powerful replacement for standard FFNs. Extensive experiments on four public remote sensing datasets show that DCBANet achieves mAP@0.5 scores of 80.79% on DOTA, 89.17% on NWPU VHR-10, and 80.27% on SIMD, and 42.4% mAP@0.5:0.95 on the specialized small-object benchmark AI-TOD. These results surpass RetinaNet, YOLOF, FCOS, Faster R-CNN, Dynamic R-CNN, SKNet, and Cascade R-CNN, highlighting its effectiveness in detecting small objects in remote sensing images. However, there remains potential for further improvement in multi-scale and weak-target detection. Future work will integrate local and global context to enhance multi-scale object detection performance.
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
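
The abstract does not spell out the DGFFN's internals, but a plausible minimal reading is two sigmoid gates modulating a shared value path in place of the single wide expansion of a standard FFN. The sketch below treats every detail as an assumption.

```python
# Hypothetical sketch of a dual-gated feedforward network: two sigmoid gates
# jointly modulate a value projection, replacing a standard FFN's single wide
# expansion. Not the paper's exact design.
import torch
import torch.nn as nn

class DualGatedFFN(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.value = nn.Linear(dim, hidden)
        self.gate_a = nn.Linear(dim, hidden)
        self.gate_b = nn.Linear(dim, hidden)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Features suppressed by either gate are discarded cheaply, which is
        # one way such a block could stay lightweight.
        v = self.value(x)
        g = torch.sigmoid(self.gate_a(x)) * torch.sigmoid(self.gate_b(x))
        return self.out(v * g)
```
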
28 pages, 17488 KB  
Article
Attentive Multi-Scale Features with Adaptive Context PoseResNet for Resource-Efficient Human Pose Estimation
by Ali Zakir, Sartaj Ahmed Salman, Gibran Benitez-Garcia and Hiroki Takahashi
Electronics 2025, 14(11), 2107; https://doi.org/10.3390/electronics14112107 - 22 May 2025
Viewed by 1292
Abstract
Human Pose Estimation (HPE) remains challenging due to scale variation, occlusion, and high computational costs. Standard methods often struggle to capture detailed spatial information when keypoints are obscured, and they typically rely on computationally expensive deconvolution layers for upsampling, making them inefficient for real-time or resource-constrained scenarios. We propose AMFACPose (Attentive Multi-scale Features with Adaptive Context PoseResNet) to address these limitations. Specifically, our architecture incorporates Coordinate Convolution 2D (CoordConv2d) to retain explicit spatial context, alleviating the loss of coordinate information in conventional convolutions. To reduce computational overhead while maintaining accuracy, we utilize Depthwise Separable Convolutions (DSCs), separating spatial and pointwise operations. At the core of our approach is an Adaptive Feature Pyramid Network (AFPN), which replaces costly deconvolution-based upsampling by efficiently aggregating multi-scale features to handle diverse human poses and body sizes. We further introduce Dual-Gate Context Blocks (DGCBs) that refine global context to manage partial occlusions and cluttered backgrounds. The model integrates Squeeze-and-Excitation (SE) blocks and a Spatial–Channel Refinement Module (SCRM) to emphasize the most informative feature channels and spatial regions, which is particularly beneficial for occluded or overlapping keypoints. For precise keypoint localization, we replace dense heatmap predictions with coordinate classification using Multi-Layer Perceptron (MLP) heads. Experiments on the COCO and CrowdPose datasets demonstrate that AMFACPose surpasses existing 2D HPE methods in both accuracy and computational efficiency. Moreover, our implementation on edge devices achieves real-time performance while preserving high accuracy, confirming the suitability of AMFACPose for resource-constrained pose estimation in both benchmark and real-world environments.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
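
CoordConv2d is a known technique, appending normalized coordinate channels to the input before a standard convolution so the layer sees explicit position information, and it is easy to sketch. Channel counts and defaults below are illustrative, not taken from the paper.

```python
# Minimal sketch of the CoordConv2d idea: concatenate normalized x/y coordinate
# maps onto the input before a standard convolution.
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, kernel_size, **kw)  # +2 coord maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ys, xs], dim=1))

# e.g. layer = CoordConv2d(64, 64, kernel_size=3, padding=1)
```
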
18 pages, 5652 KB  
Article
LDMNet: Enhancing the Segmentation Capabilities of Unmanned Surface Vehicles in Complex Waterway Scenarios
by Tongyang Dai, Huiyu Xiang, Chongjie Leng, Song Huang, Guanghui He and Shishuo Han
Appl. Sci. 2024, 14(17), 7706; https://doi.org/10.3390/app14177706 - 31 Aug 2024
Cited by 2 | Viewed by 2095
Abstract
Semantic segmentation-based Complex Waterway Scene Understanding has shown great promise in the environmental perception of Unmanned Surface Vehicles. Existing methods struggle to estimate the edges of obstacles under conditions of blurred water surfaces. To address this, we propose the Lightweight Dual-branch Mamba Network (LDMNet), which includes a CNN-based Deep Dual-branch Network for extracting image features and a Mamba-based fusion module for aggregating and integrating global information. Specifically, we improve the Deep Dual-branch Network structure by incorporating multiple atrous branches for local fusion, and we design a Convolution-based Recombine Attention Module that serves as the gate activation condition for Mamba-2, enhancing feature interaction and global information fusion along both the spatial and channel dimensions. Moreover, to tackle the directional sensitivity of image serialization and the impact of the State Space Model’s forgetting strategy on non-causal data modeling, we introduce a Hilbert curve scanning mechanism to achieve multi-scale feature serialization. By stacking feature sequences, we alleviate the local bias of Mamba-2 towards image sequence data. LDMNet integrates the Deep Dual-branch Network, Recombine Attention, and Mamba-2 blocks, effectively capturing the long-range dependencies and multi-scale global context of Complex Waterway Scene images. Experimental results on four benchmarks show that LDMNet significantly improves obstacle edge segmentation and outperforms existing methods across various performance metrics.
(This article belongs to the Section Marine Science and Engineering)
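
Hilbert-curve serialization can be made concrete with the standard distance-to-coordinate mapping. The sketch below orders the pixels of a square, power-of-two feature map along a Hilbert curve so that sequence neighbors remain spatial neighbors; the multi-scale stacking described in the abstract is omitted.

```python
# Sketch of Hilbert-curve serialization for image tokens, assuming a square
# feature map whose side length is a power of two.
import torch

def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Map distance d along an n x n Hilbert curve to (x, y) (standard algorithm)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_serialize(feat: torch.Tensor) -> torch.Tensor:
    """(B, C, H, H) feature map -> (B, H*H, C) token sequence in Hilbert order."""
    b, c, h, _ = feat.shape
    idx = [x * h + y for x, y in (hilbert_d2xy(h, d) for d in range(h * h))]
    seq = feat.flatten(2).transpose(1, 2)   # (B, H*H, C), row-major order
    return seq[:, torch.tensor(idx), :]
```
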
22 pages, 9373 KB  
Article
Single Image Super-Resolution via Wide-Activation Feature Distillation Network
by Zhen Su, Yuze Wang, Xiang Ma, Mang Sun, Deqiang Cheng, Chao Li and He Jiang
Sensors 2024, 24(14), 4597; https://doi.org/10.3390/s24144597 - 16 Jul 2024
Viewed by 2290
Abstract
Feature extraction plays a pivotal role in single image super-resolution. Nonetheless, relying on a single feature extraction method often undermines the full potential of feature representation, hampering the model’s overall performance. To tackle this issue, this study introduces the wide-activation feature distillation network (WFDN), which realizes single image super-resolution through dual-path learning. First, a dual-path parallel network structure is employed, using a residual network as the backbone and incorporating global residual connections to enhance feature exploitation and expedite network convergence. Second, a feature distillation block is adopted, characterized by fast training and a low parameter count. Simultaneously, a wide-activation mechanism is integrated to further enhance the representational capacity of high-frequency features. Finally, a gated fusion mechanism is introduced to weight the fusion of the feature information extracted from the two branches; this mechanism improves reconstruction performance while mitigating information redundancy. Extensive experiments demonstrate that the proposed algorithm achieves stable and superior results compared to state-of-the-art methods, as evidenced by quantitative evaluation on four benchmark datasets. Furthermore, WFDN excels at reconstructing images with richer detailed textures, more realistic lines, and clearer structures, affirming its superiority and robustness.
(This article belongs to the Section Sensing and Imaging)
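
A gated fusion of two branches is commonly implemented as a convex, learned per-pixel weighting. The sketch below shows one such form; the 1×1-convolution gate is an assumption rather than the paper's exact design.

```python
# Hypothetical sketch of gated fusion: a learned sigmoid gate weights the two
# branch outputs pixel-wise, suppressing redundant information from either path.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (B, C, H, W) features from the two parallel branches.
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=1)))
        return g * a + (1.0 - g) * b    # convex, per-pixel weighting
```
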
17 pages, 4963 KB  
Article
An Efficient Image Deblurring Network with a Hybrid Architecture
by Mingju Chen, Sihang Yi, Zhongxiao Lan and Zhengxu Duan
Sensors 2023, 23(16), 7260; https://doi.org/10.3390/s23167260 - 18 Aug 2023
Cited by 11 | Viewed by 4022
Abstract
Blurring is one of the main factors in image degradation, so image deblurring is of great interest as a fundamental problem in low-level computer vision. Because of their limited receptive field, traditional CNNs cannot model global blurred regions and do not make full use of the rich contextual information between features. Recently, transformer-based neural network architectures have performed well in natural language tasks, inspiring rapid development in the deblurring field. Therefore, in this paper, a hybrid architecture based on CNNs and transformers is used for image deblurring. Specifically, we first extract the shallow features of the blurred images using a cross-layer feature fusion block that emphasizes the contextual information of each feature extraction layer. Second, we design an efficient transformer module for extracting deep features, which fully aggregates feature information at medium and long distances using vertical and horizontal intra- and inter-strip attention layers, and uses a dual gating mechanism as the feedforward neural network to effectively reduce redundant features. Finally, the cross-layer feature fusion block is used to complement the feature information and obtain the deblurred image. Extensive experimental results on the publicly available benchmark datasets GoPro and HIDE and the real-world dataset RealBlur show that the proposed method outperforms current mainstream deblurring algorithms and recovers the edge contours and texture details of images more clearly.
(This article belongs to the Section Sensing and Imaging)
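
The vertical/horizontal strip attention described above can be illustrated with a single-head intra-strip variant: attention is computed within each row and within each column, then the two results are merged, aggregating context along both axes at lower cost than full 2D attention. Everything below (projections, head count, merge) is an assumption.

```python
# Hypothetical sketch of intra-strip attention: single-head self-attention
# within each row strip and each column strip, merged by a 1x1 convolution.
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Conv2d(dim, 3 * dim, kernel_size=1)
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)

    @staticmethod
    def _attend(q, k, v):
        # q, k, v: (..., L, C) sequences within one strip.
        w = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return w @ v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Horizontal strips: attend along W within each row.
        qh, kh, vh = (t.permute(0, 2, 3, 1) for t in (q, k, v))   # (B, H, W, C)
        horiz = self._attend(qh, kh, vh).permute(0, 3, 1, 2)
        # Vertical strips: attend along H within each column.
        qv, kv, vv = (t.permute(0, 3, 2, 1) for t in (q, k, v))   # (B, W, H, C)
        vert = self._attend(qv, kv, vv).permute(0, 3, 2, 1)
        return self.proj(torch.cat([horiz, vert], dim=1))
```
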