Search Results (20)

Search Parameters:
Keywords = interactive dual-channel encoder

24 pages, 5237 KB  
Article
DCA-UNet: A Cross-Modal Ginkgo Crown Recognition Method Based on Multi-Source Data
by Yunzhi Guo, Yang Yu, Yan Li, Mengyuan Chen, Wenwen Kong, Yunpeng Zhao and Fei Liu
Plants 2026, 15(2), 249; https://doi.org/10.3390/plants15020249 - 13 Jan 2026
Viewed by 276
Abstract
Wild ginkgo, as an endangered species, holds significant value for genetic resource conservation, yet its practical applications face numerous challenges. Traditional field surveys are inefficient in mountainous mixed forests, while satellite remote sensing is limited by spatial resolution. Current deep learning approaches relying on single-source data or merely simple multi-source fusion fail to fully exploit information, leading to suboptimal recognition performance. This study presents a multimodal ginkgo crown dataset, comprising RGB and multispectral images acquired by a UAV platform. To achieve precise crown segmentation from these data, we propose a novel dual-branch dynamic weighting fusion network, termed dual-branch cross-modal attention-enhanced UNet (DCA-UNet). We design a dual-branch encoder (DBE) with a two-stream architecture for independent feature extraction from each modality. We further develop a cross-modal interaction fusion module (CIF), employing cross-modal attention and learnable dynamic weights to boost multi-source information fusion. Additionally, we introduce an attention-enhanced decoder (AED) that combines progressive upsampling with a hybrid channel-spatial attention mechanism, thereby effectively utilizing multi-scale features and enhancing boundary semantic consistency. Evaluation on the ginkgo dataset demonstrates that DCA-UNet achieves a segmentation performance of 93.42% IoU (Intersection over Union), 96.82% PA (Pixel Accuracy), 96.38% Precision, and 96.60% F1-score. These results outperform the differential feature attention fusion network (DFAFNet) by 12.19%, 6.37%, 4.62%, and 6.95%, respectively, and surpass the single-modality baselines (RGB or multispectral) in all metrics. Superior performance on cross-flight-altitude data further validates the model’s strong generalization capability and robustness in complex scenarios. These results demonstrate the superiority of DCA-UNet in UAV-based multimodal ginkgo crown recognition, offering a reliable and efficient solution for monitoring wild endangered tree species. Full article
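The abstract's central mechanism is cross-modal attention combined with learnable dynamic weights that balance the two encoder branches. A minimal NumPy sketch of that fusion idea follows; the function names, token shapes, and fixed scalar weights are illustrative assumptions, not DCA-UNet's actual CIF module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fuse(rgb, ms, w_rgb=0.5, w_ms=0.5):
    """rgb, ms: (N, C) token features from the two encoder branches.

    Each modality attends to the other (queries from one, keys/values
    from the other), then a per-modality weight mixes the results.
    In a trained network w_rgb and w_ms would be learnable parameters.
    """
    scale = 1.0 / np.sqrt(rgb.shape[1])
    attn_rgb = softmax(rgb @ ms.T * scale) @ ms    # RGB enriched by MS
    attn_ms = softmax(ms @ rgb.T * scale) @ rgb    # MS enriched by RGB
    return w_rgb * attn_rgb + w_ms * attn_ms

rng = np.random.default_rng(0)
rgb = rng.normal(size=(16, 32))   # 16 tokens, 32 channels per branch
ms = rng.normal(size=(16, 32))
fused = cross_modal_fuse(rgb, ms)
```

The fused output keeps the branch shape, so it can feed a shared decoder unchanged.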
(This article belongs to the Special Issue Advanced Remote Sensing and AI Techniques in Agriculture and Forestry)

19 pages, 5302 KB  
Article
LSSCC-Net: Integrating Spatial-Feature Aggregation and Adaptive Attention for Large-Scale Point Cloud Semantic Segmentation
by Wenbo Wang, Xianghong Hua, Cheng Li, Pengju Tian, Yapeng Wang and Lechao Liu
Symmetry 2026, 18(1), 124; https://doi.org/10.3390/sym18010124 - 8 Jan 2026
Viewed by 256
Abstract
Point cloud semantic segmentation is a key technology for applications such as autonomous driving, robotics, and virtual reality. Current approaches are heavily reliant on local relative coordinates and simplistic attention mechanisms to aggregate neighborhood information. This often leads to an ineffective joint representation of geometric perturbations and feature variations, coupled with a lack of adaptive selection for salient features during context fusion. To address this, we propose LSSCC-Net, a novel segmentation framework based on LACV-Net. First, the spatial-feature dynamic aggregation module is designed to fuse offset information by symmetric interaction between spatial positions and feature channels, thus supplementing local structural information. Second, a dual-dimensional attention mechanism (spatial and channel) is introduced to symmetrically deploy attention modules in both the encoder and decoder, prioritizing salient information extraction. Finally, Lovász-Softmax Loss is used as an auxiliary loss to optimize the training objective. The proposed method is evaluated on two public benchmark datasets. The mIoU on the Toronto3D and S3DIS datasets is 83.6% and 65.2%, respectively. Compared with the baseline LACV-Net, LSSCC-Net shows notable improvements in challenging categories: the IoU for “road mark” and “fence” on Toronto3D increased by 3.6% and 8.1%, respectively. These results indicate that LSSCC-Net more accurately characterizes complex boundaries and fine-grained structures, enhancing segmentation capabilities for small-scale targets and category boundaries. Full article
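The dual-dimensional (spatial and channel) attention described here can be sketched very compactly: one gate reweights feature channels, the other reweights points. This toy NumPy version uses global means as the gating signal; the paper's actual modules are learned layers, so every name and shape below is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_attention(feats):
    """feats: (N, C) per-point features of a point cloud.

    Channel gate: one weight per feature channel, from the global
    channel mean. Spatial gate: one weight per point, from that
    point's mean activation. Both gates lie in (0, 1).
    """
    ch_gate = sigmoid(feats.mean(axis=0))                 # (C,)
    sp_gate = sigmoid(feats.mean(axis=1, keepdims=True))  # (N, 1)
    return feats * ch_gate * sp_gate

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 64))   # 100 points, 64 channels
out = dual_attention(feats)
```

Because both gates are in (0, 1), the module can only suppress features, never amplify them; a learned version would add projections and residual connections.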

31 pages, 36598 KB  
Article
Spatio-Temporal and Semantic Dual-Channel Contrastive Alignment for POI Recommendation
by Chong Bu, Yujie Liu, Jing Lu, Manqi Huang, Maoyi Li and Jiarui Li
Big Data Cogn. Comput. 2025, 9(12), 322; https://doi.org/10.3390/bdcc9120322 - 15 Dec 2025
Viewed by 377
Abstract
Point-of-Interest (POI) recommendation predicts users’ future check-ins based on their historical trajectories and plays a key role in location-based services (LBS). Traditional approaches such as collaborative filtering and matrix factorization, which model user–POI interaction matrices, fail to fully leverage spatio-temporal information and semantic attributes, leading to weak performance on sparse and long-tail POIs. Recently, Graph Neural Networks (GNNs) have been applied by constructing heterogeneous user–POI graphs to capture high-order relations. However, they still struggle to effectively integrate spatio-temporal and semantic information and enhance the discriminative power of learned representations. To overcome these issues, we propose Spatio-Temporal and Semantic Dual-Channel Contrastive Alignment for POI Recommendation (S2DCRec), a novel framework integrating spatio-temporal and semantic information. It employs hierarchical relational encoding to capture fine-grained behavioral patterns and high-level semantic dependencies. The model jointly captures user–POI interactions, temporal dynamics, and semantic correlations in a unified framework. Furthermore, our alignment strategy ensures micro-level collaborative and spatio-temporal consistency and macro-level semantic coherence, enabling fine-grained embedding fusion and interpretable contrastive learning. Experiments on two real-world datasets, Foursquare NYC and Yelp, show that S2DCRec outperforms all baselines, improving F1 scores by 4.04% and 3.01%, respectively. These results demonstrate the effectiveness of the dual-channel design in capturing both sequential and semantic dependencies for accurate POI recommendation. Full article
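Dual-channel contrastive alignment of the kind described above is typically implemented with an InfoNCE-style objective: the spatio-temporal and semantic embeddings of the same POI form a positive pair, while other POIs in the batch act as negatives. The sketch below is an illustrative assumed form, not S2DCRec's actual loss; the temperature and dimensions are arbitrary.

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def infonce(z_st, z_sem, tau=0.1):
    """z_st, z_sem: (B, D) embeddings from the two channels.

    Row i of z_st is positive with row i of z_sem; all other rows
    are negatives. Lower loss means better cross-channel alignment.
    """
    z_st, z_sem = l2norm(z_st), l2norm(z_sem)
    logits = z_st @ z_sem.T / tau                      # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numeric stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # diagonal = positives

rng = np.random.default_rng(2)
z_st = rng.normal(size=(8, 16))
loss_rand = infonce(z_st, rng.normal(size=(8, 16)))    # unrelated channels
loss_aligned = infonce(z_st, z_st)                     # perfectly aligned
```

A perfectly aligned pair of channels scores much lower than unrelated ones, which is what drives the embeddings together during training.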
(This article belongs to the Topic Graph Neural Networks and Learning Systems)

19 pages, 3089 KB  
Article
Trajectory Prediction for Powered Two-Wheelers in Mixed Traffic Scenes: An Enhanced Social-GAT Approach
by Longxin Zeng, Fujian Chen, Jiangfeng Li, Haiquan Wang, Yujie Li and Zhongyi Zhai
Systems 2025, 13(11), 1036; https://doi.org/10.3390/systems13111036 - 19 Nov 2025
Viewed by 551
Abstract
In mixed traffic scenarios involving both motorized and non-motorized participants, accurately predicting future trajectories of surrounding vehicles remains a major challenge for autonomous driving. Predicting the motion of powered two-wheelers (PTWs) is particularly difficult due to their abrupt behavioral changes and stochastic interaction patterns. To address this issue, this paper proposes an enhanced Social-GAT model with a multi-module architecture for PTW trajectory prediction. The model consists of a dual-channel LSTM encoder that separately processes position and motion features; a temporal attention mechanism to weight key historical states; and a residual-connected two-layer GAT structure to model social relationships within the interaction range, capturing interactive features between PTWs and surrounding vehicles through dynamic adjacency matrices. Finally, an LSTM decoder integrates spatiotemporal features and outputs the predicted trajectory. Experimental results on the rounD dataset demonstrate that our model achieves an outstanding ADE of 0.28, surpassing Trajectron++ by 9.68% and Social-GAN by 69.2%. It also attains the lowest RMSE values across 0.4–2.0 s prediction horizons, confirming its superior accuracy and stability for PTW trajectory prediction in mixed traffic environments. Full article
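The abstract reports ADE and per-horizon RMSE. These trajectory metrics have standard definitions, sketched below for a single predicted track; the function names and the single-sample simplification of RMSE are assumptions, not the paper's evaluation code.

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean error over all
    prediction steps. pred, gt: (T, 2) arrays of (x, y) positions."""
    return np.linalg.norm(pred - gt, axis=1).mean()

def rmse_at(pred, gt, step):
    """Displacement error at one prediction horizon (index `step`).
    For a single trajectory, RMSE reduces to this distance; over a
    test set it would be the root mean of these squared distances."""
    return np.linalg.norm(pred[step] - gt[step])

# Ground truth: straight motion along x; prediction: constant 0.3 m
# lateral offset, so every step has displacement error 0.3.
gt = np.stack([np.arange(5, dtype=float), np.zeros(5)], axis=1)
pred = gt + np.array([0.0, 0.3])
```

With this toy track the ADE is exactly the constant offset, 0.3.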
(This article belongs to the Section Artificial Intelligence and Digital Systems Engineering)

25 pages, 1225 KB  
Article
Dual-Channel Heterogeneous Graph Neural Network for Automatic Algorithm Recommendation
by Xiaoyu Zhang, Yuxiang Sun and Xianzhong Zhou
Mathematics 2025, 13(22), 3674; https://doi.org/10.3390/math13223674 - 16 Nov 2025
Cited by 1 | Viewed by 674
Abstract
Automatic algorithm selection is a critical challenge in data-driven decision-making due to the proliferation of available algorithms and the diversity of application scenarios, with no universally optimal solution. Traditional methods, including rule-based systems, grid search, and single-modal meta-learning, often struggle with high computational cost, limited generalization, and insufficient modeling of complex dataset-algorithm interactions, particularly under data sparsity or cold-start conditions. To address these issues, we propose a Dual-Channel Heterogeneous Graph Neural Network (DCHGNN) for automatic algorithm recommendation. Datasets and algorithms are represented as nodes in a heterogeneous bipartite graph, with edge weights defined by observed performance. The framework employs two channels, one for encoding the textual descriptions and the other for capturing the meta-features of the dataset. Cross-channel contrastive learning aligns embeddings to improve consistency, and a random forest regressor predicts algorithm performance on unseen datasets. Experiments on 121 datasets and 179 algorithms show that DCHGNN achieves an average relative maximum value of 94.8%, outperforming baselines, with 85% of predictions in the high-confidence range [0.9, 1]. Ablation studies and visualization analyses confirm the contributions of both channels and the contrastive mechanism. Overall, DCHGNN effectively integrates multimodal information, mitigates sparsity and cold-start issues, and provides robust and accurate algorithm recommendations. Full article
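The core data structure here is a bipartite dataset–algorithm graph whose edge weights are observed performance scores. A single round of weighted message passing over such a graph, mixing algorithm embeddings into dataset embeddings, can be sketched as below; the shapes and the simple mean-style aggregation are illustrative assumptions, and DCHGNN's real layers differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n_data, n_algo, d = 5, 4, 8

# Edge weights of the bipartite graph: observed performance of each
# algorithm (columns) on each dataset (rows), in [0.5, 1.0].
perf = rng.uniform(0.5, 1.0, size=(n_data, n_algo))

# Initial embeddings for the algorithm-side nodes.
algo_emb = rng.normal(size=(n_algo, d))

# Normalize edge weights per dataset node, then aggregate neighbor
# (algorithm) embeddings: one message-passing step.
w = perf / perf.sum(axis=1, keepdims=True)   # rows sum to 1
data_emb = w @ algo_emb                      # (n_data, d)
```

Stacking such steps (with nonlinearities and learned projections) and pairing them with a second channel over textual descriptions would approximate the dual-channel design the abstract describes.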
(This article belongs to the Special Issue New Advances and Challenges in Neural Networks and Applications)

23 pages, 673 KB  
Review
Calcium Dynamics in Astrocyte-Neuron Communication from Intracellular to Extracellular Signaling
by Agnieszka Nowacka, Maciej Śniegocki and Ewa A. Ziółkowska
Cells 2025, 14(21), 1709; https://doi.org/10.3390/cells14211709 - 31 Oct 2025
Viewed by 1960
Abstract
Astrocytic calcium signaling is a central mechanism of neuron-glia communication that operates across multiple spatial and temporal scales. Traditionally, research has focused on intracellular Ca2+ oscillations that regulate gliotransmitter release, ion homeostasis, and metabolic support. Recent evidence, however, reveals that extracellular calcium ([Ca2+]o) is not a passive reservoir but a dynamic signaling mediator capable of influencing neuronal excitability within milliseconds. Through mechanisms such as calcium-sensing receptor (CaSR) activation, ion channel modulation, surface charge effects, and ephaptic coupling, astrocytes emerge as active partners in both slow and rapid modes of communication. This dual perspective reshapes our understanding of brain physiology and disease. Disrupted Ca2+ signaling contributes to network instability in epilepsy, synaptic dysfunction in Alzheimer’s and Parkinson’s disease, and impaired maturation in neurodevelopmental disorders. Methodological advances, including Ca2+-selective microelectrodes, genetically encoded extracellular indicators, and computational modeling, are beginning to uncover the richness of extracellular Ca2+ dynamics, though challenges remain in achieving sufficient spatial and temporal resolution. By integrating classical intracellular pathways with emerging insights into extracellular signaling, this review highlights astrocytes as central architects of the ionic landscape. Recognizing calcium as both an intracellular messenger and an extracellular signaling mediator provides a unifying framework for neuron–glia interactions and opens new avenues for therapeutic intervention. Full article

24 pages, 1826 KB  
Article
Cloud and Snow Segmentation via Transformer-Guided Multi-Stream Feature Integration
by Kaisheng Yu, Kai Chen, Liguo Weng, Min Xia and Shengyan Liu
Remote Sens. 2025, 17(19), 3329; https://doi.org/10.3390/rs17193329 - 29 Sep 2025
Viewed by 697
Abstract
Cloud and snow often share comparable visual and structural patterns in satellite observations, making their accurate discrimination and segmentation particularly challenging. To overcome this, we design an innovative Transformer-guided architecture with complementary feature-extraction capabilities. The encoder adopts a dual-path structure, integrating a Transformer Encoder Module (TEM) for capturing long-range semantic dependencies and a ResNet18-based convolutional branch for detailed spatial representation. A Feature-Enhancement Module (FEM) is introduced to promote bidirectional interaction and adaptive feature integration between the two pathways. To improve delineation of object boundaries, especially in visually complex areas, we embed a Deep Feature-Extraction Module (DFEM) at the deepest layer of the convolutional stream. This component refines channel-level information to highlight critical features and enhance edge clarity. Additionally, to address noise from intricate backgrounds and ambiguous cloud-snow transitions, we incorporate both a Transformer Fusion Module (TFM) and a Strip Pooling Auxiliary Module (SPAM) in the decoding phase. These modules collaboratively enhance structural recovery and improve robustness in segmentation. Extensive experiments on the CSWV and SPARCS datasets show that our method consistently outperforms state-of-the-art baselines, demonstrating its strong effectiveness and applicability in real-world cloud and snow-detection scenarios. Full article
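The decoder's Strip Pooling Auxiliary Module builds on strip pooling, which averages a feature map along entire rows and columns to capture long, band-like structures such as cloud or snow streaks. A minimal single-channel sketch of the operation (assumed form, not the paper's exact module):

```python
import numpy as np

def strip_pool(x):
    """x: (H, W) feature map -> combined strip-context map.

    Row strips average across the width, column strips across the
    height; broadcasting their sum back to (H, W) gives every pixel
    long-range horizontal and vertical context.
    """
    row = x.mean(axis=1, keepdims=True)   # (H, 1): horizontal strips
    col = x.mean(axis=0, keepdims=True)   # (1, W): vertical strips
    return row + col                      # broadcasts to (H, W)

x = np.arange(12, dtype=float).reshape(3, 4)
ctx = strip_pool(x)
```

For this input the first row's mean is 1.5 and the first column's mean is 4.0, so the top-left context value is 5.5.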

20 pages, 14906 KB  
Article
Dual-Channel ADCMix–BiLSTM Model with Attention Mechanisms for Multi-Dimensional Sentiment Analysis of Danmu
by Wenhao Ping, Zhihui Bai and Yubo Tao
Technologies 2025, 13(8), 353; https://doi.org/10.3390/technologies13080353 - 10 Aug 2025
Viewed by 1661
Abstract
Sentiment analysis methods for interactive services such as Danmu in online videos are challenged by their colloquial style and diverse sentiment expressions. For instance, the existing methods cannot easily distinguish between similar sentiments. To address these limitations, this paper proposes a dual-channel model integrated with attention mechanisms for multi-dimensional sentiment analysis of Danmu. First, we replace word embeddings with character embeddings to better capture the colloquial nature of Danmu text. Second, the dual-channel multi-dimensional sentiment encoder extracts both the high-level semantic and raw contextual information. Channel I of the encoder learns the sentiment features from different perspectives through a mixed model that combines the benefits of self-attention and dilated CNNs (ADCMix) and performs contextual modeling through bidirectional long short-term memory (BiLSTM) with attention mechanisms. Channel II mitigates potential biases and omissions in the sentiment features. The model combines the two channels to resolve the fuzzy boundaries between similar sentiments. Third, a multi-dimensional sentiment decoder is designed to handle the diversity in sentiment expressions. The superior performance of the proposed model is experimentally demonstrated on two datasets. Our model outperformed the state-of-the-art methods on both datasets, with improvements of at least 2.05% in accuracy and 3.28% in F1-score. Full article
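ADCMix pairs self-attention with dilated CNNs; a dilated convolution widens the receptive field over a character sequence without adding parameters, because the kernel taps are spaced `dilation` steps apart. A toy 1-D NumPy version (assumed shapes, not the paper's layer):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' dilated convolution. x: (T,), kernel: (K,).

    Output position t combines x[t], x[t + dilation], ...,
    x[t + (K-1)*dilation], so the receptive field spans
    (K-1)*dilation + 1 input steps.
    """
    K = len(kernel)
    span = (K - 1) * dilation
    return np.array([
        sum(kernel[k] * x[t + k * dilation] for k in range(K))
        for t in range(len(x) - span)
    ])

x = np.arange(10, dtype=float)
# Kernel [1, -1] with dilation 3 computes x[t] - x[t+3]:
y = dilated_conv1d(x, np.array([1.0, -1.0]), dilation=3)
```

On the ramp input every output is the constant -3.0, showing that the filter compares positions three steps apart rather than adjacent ones.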

24 pages, 7475 KB  
Article
Application of a Dual-Stream Network Collaboratively Based on Wavelet and Spatial-Channel Convolution in the Inpainting of Blank Strips in Marine Electrical Imaging Logging Images: A Case Study in the South China Sea
by Guilan Lin, Sinan Fang, Manxin Li, Hongtao Wu, Chenxi Xue and Zeyu Zhang
J. Mar. Sci. Eng. 2025, 13(5), 997; https://doi.org/10.3390/jmse13050997 - 21 May 2025
Cited by 3 | Viewed by 1159
Abstract
Electrical imaging logging technology precisely characterizes the features of the formation on the borehole wall through high-resolution resistivity images. However, the problem of blank strips caused by the mismatch between the instrument pads and the borehole diameter seriously affects the accuracy of fracture identification and formation continuity interpretation in marine oil and gas reservoirs. Existing inpainting methods struggle to reconstruct complex geological textures while maintaining structural continuity, particularly in balancing low-frequency formation morphology with high-frequency fracture details. To address this issue, this paper proposes an inpainting method using a dual-stream network based on the collaborative optimization of wavelet and spatial-channel convolution. By designing a texture-aware data prior algorithm, a high-quality training dataset with geological rationality is generated. A dual-stream encoder–decoder network architecture is adopted, and the wavelet transform convolution (WTConv) module is utilized to enhance the multi-scale perception ability of the generator, achieving a collaborative analysis of the low-frequency formation structure and high-frequency fracture details. Spatial-channel convolution (SCConv) is combined with the feature fusion module, and the cross-modal interaction between texture and structural features is optimized through a dynamic gating mechanism. Furthermore, a multi-objective loss function is introduced to constrain the semantic coherence and visual authenticity of image reconstruction. Experiments show that, on the inpainting metrics for Block X in the South China Sea, the mean absolute error (MAE), structural similarity index (SSIM), and peak signal-to-noise ratio (PSNR) of this method are 6.893, 0.779, and 19.087, respectively, which are significantly better than the improved filtersim, U-Net, and AOT-GAN methods. The correlation degree of the pixel distribution between the inpainted area and the original image reaches 0.921–0.997, verifying the precise matching of the low-frequency morphology and high-frequency details. In the inpainting of electrical imaging logging images across blocks, the applicability of the method is confirmed, effectively solving the interference of blank strips on the interpretation accuracy of marine oil and gas reservoirs. It provides an intelligent inpainting tool with geological interpretability for the electrical imaging logging interpretation of complex reservoirs, and has important engineering value for improving the efficiency of oil and gas exploration and development. Full article
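Two of the reported metrics, MAE and PSNR, have standard closed forms, sketched here for 8-bit images. These are the conventional definitions, assumed (not verified) to match the authors' evaluation code.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two images."""
    return np.abs(a.astype(float) - b.astype(float)).mean()

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for `peak`-valued images."""
    mse = ((a.astype(float) - b.astype(float)) ** 2).mean()
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100, dtype=np.uint8)
test_img = ref.copy()
test_img[0, 0] = 110       # a single 10-grey-level reconstruction error
```

One wrong pixel out of 64 gives MAE = 10/64 ≈ 0.156 and a PSNR of roughly 46 dB, illustrating how small the reported MAE of 6.893 and PSNR of 19.087 make the residual error look in context.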
(This article belongs to the Special Issue Research on Offshore Oil and Gas Numerical Simulation)

29 pages, 22521 KB  
Article
DBCA-Net: A Dual-Branch Context-Aware Algorithm for Cattle Face Segmentation and Recognition
by Xiaopu Feng, Jiaying Zhang, Yongsheng Qi, Liqiang Liu and Yongting Li
Agriculture 2025, 15(5), 516; https://doi.org/10.3390/agriculture15050516 - 27 Feb 2025
Cited by 1 | Viewed by 1277
Abstract
Cattle face segmentation and recognition in complex scenarios pose significant challenges due to insufficient fine-grained feature representation in segmentation networks and limited modeling of salient regions and local–global feature interactions in recognition models. To address these issues, DBCA-Net, a dual-branch context-aware algorithm for cattle face segmentation and recognition, is proposed. The method integrates an improved TransUNet-based segmentation network with a novel Fusion-Augmented Channel Attention (FACA) mechanism in the hybrid encoder, enhancing channel attention and fine-grained feature representation to improve segmentation performance in complex environments. The decoder incorporates an Adaptive Multi-Scale Attention Gate (AMAG) module, which mitigates interference from complex backgrounds through adaptive multi-scale feature fusion. Additionally, FACA and AMAG establish a dynamic feedback mechanism that enables iterative optimization of feature representation and parameter updates. For recognition, the GeLU-enhanced Partial Class Activation Attention (G-PCAA) module is introduced after Patch Partition, strengthening salient region modeling and enhancing local–global feature interaction. Experimental results demonstrate that DBCA-Net achieves superior performance, with 95.48% mIoU and 97.61% mDSC in segmentation tasks and 95.34% accuracy and 93.14% F1-score in recognition tasks. These findings underscore the effectiveness of DBCA-Net in addressing segmentation and recognition challenges in complex scenarios, offering significant improvements over existing methods. Full article
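The segmentation results are reported as mIoU and mDSC. For binary masks these reduce to the familiar overlap formulas below (standard definitions, not DBCA-Net's implementation); the mean variants simply average the per-class scores.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def dice(pred, gt):
    """Dice similarity coefficient (DSC) of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

pred = np.array([[1, 1], [0, 0]], dtype=bool)   # 2 predicted pixels
gt = np.array([[1, 0], [0, 0]], dtype=bool)     # 1 true pixel
```

With one overlapping pixel, IoU = 1/2 while Dice = 2/3; Dice is always at least as large as IoU, which is why the paper's mDSC (97.61%) exceeds its mIoU (95.48%).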

25 pages, 6071 KB  
Article
A Multi-Scale Spatio-Temporal Fusion Network for Occluded Small Object Detection in Geiger-Mode Avalanche Photodiode LiDAR Systems
by Yuanxue Ding, Dakuan Du, Jianfeng Sun, Le Ma, Xianhui Yang, Rui He, Jie Lu and Yanchen Qu
Remote Sens. 2025, 17(5), 764; https://doi.org/10.3390/rs17050764 - 22 Feb 2025
Viewed by 1713
Abstract
The Geiger-Mode Avalanche Photodiode (Gm-APD) LiDAR system demonstrates high-precision detection capabilities over long distances. However, the detection of occluded small objects at long distances poses significant challenges, limiting its practical application. To address this issue, we propose a multi-scale spatio-temporal object detection network (MSTOD-Net), designed to associate object information across different spatio-temporal scales for the effective detection of occluded small objects. Specifically, in the encoding stage, a dual-channel feature fusion framework is employed to process range and intensity images from consecutive time frames, facilitating the detection of occluded objects. Considering the significant differences between range and intensity images, a multi-scale context-aware (MSCA) module and a feature fusion (FF) module are incorporated to enable efficient cross-scale feature interaction and enhance small object detection. Additionally, an edge perception (EDGP) module is integrated into the network’s shallow layers to refine the edge details and enhance the information in unoccluded regions. In the decoding stage, feature maps from the encoder are upsampled and combined with multi-level fused features, and four prediction heads are employed to decode the object categories, confidence, widths and heights, and displacement offsets. The experimental results demonstrate that the MSTOD-Net achieves mAP50 and mAR50 scores of 96.4% and 96.9%, respectively, outperforming the state-of-the-art methods. Full article
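The EDGP module refines edge detail in the network's shallow layers. The classic building block for such an edge cue is a Sobel gradient filter, sketched below as an illustrative stand-in for (not a reproduction of) the paper's module.

```python
import numpy as np

# Sobel kernel for horizontal gradients (responds to vertical edges).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def conv2d_valid(img, k):
    """Naive 'valid' 2-D correlation of img with kernel k."""
    H, W = img.shape
    kh, kw = k.shape
    return np.array([[np.sum(img[i:i + kh, j:j + kw] * k)
                      for j in range(W - kw + 1)]
                     for i in range(H - kh + 1)])

img = np.zeros((5, 5))
img[:, 3:] = 1.0                 # a vertical step edge at column 3
gx = conv2d_valid(img, SOBEL_X)
```

The response is zero in the flat region and a constant 4.0 wherever a window straddles the step, which is exactly the kind of localized edge evidence a shallow edge-perception branch would feed forward.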

18 pages, 1869 KB  
Article
A Deepfake Image Detection Method Based on a Multi-Graph Attention Network
by Guorong Chen, Chongling Du, Yuan Yu, Hong Hu, Hongjun Duan and Huazheng Zhu
Electronics 2025, 14(3), 482; https://doi.org/10.3390/electronics14030482 - 24 Jan 2025
Cited by 3 | Viewed by 4157
Abstract
Deep forgery detection plays a crucial role in addressing the challenges posed by the rapid spread of deeply generated content that significantly erodes public trust in online information and media. Deeply forged images typically present subtle but significant artifacts in multiple regions, such as in the background, lighting, and localized details. These artifacts manifest as unnatural visual distortions, inconsistent lighting, or irregularities in subtle features that break the natural coherence of the real image. To address these features of forged images, we propose a novel and efficient deep image forgery detection method that utilizes Multi-Graph Attention (MGA) techniques to extract global and local features and minimize accuracy loss. Specifically, our method introduces an interactive dual-channel encoder (DIRM), which aims to extract global and channel-specific features and facilitate complex interactions between these feature sets. In the decoding phase, one of the channels is processed as a block and combined with a Dynamic Graph Attention Network (PDGAN), which is capable of recognizing and amplifying forged traces in local information. To further enhance the model’s ability to capture global context, we propose a global Height–Width Graph Attention Module (HWGAN), which effectively extracts and associates global spatial features. Experimental results show that the classification accuracy of our method for forged images in the GenImage and CIFAKE datasets is comparable to that of the optimal benchmark method. Notably, our model achieves 97.89% accuracy on the CIFAKE dataset and has the lowest number of model parameters and lowest computational overhead. These results highlight the potential of our method for deep forgery image detection. Full article
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)

30 pages, 13159 KB  
Article
GLMAFuse: A Dual-Stream Infrared and Visible Image Fusion Framework Integrating Local and Global Features with Multi-Scale Attention
by Fu Li, Yanghai Gu, Ming Zhao, Deji Chen and Quan Wang
Electronics 2024, 13(24), 5002; https://doi.org/10.3390/electronics13245002 - 19 Dec 2024
Viewed by 1546
Abstract
Integrating infrared and visible-light images facilitates a more comprehensive understanding of scenes by amalgamating dual-sensor data derived from identical environments. Traditional CNN-based fusion techniques are predominantly confined to local feature emphasis due to their inherently limited receptive fields. Conversely, Transformer-based models tend to prioritize global information, which can lead to a deficiency in feature diversity and detail retention. Furthermore, methods reliant on single-scale feature extraction are inadequate for capturing extensive scene information. To address these limitations, this study presents GLMAFuse, an innovative dual-stream encoder–decoder network, which utilizes a multi-scale attention mechanism to harmoniously integrate global and local features. This framework is designed to maximize the extraction of multi-scale features from source images while effectively synthesizing local and global information across all layers. We introduce the global-aware and local embedding (GALE) module to adeptly capture and merge global structural attributes and localized details from infrared and visible imagery via a parallel dual-branch architecture. Additionally, the multi-scale attention fusion (MSAF) module is engineered to optimize attention weights at the channel level, facilitating an enhanced synergy between high-frequency edge details and global backgrounds. This promotes effective interaction and fusion of dual-modal features. Extensive evaluations using standard datasets demonstrate that GLMAFuse surpasses the existing leading methods in both qualitative and quantitative assessments, highlighting its superior capability in infrared and visible image fusion. On the TNO and MSRS datasets, our method achieves outstanding performance across multiple metrics, including EN (7.15, 6.75), SD (46.72, 47.55), SF (12.79, 12.56), MI (2.21, 3.22), SCD (1.75, 1.80), VIF (0.79, 1.08), Qbaf (0.58, 0.71), and SSIM (0.99, 1.00). 
These results underscore its exceptional proficiency in infrared and visible image fusion. Full article
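The channel-level attention weighting that the MSAF module performs can be sketched compactly. The following is a minimal NumPy illustration under simplifying assumptions (global average pooling and a per-channel softmax over the two modalities; the paper's actual module is multi-scale and uses learned weights), not the authors' implementation:

```python
import numpy as np

def channel_attention_fuse(feat_ir, feat_vis):
    """Fuse two modality feature maps of shape (C, H, W) using
    per-channel softmax weights derived from global average pooling.

    Simplified sketch of channel-level attention fusion; names and
    shapes are illustrative, not taken from the GLMAFuse code."""
    # Global average pooling: one descriptor per channel, per modality
    g_ir = feat_ir.mean(axis=(1, 2))    # (C,)
    g_vis = feat_vis.mean(axis=(1, 2))  # (C,)
    # Softmax across the two modalities, independently for each channel
    scores = np.stack([g_ir, g_vis], axis=0)            # (2, C)
    scores = scores - scores.max(axis=0, keepdims=True) # numerical stability
    w = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    # Weighted sum, broadcasting channel weights over spatial dims
    fused = w[0][:, None, None] * feat_ir + w[1][:, None, None] * feat_vis
    return fused, w

rng = np.random.default_rng(0)
ir = rng.standard_normal((8, 16, 16))   # infrared feature map
vis = rng.standard_normal((8, 16, 16))  # visible feature map
fused, w = channel_attention_fuse(ir, vis)
```

Because the two modality weights are produced by a softmax, they sum to one per channel, so the fused map stays on the same scale as the inputs.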
(This article belongs to the Special Issue Artificial Intelligence Innovations in Image Processing)

24 pages, 7081 KB  
Article
Global-Local Collaborative Learning Network for Optical Remote Sensing Image Change Detection
by Jinghui Li, Feng Shao, Qiang Liu and Xiangchao Meng
Remote Sens. 2024, 16(13), 2341; https://doi.org/10.3390/rs16132341 - 27 Jun 2024
Cited by 2 | Viewed by 2274
Abstract
Due to the widespread applications of change detection technology in urban change analysis, environmental monitoring, agricultural surveillance, disaster detection, and other domains, the task of change detection has become one of the primary applications of Earth orbit satellite remote sensing data. However, dual-temporal change detection (CD) remains challenging in high-resolution optical remote sensing images due to complexities such as intricate textures, seasonal variations in imaging time, climatic differences, and significant differences in object sizes. In this paper, we propose a novel U-shaped architecture for change detection. In the encoding stage, a multi-branch feature extraction module combining CNN and transformer networks enhances the network’s ability to perceive objects of varying sizes. A multi-branch aggregation module then aggregates features from the different branches, providing the network with global attention while preserving detailed information. For dual-temporal features, we introduce a spatiotemporal discrepancy perception module to model the context of dual-temporal images. Particularly noteworthy is the construction of channel attention and token attention modules based on the transformer attention mechanism to facilitate information interaction between multi-level features, thereby enhancing the network’s contextual awareness. The effectiveness of the proposed network is validated on three public datasets, demonstrating its superior performance over other state-of-the-art methods through qualitative and quantitative experiments. Full article
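The token attention used for cross-temporal information interaction reduces to scaled dot-product cross-attention between token sets from the two acquisition dates. A minimal single-head NumPy sketch, omitting the learned query/key/value projections that a real transformer layer would include:

```python
import numpy as np

def cross_attention(q_tokens, kv_tokens):
    """Scaled dot-product cross-attention: tokens from one temporal
    branch (queries) attend to tokens from the other (keys/values).

    Single head, no learned projections; a sketch of the mechanism
    only, not the paper's module."""
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)   # (Nq, Nk) similarity
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return attn @ kv_tokens                        # (Nq, d) attended output

rng = np.random.default_rng(1)
t1 = rng.standard_normal((4, 32))  # semantic tokens from time-1 image
t2 = rng.standard_normal((6, 32))  # semantic tokens from time-2 image
out = cross_attention(t1, t2)
```

Each time-1 token is replaced by a convex combination of time-2 tokens, which is what lets the network model correspondences (and hence discrepancies) between the two dates.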

30 pages, 73805 KB  
Article
DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection
by Ming Chen, Wanshou Jiang and Yuan Zhou
Remote Sens. 2024, 16(5), 844; https://doi.org/10.3390/rs16050844 - 28 Feb 2024
Cited by 3 | Viewed by 2664
Abstract
Deep learning has dramatically enhanced remote sensing change detection. However, existing neural network models often face challenges like false positives and missed detections due to factors like lighting changes, scale differences, and noise interference. Additionally, change detection results often fail to capture target contours accurately. To address these issues, we propose a novel transformer-based hybrid network. In this study, we analyze the structural relationship in bi-temporal images and introduce a cross-attention-based transformer to model this relationship. First, we use a tokenizer to express the high-level features of the bi-temporal image as several semantic tokens. Then, we use a dual temporal transformer (DTT) encoder to capture dense spatiotemporal contextual relationships among the tokens. The features extracted at the coarse scale are refined into finer details through the DTT decoder. Concurrently, we input the backbone’s low-level features into a contour-guided graph interaction module (CGIM) that utilizes joint attention to capture semantic relationships between object regions and the contour. Then, we use the feature pyramid decoder to integrate the multi-scale outputs of the CGIM. The convolutional block attention modules (CBAMs) employ channel and spatial attention to reweight feature maps. Finally, the classifier discriminates changed pixels and generates the final change map from the difference feature map. Several experiments have demonstrated that our model shows significant advantages over other methods in terms of efficiency, accuracy, and visual effects. Full article
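The CBAM reweighting mentioned above applies channel attention followed by spatial attention. A simplified, dependency-free NumPy sketch; the published CBAM uses a shared MLP for the channel branch and a 7×7 convolution for the spatial branch, which are replaced here by identity mappings over pooled statistics:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_reweight(feat):
    """CBAM-style reweighting of a (C, H, W) feature map:
    channel attention first, then spatial attention.

    Simplified: pooled statistics are passed straight to the sigmoid
    instead of through the learned MLP / convolution of real CBAM."""
    # Channel attention: combine average- and max-pooled descriptors
    ca = sigmoid(feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))  # (C,)
    feat = feat * ca[:, None, None]
    # Spatial attention: combine average- and max-pooling over channels
    sa = sigmoid(feat.mean(axis=0) + feat.max(axis=0))            # (H, W)
    return feat * sa[None, :, :]

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 8, 8))  # low-level feature map
y = cbam_reweight(x)
```

Both attention maps lie in (0, 1), so the module can only attenuate features, emphasizing informative channels and locations by suppressing the rest.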
(This article belongs to the Section AI Remote Sensing)
