
Search Results (360)

Search Parameters:
Keywords = heterogeneous feature fusion

22 pages, 2498 KiB  
Article
SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability
by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen
Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 - 2 Aug 2025
Abstract
How do humans analyze the sentiments embedded in text? When attempting to analyze a text, humans construct a “scene” in their minds through imagination based on the text, generating a vague image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can only analyze based on existing information in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable Diffusion, enabling the model to generate corresponding visual scenes from input text, thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extraction model, to extract aligned features from different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse information from different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we set up experiments to deeply analyze the model’s advantages and limitations, providing a new technical path for follow-up research. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
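As a rough illustration of the late-fusion step this abstract describes, the sketch below combines per-modality classifiers over CLIP-style embeddings by averaging their class probabilities; the feature dimension, class count, and averaging rule are assumptions, not details taken from SceEmoNet.

```python
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    """Illustrative late fusion: independent classifiers per modality,
    fused by averaging class probabilities (assumed fusion rule)."""
    def __init__(self, feat_dim=512, num_classes=2):
        super().__init__()
        self.text_clf = nn.Linear(feat_dim, num_classes)
        self.image_clf = nn.Linear(feat_dim, num_classes)

    def forward(self, text_feat, image_feat):
        # Each branch scores the input from its own modality.
        p_text = self.text_clf(text_feat).softmax(dim=-1)
        p_image = self.image_clf(image_feat).softmax(dim=-1)
        # Late fusion: combine modality-level decisions, not raw features.
        return 0.5 * (p_text + p_image)

# Toy usage with random CLIP-sized embeddings (batch of 4).
head = LateFusionHead()
text_feat = torch.randn(4, 512)   # e.g. CLIP text features
image_feat = torch.randn(4, 512)  # e.g. CLIP features of a generated scene image
print(head(text_feat, image_feat).shape)  # torch.Size([4, 2])
```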

14 pages, 5672 KiB  
Article
Multiplex Immunofluorescence Reveals Therapeutic Targets EGFR, EpCAM, Tissue Factor, and TROP2 in Triple-Negative Breast Cancer
by T. M. Mohiuddin, Wenjie Sheng, Chaoyu Zhang, Marwah Al-Rawe, Svetlana Tchaikovski, Felix Zeppernick, Ivo Meinhold-Heerlein and Ahmad Fawzi Hussain
Int. J. Mol. Sci. 2025, 26(15), 7430; https://doi.org/10.3390/ijms26157430 - 1 Aug 2025
Abstract
Triple-negative breast cancer (TNBC) is a clinically and molecularly heterogeneous subtype defined by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression. In this study, tumor specimens from 104 TNBC patients were analyzed to characterize molecular and clinicopathological features and to assess the expression and therapeutic potential of four key surface markers: epidermal growth factor receptor (EGFR), epithelial cell adhesion molecule (EpCAM), tissue factor (TF), and trophoblast cell surface antigen (TROP2). Multiplex immunofluorescence (mIF) demonstrated elevated EGFR and TROP2 expression in the majority of samples. Significant positive correlations were observed between EGFR and TF, as well as between TROP2 and both TF and EpCAM. Expression analyses revealed increased EGFR and TF levels with advancing tumor stage, whereas EpCAM expression declined in advanced-stage tumors. TROP2 and TF expression were significantly elevated in higher-grade tumors. Additionally, EGFR and EpCAM levels were significantly higher in patients with elevated Ki-67 indices. Binding specificity assays using single-chain variable fragment (scFv-SNAP) fusion proteins confirmed robust targeting efficacy, particularly for EGFR and TROP2. These findings underscore the therapeutic relevance of EGFR and TROP2 as potential biomarkers and targets in TNBC. Full article
(This article belongs to the Section Molecular Pathology, Diagnostics, and Therapeutics)

20 pages, 1536 KiB  
Article
Graph Convolution-Based Decoupling and Consistency-Driven Fusion for Multimodal Emotion Recognition
by Yingmin Deng, Chenyu Li, Yu Gu, He Zhang, Linsong Liu, Haixiang Lin, Shuang Wang and Hanlin Mo
Electronics 2025, 14(15), 3047; https://doi.org/10.3390/electronics14153047 - 30 Jul 2025
Viewed by 170
Abstract
Multimodal emotion recognition (MER) is essential for understanding human emotions from diverse sources such as speech, text, and video. However, modality heterogeneity and inconsistent expression pose challenges for effective feature fusion. To address this, we propose a novel MER framework combining a Dynamic Weighted Graph Convolutional Network (DW-GCN) for feature disentanglement and a Cross-Attention Consistency-Gated Fusion (CACG-Fusion) module for robust integration. DW-GCN models complex inter-modal relationships, enabling the extraction of both common and private features. The CACG-Fusion module subsequently enhances classification performance through dynamic alignment of cross-modal cues, employing attention-based coordination and consistency-preserving gating mechanisms to optimize feature integration. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our method achieves state-of-the-art performance, significantly improving the ACC7, ACC2, and F1 scores. Full article
(This article belongs to the Section Computer Science & Engineering)
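The CACG-Fusion module is only described at a high level in the abstract, so the block below merely sketches cross-attention alignment followed by a cosine-similarity consistency gate; the tensor shapes, single attention layer, and sigmoid gate are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionGatedFusion(nn.Module):
    """Illustrative two-modality fusion: cross-attention alignment followed
    by a consistency gate based on the cosine similarity of aligned features."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (batch, seq, dim) token sequences from two modalities.
        a_att, _ = self.attn_ab(feat_a, feat_b, feat_b)  # a attends to b
        b_att, _ = self.attn_ba(feat_b, feat_a, feat_a)  # b attends to a
        # Consistency gate: down-weight positions where modalities disagree.
        gate = torch.sigmoid(F.cosine_similarity(a_att, b_att, dim=-1).unsqueeze(-1))
        fused = torch.cat([gate * a_att, gate * b_att], dim=-1)
        return self.proj(fused)

fusion = CrossAttentionGatedFusion()
out = fusion(torch.randn(2, 10, 128), torch.randn(2, 10, 128))
print(out.shape)  # torch.Size([2, 10, 128])
```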

11 pages, 1521 KiB  
Communication
Research on the Grinding Quality Evaluation of Composite Materials Based on Multi-Scale Texture Fusion Analysis
by Yangjun Wang, Zilu Liu, Li Ling, Anru Guo, Jiacheng Li, Jiachang Liu, Chunju Wang, Mingqiang Pan and Wei Song
Materials 2025, 18(15), 3540; https://doi.org/10.3390/ma18153540 - 28 Jul 2025
Viewed by 214
Abstract
To address the challenges of manual inspection dependency, low efficiency, and high costs in evaluating the surface grinding quality of composite materials, this study investigated machine vision-based surface recognition algorithms. We proposed a multi-scale texture fusion analysis algorithm that innovatively integrated luminance analysis with multi-scale texture features through decision-level fusion. Specifically, a modified Rayleigh parameter was developed during luminance analysis to rapidly pre-segment unpolished areas by characterizing surface reflection properties. Furthermore, we enhanced the traditional Otsu algorithm by incorporating global grayscale mean (μ) and standard deviation (σ), overcoming its inherent limitations of exclusive reliance on grayscale histograms and lack of multimodal feature integration. This optimization enables simultaneous detection of specular reflection defects and texture uniformity variations. To improve detection window adaptability across heterogeneous surface regions, we designed a multi-scale texture analysis framework operating at multiple resolutions. Through decision-level fusion of luminance analysis and multi-scale texture evaluation, the proposed algorithm achieved 96% recognition accuracy with >95% reliability, demonstrating robust performance for automated surface grinding quality assessment of composite materials. Full article
(This article belongs to the Section Advanced Composites)
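The abstract says the traditional Otsu algorithm is enhanced with the global grayscale mean (μ) and standard deviation (σ), but not how; below is a minimal sketch where standard Otsu is computed and then blended with μ + σ, with the `alpha` blend being a placeholder assumption rather than the paper's rule.

```python
import numpy as np

def otsu_threshold(gray):
    """Standard Otsu: maximize between-class variance over a 256-bin histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability
    mu = np.cumsum(prob * np.arange(256))   # class-0 cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b2)))

def otsu_with_global_stats(gray, alpha=0.5):
    """Illustrative modification: blend the Otsu threshold with mu + sigma of
    the whole image so bright specular regions are not missed (assumed rule)."""
    mu, sigma = gray.mean(), gray.std()
    return int(alpha * otsu_threshold(gray) + (1.0 - alpha) * (mu + sigma))

# Toy usage on a synthetic 8-bit image.
img = np.clip(np.random.normal(120, 30, (64, 64)), 0, 255).astype(np.uint8)
print(otsu_threshold(img), otsu_with_global_stats(img))
```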

24 pages, 12286 KiB  
Article
A UAV-Based Multi-Scenario RGB-Thermal Dataset and Fusion Model for Enhanced Forest Fire Detection
by Yalin Zhang, Xue Rui and Weiguo Song
Remote Sens. 2025, 17(15), 2593; https://doi.org/10.3390/rs17152593 - 25 Jul 2025
Viewed by 363
Abstract
UAVs are essential for forest fire detection due to vast forest areas and the inaccessibility of high-risk zones, enabling rapid long-range inspection and detailed close-range surveillance. However, aerial photography faces challenges such as multi-scale target recognition and complex scenario adaptation (e.g., deformation, occlusion, lighting variations). RGB-Thermal fusion methods effectively integrate visible-light texture and thermal infrared temperature features, but current approaches are constrained by limited datasets and insufficient exploitation of cross-modal complementary information, ignoring cross-level feature interaction. To address data scarcity in wildfire scenarios, we constructed a time-synchronized, multi-scene, multi-angle aerial RGB-Thermal dataset (RGBT-3M) with “Smoke–Fire–Person” annotations, aligned across modalities via the M-RIFT method. Finally, we propose CP-YOLOv11-MF, a fusion detection model based on the YOLOv11 framework, which learns the heterogeneous, complementary features of each modality in a progressive manner. Experimental validation confirms the superiority of our method, with a precision of 92.5%, a recall of 93.5%, a mAP50 of 96.3%, and a mAP50-95 of 62.9%. The model’s RGB-Thermal fusion capability enhances early fire detection, offering a benchmark dataset and methodological advancement for intelligent forest conservation, with implications for AI-driven ecological protection. Full article
(This article belongs to the Special Issue Advances in Spectral Imagery and Methods for Fire and Smoke Detection)
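CP-YOLOv11-MF itself is not reproduced here; the block below only illustrates the generic idea of fusing an RGB stream and a thermal stream at the feature level, with all layer sizes and the concatenate-then-mix rule being assumptions.

```python
import torch
import torch.nn as nn

class RGBTFusionBlock(nn.Module):
    """Illustrative two-stream fusion: separate stems for RGB and thermal inputs,
    channel concatenation, then a 1x1 conv to mix the complementary features."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.rgb_stem = nn.Sequential(nn.Conv2d(3, out_ch, 3, padding=1), nn.ReLU())
        self.thermal_stem = nn.Sequential(nn.Conv2d(1, out_ch, 3, padding=1), nn.ReLU())
        self.mix = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, rgb, thermal):
        f = torch.cat([self.rgb_stem(rgb), self.thermal_stem(thermal)], dim=1)
        return self.mix(f)

block = RGBTFusionBlock()
fused = block(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(fused.shape)  # torch.Size([1, 64, 256, 256])
```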

22 pages, 3082 KiB  
Article
A Lightweight Intrusion Detection System with Dynamic Feature Fusion Federated Learning for Vehicular Network Security
by Junjun Li, Yanyan Ma, Jiahui Bai, Congming Chen, Tingting Xu and Chi Ding
Sensors 2025, 25(15), 4622; https://doi.org/10.3390/s25154622 - 25 Jul 2025
Viewed by 293
Abstract
The rapid integration of complex sensors and electronic control units (ECUs) in autonomous vehicles significantly increases cybersecurity risks in vehicular networks. Although the Controller Area Network (CAN) is efficient, it lacks inherent security mechanisms and is vulnerable to various network attacks. Traditional Intrusion Detection Systems (IDSs) struggle to deal effectively with the dynamics and complexity of emerging threats. To solve these problems, a lightweight vehicular network intrusion detection framework based on Dynamic Feature Fusion Federated Learning (DFF-FL) is proposed. The proposed framework employs a two-stream architecture, including a transformer-augmented autoencoder for abstract feature extraction and a lightweight CNN-LSTM-Attention model for preserving temporal and local patterns. Compared with the conventional federated learning framework, DFF-FL first dynamically fuses the deep feature representations of each node through the transformer attention module to realize fine-grained cross-node feature interaction in a heterogeneous data environment, thereby eliminating the performance degradation caused by differences in feature distribution. Secondly, based on the final autoencoder reconstruction loss L_AE(X, X̂) of each node, an adaptive weight adjustment mechanism lets the nodes with excellent performance dominate the global model update, which significantly improves robustness against complex attacks. Experimental evaluation on the CAN-Hacking dataset shows that the proposed intrusion detection system achieves an F1 score above 99% with only 1.11 MB of memory and 81,863 trainable parameters, while maintaining low computational overhead and ensuring data privacy, making it well suited for edge device deployment. Full article
(This article belongs to the Section Sensor Networks)
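The adaptive, loss-based weighting can be illustrated as follows: clients with a lower final reconstruction loss receive a larger share of the global update. The softmax-over-negative-losses rule and the toy model are assumptions, not DFF-FL's exact mechanism.

```python
import torch
import torch.nn as nn

def adaptive_federated_average(state_dicts, node_losses, temperature=1.0):
    """Illustrative aggregation: clients with lower final loss dominate the
    global update (softmax over negative losses is an assumed weighting rule)."""
    losses = torch.tensor(node_losses, dtype=torch.float32)
    weights = torch.softmax(-losses / temperature, dim=0)
    global_state = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        w = weights.view(-1, *([1] * (stacked.dim() - 1)))  # broadcast per client
        global_state[key] = (w * stacked).sum(dim=0)
    return global_state

# Toy usage: three clients sharing a single linear layer.
clients = [nn.Linear(4, 2).state_dict() for _ in range(3)]
print(adaptive_federated_average(clients, node_losses=[0.9, 0.4, 0.6]).keys())
```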

13 pages, 2828 KiB  
Article
Wafer Defect Image Generation Method Based on Improved StyleGANv3 Network
by Jialin Zou, Hongcheng Wang and Jiajin Zhong
Micromachines 2025, 16(8), 844; https://doi.org/10.3390/mi16080844 - 23 Jul 2025
Viewed by 287
Abstract
This paper addresses training a generator model on a limited dataset so that it fits the distribution of the original data, improving the reconstruction of wafer datasets. High-fidelity wafer defect image generation remains challenging due to limited real data and the poor physical authenticity of existing methods. We propose an enhanced StyleGANv3 framework with two key innovations: (1) a Heterogeneous Kernel Fusion Unit (HKFU) enabling multi-scale defect feature refinement via spatiotemporal attention and dynamic gating; (2) a Dynamic Adaptive Attention Module (DAAM) that adaptively boosts discriminator sensitivity. Experiments on the Mixtype-WM38 and MIR-WM811K datasets demonstrate state-of-the-art performance, achieving FID scores of 25.20 and 28.70 alongside SDS values of 36.00 and 35.45. The proposed method helps alleviate the problem of limited datasets and supports data preparation for downstream classification and detection tasks. Full article
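The HKFU is described only at a high level, so the block below is a generic multi-scale gated convolution sketch: parallel kernels of different sizes blended by an input-conditioned gate. Kernel sizes, the global-pooling gate, and channel counts are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class HeterogeneousKernelFusion(nn.Module):
    """Illustrative multi-scale block: parallel 1x1 / 3x3 / 5x5 convolutions
    whose outputs are blended by a dynamic, input-conditioned gate."""
    def __init__(self, channels=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(self.branches), 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        g = self.gate(x)                      # (B, 3, 1, 1) per-branch weights
        outs = [b(x) for b in self.branches]  # three multi-scale feature maps
        return sum(g[:, i:i + 1] * outs[i] for i in range(len(outs)))

hkfu = HeterogeneousKernelFusion()
print(hkfu(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```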

22 pages, 2485 KiB  
Article
Infrared and Visible Image Fusion Using a State-Space Adversarial Model with Cross-Modal Dependency Learning
by Qingqing Hu, Yiran Peng, KinTak U and Siyuan Zhao
Mathematics 2025, 13(15), 2333; https://doi.org/10.3390/math13152333 - 22 Jul 2025
Viewed by 211
Abstract
Infrared and visible image fusion plays a critical role in multimodal perception systems, particularly under challenging conditions such as low illumination, occlusion, or complex backgrounds. However, existing approaches often struggle with global feature modelling, cross-modal dependency learning, and preserving structural details in the fused images. In this paper, we propose a novel adversarial fusion framework driven by a state-space modelling paradigm to address these limitations. In the feature extraction phase, a computationally efficient state-space model is utilized to capture global semantic context from both infrared and visible inputs. A cross-modality state-space architecture is then introduced in the fusion phase to model long-range dependencies between heterogeneous features effectively. Finally, a multi-class discriminator, trained under an adversarial learning scheme, enhances the structural fidelity and detail consistency of the fused output. Extensive experiments conducted on publicly available infrared–visible fusion datasets demonstrate that the proposed method achieves superior performance in terms of information retention, contrast enhancement, and visual realism. The results confirm the robustness and generalizability of our framework for complex scene understanding and downstream tasks such as object detection under adverse conditions. Full article

23 pages, 24301 KiB  
Article
Robust Optical and SAR Image Registration Using Weighted Feature Fusion
by Ao Luo, Anxi Yu, Yongsheng Zhang, Wenhao Tong and Huatao Yu
Remote Sens. 2025, 17(15), 2544; https://doi.org/10.3390/rs17152544 - 22 Jul 2025
Viewed by 288
Abstract
Image registration constitutes the fundamental basis for the joint interpretation of synthetic aperture radar (SAR) and optical images. However, robust image registration remains challenging due to significant regional heterogeneity in remote sensing scenes (e.g., co-existing urban and marine areas within a single image). To overcome this challenge, this article proposes a novel optical–SAR image registration method named Gradient and Standard Deviation Feature Weighted Fusion (GDWF). First, a Block-local standard deviation (Block-LSD) operator is proposed to extract block-based feature points with regional adaptability. Subsequently, a dual-modal feature description is developed, constructing both gradient-based descriptors and local standard deviation (LSD) descriptors for the neighborhoods surrounding the detected feature points. To further enhance matching robustness, a confidence-weighted feature fusion strategy is proposed. By establishing a reliability evaluation model for similarity measurement maps, the contribution weights of gradient features and LSD features are dynamically optimized, ensuring adaptive performance under varying conditions. To verify the effectiveness of the method, different optical and SAR datasets are used to compare it with the currently advanced algorithms MOGF, CFOG, and FED-HOPC. The experimental results demonstrate that the proposed GDWF algorithm achieves the best performance in terms of registration accuracy and robustness among all compared methods, effectively handling optical–SAR image pairs with significant regional heterogeneity. Full article

23 pages, 4361 KiB  
Article
ANHNE: Adaptive Multi-Hop Neighborhood Information Fusion for Heterogeneous Network Embedding
by Hanyu Xie, Hao Shao, Lunwen Wang and Changjian Song
Electronics 2025, 14(14), 2911; https://doi.org/10.3390/electronics14142911 - 21 Jul 2025
Viewed by 260
Abstract
Heterogeneous information network (HIN) embedding transforms multi-type nodes into low-dimensional vectors to preserve structural and semantic information for downstream tasks. However, it struggles with multiplex networks where nodes connect via diverse semantic paths (metapaths). Information fusion mainly improves the quality of node embedding by fully exploiting the structure and hidden information within the network. Current metapath-based methods ignore information from intermediate nodes along paths, depend on manually defined metapaths, and overlook implicit relationships between nodes sharing similar attributes. Our objective is to develop an adaptive framework that overcomes limitations in existing metapath-based embedding (incomplete information aggregation, manual path dependency, and ignorance of latent semantics) to learn more discriminative embeddings. We propose an adaptive multi-hop neighbor information fusion model for heterogeneous network embedding (ANHNE), which: (1) autonomously extracts composite metapaths (weighted combinations of relations) via a multipath aggregation matrix to mine hierarchical semantics of varying lengths for task-specific scenarios; (2) projects heterogeneous nodes into a unified space and employs hierarchical attention to selectively fuse neighborhood features across metapath hierarchies; and (3) enhances semantics by identifying potential node correlations via cosine similarity to construct implicit connections, enriching network structure with latent information. Extensive experimental results on multiple datasets show that ANHNE achieves more precise embeddings than comparable baseline models. Full article
(This article belongs to the Special Issue Advances in Learning on Graphs and Information Networks)
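Two ingredients named in the abstract can be sketched compactly: composite metapaths as weighted mixtures of per-relation adjacency matrices, and implicit edges from cosine similarity of node attributes. The 2-hop composition and the 0.8 similarity threshold below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def composite_metapath(adjs, weights):
    """Illustrative composite metapath: a learnable convex combination of
    per-relation adjacency matrices, composed into an (assumed) 2-hop path."""
    w = torch.softmax(weights, dim=0)
    a1 = sum(wi * a for wi, a in zip(w, adjs))  # weighted 1-hop relation mix
    return a1 @ a1                              # assumed 2-hop composition

def implicit_edges(features, threshold=0.8):
    """Add latent links between nodes whose attributes are highly similar."""
    sim = F.cosine_similarity(features.unsqueeze(1), features.unsqueeze(0), dim=-1)
    return (sim > threshold).float() - torch.eye(features.size(0))

# Toy heterogeneous graph with 5 nodes and 2 relation types.
adjs = [torch.randint(0, 2, (5, 5)).float() for _ in range(2)]
weights = torch.nn.Parameter(torch.zeros(2))
feats = torch.randn(5, 16)
print(composite_metapath(adjs, weights).shape, implicit_edges(feats).sum().item())
```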

30 pages, 15434 KiB  
Article
A DSP–FPGA Heterogeneous Accelerator for On-Board Pose Estimation of Non-Cooperative Targets
by Qiuyu Song, Kai Liu, Shangrong Li, Mengyuan Wang and Junyi Wang
Aerospace 2025, 12(7), 641; https://doi.org/10.3390/aerospace12070641 - 19 Jul 2025
Viewed by 311
Abstract
The increasing presence of non-cooperative targets poses significant challenges to the space environment and threatens the sustainability of aerospace operations. Accurate on-orbit perception of such targets, particularly those without cooperative markers, requires advanced algorithms and efficient system architectures. This study presents a hardware–software co-design framework for the pose estimation of non-cooperative targets. Firstly, a two-stage architecture is proposed, comprising object detection and pose estimation. YOLOv5s is modified with a Focus module to enhance feature extraction, and URSONet adopts global average pooling to reduce the computational burden. Optimization techniques, including batch normalization fusion, ReLU integration, and linear quantization, are applied to improve inference efficiency. Secondly, a customized FPGA-based accelerator is developed with an instruction scheduler, memory slicing mechanism, and computation array. Instruction-level control supports model generalization, while a weight concatenation strategy improves resource utilization during convolution. Finally, a heterogeneous DSP–FPGA system is implemented, where the DSP manages data pre-processing and result integration, and the FPGA performs core inference. The system is deployed on a Xilinx X7K325T FPGA operating at 200 MHz. Experimental results show that the optimized model achieves a peak throughput of 399.16 GOP/s with less than 1% accuracy loss. The proposed design reaches 0.461 and 0.447 GOP/s/DSP48E1 for two model variants, achieving a 2× to 3× improvement over comparable designs. Full article
(This article belongs to the Section Astronautics & Space Science)
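The batch normalization fusion mentioned among the optimizations is a standard inference-time transform; the sketch below shows the usual folding formulas in PyTorch, independent of the paper's DSP–FPGA implementation details.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding convolution for inference:
    w' = w * gamma / sqrt(var + eps);  b' = (b - mean) * gamma / sqrt(var + eps) + beta."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

# Sanity check: the fused layer matches conv -> BN in eval mode.
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
with torch.no_grad():                      # give BN non-trivial running stats
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
bn.eval()
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))  # True
```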

33 pages, 15612 KiB  
Article
A Personalized Multimodal Federated Learning Framework for Skin Cancer Diagnosis
by Shuhuan Fan, Awais Ahmed, Xiaoyang Zeng, Rui Xi and Mengshu Hou
Electronics 2025, 14(14), 2880; https://doi.org/10.3390/electronics14142880 - 18 Jul 2025
Viewed by 310
Abstract
Skin cancer is one of the most prevalent forms of cancer worldwide, and early and accurate diagnosis critically impacts patient outcomes. Given the sensitive nature of medical data and its fragmented distribution across institutions (data silos), privacy-preserving collaborative learning is essential to enable knowledge sharing without compromising patient confidentiality. While federated learning (FL) offers a promising solution, existing methods struggle with heterogeneous and missing modalities across institutions, which reduces diagnostic accuracy. To address these challenges, we propose an effective and flexible Personalized Multimodal Federated Learning framework (PMM-FL), which enables efficient cross-client knowledge transfer while maintaining personalized performance under heterogeneous and incomplete modality conditions. Our study makes three key contributions: (1) A hierarchical aggregation strategy that decouples multi-module aggregation from local deployment via global modular-separated aggregation and local client fine-tuning. Unlike conventional FL (which synchronizes all parameters in each round), our method adopts a frequency-adaptive synchronization mechanism, updating parameters based on their stability and functional roles. (2) A multimodal fusion approach based on multitask learning, integrating learnable modality imputation and attention-based feature fusion to handle missing modalities. (3) A custom dataset combining multi-year International Skin Imaging Collaboration (ISIC) challenge data (2018–2024) to ensure comprehensive coverage of diverse skin cancer types. We evaluate PMM-FL across diverse experimental settings, demonstrating its effectiveness under heterogeneous and incomplete modality conditions: it achieves 92.32% diagnostic accuracy with only a 2% drop in accuracy under 30% modality missingness, and reduces communication overhead by 32.9% compared with baseline FL methods. Full article
(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)
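The frequency-adaptive synchronization idea can be sketched as a per-module synchronization period: stable or personalized modules are aggregated less often than shared ones. The module names and periods below are made-up placeholders, not PMM-FL's actual schedule.

```python
# Illustrative frequency-adaptive synchronization: each parameter group is
# sent to the server only every `period` rounds (assumed periods, for illustration).
SYNC_PERIODS = {"shared_encoder": 1, "fusion_head": 2, "personal_classifier": 5}

def params_to_sync(round_idx, model_groups):
    """Return the parameter groups that should be aggregated this round."""
    return {name: params for name, params in model_groups.items()
            if round_idx % SYNC_PERIODS.get(name, 1) == 0}

# Toy usage with dummy parameter groups.
groups = {"shared_encoder": [0.1], "fusion_head": [0.2], "personal_classifier": [0.3]}
for r in range(1, 6):
    print(r, sorted(params_to_sync(r, groups)))
```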

24 pages, 9664 KiB  
Article
Frequency-Domain Collaborative Lightweight Super-Resolution for Fine Texture Enhancement in Rice Imagery
by Zexiao Zhang, Jie Zhang, Jinyang Du, Xiangdong Chen, Wenjing Zhang and Changmeng Peng
Agronomy 2025, 15(7), 1729; https://doi.org/10.3390/agronomy15071729 - 18 Jul 2025
Viewed by 302
Abstract
In rice detection tasks, accurate identification of leaf streaks, pest and disease distribution, and spikelet hierarchies relies on high-quality images to distinguish between texture and hierarchy. However, existing images often suffer from texture blurring and contour shifting due to equipment and environmental limitations, which degrades detection performance. Since pest and disease patterns affect the image globally while tiny details are mostly localized, we propose a rice image reconstruction method based on an adaptive two-branch heterogeneous structure. The method consists of a low-frequency branch (LFB) that recovers global features using orientation-aware extended receptive fields to capture streaky global patterns, such as pests and diseases, and a high-frequency branch (HFB) that enhances detail edges through an adaptive enhancement mechanism to boost the clarity of local detail regions. By introducing a dynamic weight fusion mechanism (CSDW) and a lightweight gating network (LFFN), the method resolves the unbalanced fusion of frequency information for rice images in traditional approaches. Experiments on the 4× downsampled rice test set demonstrate that the proposed method achieves a 62% reduction in parameters compared to EDSR, 41% lower computational cost (30 G) than MambaIR-light, and an average PSNR improvement of 0.68% over other methods in the study, while balancing memory usage (227 M) and inference speed. In downstream task validation, rice panicle maturity detection achieves a 61.5% increase in mAP50 (0.480 → 0.775) compared to interpolation methods, and leaf pest detection shows a 2.7% improvement in average mAP50 (0.949 → 0.975). This research provides an effective solution for lightweight rice image enhancement, with its dual-branch collaborative mechanism and dynamic fusion strategy establishing a new paradigm in agricultural rice image processing. Full article
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)
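A rough sketch of the dual-branch frequency idea: split the input into low- and high-frequency parts, process each with its own branch, and fuse them with a dynamic weight. The blur-based split, branch depths, and scalar gate merely stand in for the paper's LFB/HFB and CSDW modules and are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFrequencySR(nn.Module):
    """Illustrative dual-branch block: a low-pass split into low/high frequency
    parts, one small conv branch per part, and a learned scalar fusion weight."""
    def __init__(self, channels=3):
        super().__init__()
        self.low_branch = nn.Conv2d(channels, channels, 5, padding=2)   # global/streak cues
        self.high_branch = nn.Conv2d(channels, channels, 3, padding=1)  # edge/detail cues
        self.fuse_logit = nn.Parameter(torch.zeros(1))                  # dynamic weight

    def forward(self, x):
        low = F.avg_pool2d(x, 3, stride=1, padding=1)  # cheap low-pass (assumed split)
        high = x - low                                 # residual high-frequency part
        w = torch.sigmoid(self.fuse_logit)
        return w * self.low_branch(low) + (1 - w) * self.high_branch(high)

model = TwoBranchFrequencySR()
print(model(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```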

21 pages, 5313 KiB  
Article
MixtureRS: A Mixture-of-Experts Network Based Remote Sensing Land Classification
by Yimei Liu, Changyuan Wu, Minglei Guan and Jingzhe Wang
Remote Sens. 2025, 17(14), 2494; https://doi.org/10.3390/rs17142494 - 17 Jul 2025
Viewed by 326
Abstract
Accurate land-use classification is critical for urban planning and environmental monitoring, yet effectively integrating heterogeneous data sources such as hyperspectral imagery and laser radar (LiDAR) remains challenging. To address this, we propose MixtureRS, a compact multimodal network that effectively integrates hyperspectral imagery and LiDAR data for land-use classification. Our approach employs a 3-D plus heterogeneous convolutional stack to extract rich spectral–spatial features, which are then tokenized and fused via a cross-modality transformer. To enhance model capacity without incurring significant computational overhead, we replace conventional dense feed-forward blocks with a sparse Mixture-of-Experts (MoE) layer that selectively activates the most relevant experts for each token. Evaluated on a 15-class urban benchmark, MixtureRS achieves an overall accuracy of 88.6%, an average accuracy of 90.2%, and a Kappa coefficient of 0.877, outperforming the best homogeneous transformer by over 12 percentage points. Notably, the largest improvements are observed in water, railway, and parking categories, highlighting the advantages of incorporating height information and conditional computation. These results demonstrate that conditional, expert-guided fusion is a promising and efficient strategy for advancing multimodal remote sensing models. Full article
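A compact sketch of a sparse top-k Mixture-of-Experts layer of the kind the abstract describes, where a router activates only the most relevant experts per token; expert width, the number of experts, and k are assumptions, and the routing loop favors clarity over efficiency.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Illustrative sparse Mixture-of-Experts: a router picks the top-k experts
    per token and combines their outputs with renormalized gate scores."""
    def __init__(self, dim=64, num_experts=4, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, tokens):  # tokens: (batch, seq, dim)
        scores = self.router(tokens).softmax(dim=-1)
        top_w, top_idx = scores.topk(self.k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[..., slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[..., slot][mask].unsqueeze(-1) * expert(tokens[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```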

26 pages, 7645 KiB  
Article
VMMT-Net: A Dual-Branch Parallel Network Combining Visual State Space Model and Mix Transformer for Land–Sea Segmentation of Remote Sensing Images
by Jiawei Wu, Zijian Liu, Zhipeng Zhu, Chunhui Song, Xinghui Wu and Haihua Xing
Remote Sens. 2025, 17(14), 2473; https://doi.org/10.3390/rs17142473 - 16 Jul 2025
Viewed by 411
Abstract
Land–sea segmentation is a fundamental task in remote sensing image analysis, and plays a vital role in dynamic coastline monitoring. The complex morphology and blurred boundaries of coastlines in remote sensing imagery make fast and accurate segmentation challenging. Recent deep learning approaches lack the ability to model spatial continuity effectively, thereby limiting a comprehensive understanding of coastline features in remote sensing imagery. To address this issue, we have developed VMMT-Net, a novel dual-branch semantic segmentation framework. By constructing a parallel heterogeneous dual-branch encoder, VMMT-Net integrates the complementary strengths of the Mix Transformer and the Visual State Space Model, enabling comprehensive modeling of local details, global semantics, and spatial continuity. We design a Cross-Branch Fusion Module to facilitate deep feature interaction and collaborative representation across branches, and implement a customized decoder module that enhances the integration of multiscale features and improves boundary refinement of coastlines. Extensive experiments conducted on two benchmark remote sensing datasets, GF-HNCD and BSD, demonstrate that the proposed VMMT-Net outperforms existing state-of-the-art methods in both quantitative metrics and visual quality. Specifically, the model achieves mean F1-scores of 98.48% (GF-HNCD) and 98.53% (BSD) and mean intersection-over-union values of 97.02% (GF-HNCD) and 97.11% (BSD). The model maintains reasonable computational complexity, with only 28.24 M parameters and 25.21 GFLOPs, striking a favorable balance between accuracy and efficiency. These results indicate the strong generalization ability and practical applicability of VMMT-Net in real-world remote sensing segmentation tasks. Full article
(This article belongs to the Special Issue Application of Remote Sensing in Coastline Monitoring)
