Search Results (9)

Search Parameters:
Keywords = mixture of experts (MoE) mechanism

24 pages, 3937 KiB  
Article
HyperTransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Hyperspectral Image Classification
by Xin Dai, Zexi Li, Lin Li, Shuihua Xue, Xiaohui Huang and Xiaofei Yang
Remote Sens. 2025, 17(14), 2361; https://doi.org/10.3390/rs17142361 - 9 Jul 2025
Viewed by 346
Abstract
Recent advances in hyperspectral image (HSI) classification have demonstrated the effectiveness of hybrid architectures that integrate convolutional neural networks (CNNs) and Transformers, leveraging CNNs for local feature extraction and Transformers for global dependency modeling. However, existing fusion approaches face three critical challenges: (1) insufficient synergy between spectral and spatial feature learning due to rigid coupling mechanisms; (2) high computational complexity resulting from redundant attention calculations; and (3) limited adaptability to spectral redundancy and noise in small-sample scenarios. To address these limitations, we propose HyperTransXNet, a novel CNN-Transformer hybrid architecture that incorporates adaptive spectral-spatial fusion. Specifically, the proposed HyperTransXNet comprises three key modules: (1) a Hybrid Spatial-Spectral Module (HSSM) that captures the refined local spectral-spatial features and models global spectral correlations by combining depth-wise dynamic convolution with frequency-domain attention; (2) a Mixture-of-Experts Routing (MoE-R) module that adaptively fuses multi-scale features by dynamically selecting optimal experts via Top-K sparse weights; and (3) a Spatial-Spectral Tokens Enhancer (SSTE) module that ensures causality-preserving interactions between spectral bands and spatial contexts. Extensive experiments on the Indian Pines, Houston 2013, and WHU-Hi-LongKou datasets demonstrate the superiority of HyperTransXNet. Full article
(This article belongs to the Special Issue AI-Driven Hyperspectral Remote Sensing of Atmosphere and Land)
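
For readers unfamiliar with Top-K sparse routing, the sketch below shows a generic mixture-of-experts layer in PyTorch that keeps the K highest-scoring experts per token and mixes their outputs with softmax weights. It only illustrates the routing pattern that modules such as MoE-R build on; the expert count, MLP experts, and dimensions are assumptions, not the authors' implementation.

```python
# A generic Top-K sparse MoE routing layer (illustrative sizes and MLP experts).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # routing network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim) feature tokens
        logits = self.gate(x)                                # (tokens, num_experts)
        top_val, top_idx = logits.topk(self.k, dim=-1)       # keep the K best experts
        weights = F.softmax(top_val, dim=-1)                 # sparse mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = top_idx[:, slot]                           # expert chosen in this slot
            w = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])   # weighted expert output
        return out

# usage: fused = TopKMoE(dim=64)(torch.randn(1024, 64))
```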

28 pages, 114336 KiB  
Article
Mamba-STFM: A Mamba-Based Spatiotemporal Fusion Method for Remote Sensing Images
by Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo and Yuanchen Huang
Remote Sens. 2025, 17(13), 2135; https://doi.org/10.3390/rs17132135 - 21 Jun 2025
Viewed by 578
Abstract
Spatiotemporal fusion techniques can generate remote sensing imagery with high spatial and temporal resolutions, thereby facilitating Earth observation. However, traditional methods are constrained by linear assumptions; generative adversarial networks suffer from mode collapse; convolutional neural networks struggle to capture global context; and Transformers are hard to scale due to quadratic computational complexity and high memory consumption. To address these challenges, this study introduces an end-to-end remote sensing image spatiotemporal fusion approach based on the Mamba architecture (Mamba-spatiotemporal fusion model, Mamba-STFM), marking the first application of Mamba in this domain and presenting a novel paradigm for spatiotemporal fusion model design. Mamba-STFM consists of a feature extraction encoder and a feature fusion decoder. At the core of the encoder is the visual state space-FuseCore-AttNet block (VSS-FCAN block), which deeply integrates linear complexity cross-scan global perception with a channel attention mechanism, significantly reducing quadratic-level computation and memory overhead while improving inference throughput through parallel scanning and kernel fusion techniques. The decoder’s core is the spatiotemporal mixture-of-experts fusion module (STF-MoE block), composed of our novel spatial expert and temporal expert modules. The spatial expert adaptively adjusts channel weights to optimize spatial feature representation, enabling precise alignment and fusion of multi-resolution images, while the temporal expert incorporates a temporal squeeze-and-excitation mechanism and selective state space model (SSM) techniques to efficiently capture short-range temporal dependencies, maintain linear sequence modeling complexity, and further enhance overall spatiotemporal fusion throughput. Extensive experiments on public datasets demonstrate that Mamba-STFM outperforms existing methods in fusion quality; ablation studies validate the effectiveness of each core module; and efficiency analyses and application comparisons further confirm the model’s superior performance. Full article
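
For orientation, the snippet below sketches a standard channel squeeze-and-excitation block in PyTorch, the basic channel-reweighting pattern that squeeze-and-excitation-style experts like those described above build on. It is a generic illustration with assumed sizes, not the Mamba-STFM spatial or temporal expert modules.

```python
# A minimal channel squeeze-and-excitation (SE) block; reduction ratio is assumed.
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        w = x.mean(dim=(2, 3))            # squeeze: global average pooling
        w = self.fc(w)                    # excite: per-channel gates in (0, 1)
        return x * w[:, :, None, None]    # reweight channels

# usage: y = SqueezeExcite(32)(torch.randn(2, 32, 64, 64))
```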

37 pages, 2359 KiB  
Article
CAG-MoE: Multimodal Emotion Recognition with Cross-Attention Gated Mixture of Experts
by Axel Gedeon Mengara Mengara and Yeon-kug Moon
Mathematics 2025, 13(12), 1907; https://doi.org/10.3390/math13121907 - 7 Jun 2025
Cited by 1 | Viewed by 1016
Abstract
Multimodal emotion recognition faces substantial challenges due to the inherent heterogeneity of data sources, each with its own temporal resolution, noise characteristics, and potential for incompleteness. For example, physiological signals, audio features, and textual data capture complementary yet distinct aspects of emotion, requiring specialized processing to extract meaningful cues. These challenges include aligning disparate modalities, handling varying levels of noise and missing data, and effectively fusing features without diluting critical contextual information. In this work, we propose a novel Mixture of Experts (MoE) framework that addresses these challenges by integrating specialized transformer-based sub-expert networks, a dynamic gating mechanism with sparse Top-k activation, and a cross-modal attention module. Each modality is processed by multiple dedicated sub-experts designed to capture intricate temporal and contextual patterns, while the dynamic gating network selectively weights the contributions of the most relevant experts. Our cross-modal attention module further enhances the integration by facilitating precise exchange of information among modalities, thereby reinforcing robustness in the presence of noisy or incomplete data. Additionally, an auxiliary diversity loss encourages expert specialization, ensuring the fused representation remains highly discriminative. Extensive theoretical analysis and rigorous experiments on benchmark datasets—the Korean Emotion Multimodal Database (KEMDy20) and the ASCERTAIN dataset—demonstrate that our approach significantly outperforms state-of-the-art methods in emotion recognition, setting new performance baselines in affective computing. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
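
As a point of reference, the sketch below shows the general cross-modal attention pattern in PyTorch: tokens from one modality attend to tokens from another and are fused through a residual connection. Dimensions, the single layer, and the audio/text naming are assumptions for illustration, not the CAG-MoE architecture.

```python
# A minimal cross-modal attention sketch: one modality queries another.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_a, dim) queries; text: (batch, T_t, dim) keys/values
        fused, _ = self.attn(query=audio, key=text, value=text)
        return self.norm(audio + fused)   # residual fusion of the two modalities

# usage: out = CrossModalAttention()(torch.randn(8, 50, 128), torch.randn(8, 30, 128))
```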

21 pages, 2226 KiB  
Article
Research on Hybrid Collaborative Development Model Based on Multi-Dimensional Behavioral Information
by Shuanliang Gao, Wei Liao, Tao Shu, Zhuoning Zhao and Yaqiang Wang
Appl. Sci. 2025, 15(9), 4907; https://doi.org/10.3390/app15094907 - 28 Apr 2025
Viewed by 589
Abstract
This paper proposes a hybrid collaborative development model based on multi-dimensional behavioral information (HCDMB) to address systemic problems in modern software engineering, such as inefficient cross-stage collaboration, a fragmented intelligent tool chain, and immature human–machine collaboration mechanisms. It focuses on the requirements analysis, development, testing, and operation and maintenance stages of the software development process. By integrating multi-dimensional features drawn from development behavior traces, collaboration interaction records, and product application data gathered as a project progresses, a mixture of experts (MoE) model is introduced to break through the rigid constraints of the traditional tool chain. Reinforcement learning combined with human feedback is used to optimize the MoE dynamic routing mechanism. At the same time, a few-shot context learning method is used to build the individual expert models, further improving the system's reasoning efficiency and knowledge transfer across scenarios. The proposed HCDMB model can be viewed as an important breakthrough in the software engineering collaboration paradigm, providing AI-driven solutions to the problems posed by dynamic requirements, diverse scenarios, and varied project personnel in software engineering. Full article
(This article belongs to the Special Issue Artificial Intelligence in Software Engineering)
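
As a toy illustration of routing tuned from scalar feedback, the sketch below nudges a gating network toward experts that receive positive rewards using a REINFORCE-style update. The gate size, the four hypothetical stage experts, and the `routing_step`/`reward_fn` names are assumptions for illustration only; this is not the HCDMB training procedure.

```python
# Toy reward-driven routing update: route a task to one expert, observe a scalar
# reward (e.g. human feedback), and raise the probability of rewarded choices.
import torch
import torch.nn as nn

gate = nn.Linear(16, 4)                       # 4 hypothetical stage experts
opt = torch.optim.Adam(gate.parameters(), lr=1e-3)

def routing_step(task_embedding: torch.Tensor, reward_fn) -> None:
    probs = torch.softmax(gate(task_embedding), dim=-1)
    dist = torch.distributions.Categorical(probs)
    expert = dist.sample()                    # route the task to one expert
    reward = reward_fn(expert.item())         # feedback score, e.g. in [0, 1]
    loss = -dist.log_prob(expert) * reward    # REINFORCE-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# usage: routing_step(torch.randn(16), lambda e: 1.0 if e == 2 else 0.0)
```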

14 pages, 2006 KiB  
Article
MicroBERT: Distilling MoE-Based Knowledge from BERT into a Lighter Model
by Dashun Zheng, Jiaxuan Li, Yunchu Yang, Yapeng Wang and Patrick Cheong-Iao Pang
Appl. Sci. 2024, 14(14), 6171; https://doi.org/10.3390/app14146171 - 16 Jul 2024
Cited by 8 | Viewed by 3308
Abstract
Natural language processing tasks have been greatly improved by large language models (LLMs). However, their large number of parameters makes them computationally expensive to run and difficult to deploy on resource-constrained devices. To address this problem while maintaining accuracy, techniques such as distillation and quantization have been proposed. Unfortunately, current methods fail to integrate model pruning with downstream tasks and overlook sentence-level semantic modeling, resulting in reduced efficiency of distillation. To alleviate these limitations, we propose a novel distilled lightweight model for BERT named MicroBERT. This method can transfer the knowledge contained in the “teacher” BERT model to a “student” BERT model. The sentence-level feature alignment loss (FAL) distillation mechanism, guided by Mixture-of-Experts (MoE), captures comprehensive contextual semantic knowledge from the “teacher” model to enhance the “student” model’s performance while reducing its parameters. To make the outputs of the “teacher” and “student” models comparable, we introduce the idea of a generative adversarial network (GAN) to train a discriminator. Our experimental results based on four datasets show that all steps of our distillation mechanism are effective, and the MicroBERT (101.14%) model outperforms TinyBERT (99%) by 2.24% in terms of average distillation reductions in various tasks on the GLUE dataset. Full article
(This article belongs to the Special Issue New Trends in Natural Language Processing)
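
To make the idea of sentence-level feature alignment concrete, the sketch below pools token states into sentence vectors, projects the student's vector to the teacher's width, and minimizes the distance between them. It is a generic feature-alignment distillation loss under assumed dimensions, not MicroBERT's exact FAL formulation.

```python
# Minimal sentence-level feature alignment loss between a teacher and a student.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlignLoss(nn.Module):
    def __init__(self, student_dim: int = 312, teacher_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)  # match hidden widths

    def forward(self, student_hidden: torch.Tensor,
                teacher_hidden: torch.Tensor) -> torch.Tensor:
        # *_hidden: (batch, seq_len, dim) token states from each model
        s = self.proj(student_hidden.mean(dim=1))        # pooled sentence vector
        t = teacher_hidden.mean(dim=1).detach()          # teacher is frozen
        return F.mse_loss(s, t)

# usage: loss = FeatureAlignLoss()(torch.randn(4, 128, 312), torch.randn(4, 128, 768))
```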

26 pages, 2339 KiB  
Article
Switching Self-Attention Text Classification Model with Innovative Reverse Positional Encoding for Right-to-Left Languages: A Focus on Arabic Dialects
by Laith H. Baniata and Sangwoo Kang
Mathematics 2024, 12(6), 865; https://doi.org/10.3390/math12060865 - 15 Mar 2024
Cited by 4 | Viewed by 2265
Abstract
Transformer models have emerged as frontrunners in the field of natural language processing, primarily due to their adept use of self-attention mechanisms to grasp the semantic linkages between words in sequences. Despite their strengths, these models often face challenges in single-task learning scenarios, particularly when it comes to delivering top-notch performance and crafting strong latent feature representations. This challenge is more pronounced in the context of smaller datasets and is particularly acute for under-resourced languages such as Arabic. In light of these challenges, this study introduces a novel methodology for text classification of Arabic texts. This method harnesses the newly developed Reverse Positional Encoding (RPE) technique. It adopts an inductive-transfer learning (ITL) framework combined with a switching self-attention shared encoder, thereby increasing the model’s adaptability and improving its sentence representation accuracy. The integration of Mixture of Experts (MoE) and RPE techniques empowers the model to process longer sequences more effectively. This enhancement is notably beneficial for Arabic text classification, adeptly supporting both the intricate five-point and the simpler ternary classification tasks. The empirical evidence points to its outstanding performance, achieving accuracy rates of 87.20% for the HARD dataset, 72.17% for the BRAD dataset, and 86.89% for the LABR dataset, as evidenced by the assessments conducted on these datasets. Full article
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
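
The sketch below shows one plausible reading of a "reverse" positional encoding for right-to-left text: sinusoidal encodings indexed from the end of the sequence, so the final token receives position 0. The actual RPE formulation is defined in the paper; this is an assumption-laden illustration only.

```python
# Sinusoidal positional encoding with positions counted from the sequence end.
import torch

def reverse_sinusoidal_encoding(seq_len: int, dim: int) -> torch.Tensor:
    pos = torch.arange(seq_len - 1, -1, -1, dtype=torch.float32).unsqueeze(1)  # reversed positions
    i = torch.arange(0, dim, 2, dtype=torch.float32)
    angles = pos / torch.pow(10000.0, i / dim)            # (seq_len, dim/2)
    enc = torch.zeros(seq_len, dim)
    enc[:, 0::2] = torch.sin(angles)
    enc[:, 1::2] = torch.cos(angles)
    return enc                                            # added to token embeddings

# usage: pe = reverse_sinusoidal_encoding(seq_len=32, dim=64); print(pe.shape)
```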

25 pages, 1187 KiB  
Article
Switch-Transformer Sentiment Analysis Model for Arabic Dialects That Utilizes a Mixture of Experts Mechanism
by Laith H. Baniata and Sangwoo Kang
Mathematics 2024, 12(2), 242; https://doi.org/10.3390/math12020242 - 11 Jan 2024
Cited by 7 | Viewed by 2610
Abstract
In recent years, models such as the transformer have demonstrated impressive capabilities in the realm of natural language processing. However, these models are known for their complexity and the substantial training they require. Furthermore, the self-attention mechanism within the transformer, designed to capture semantic relationships among words in sequences, faces challenges when dealing with short sequences. This limitation hinders its effectiveness in five-polarity Arabic sentiment analysis (SA) tasks. The switch-transformer model has surfaced as a potential substitute. Nevertheless, when employing one-task learning for their training, these models frequently face challenges in presenting exceptional performances and encounter issues when producing resilient latent feature representations, particularly in the context of small-size datasets. This challenge is particularly prominent in the case of the Arabic dialect, which is recognized as a low-resource language. In response to these constraints, this research introduces a novel method for the sentiment analysis of Arabic text. This approach leverages multi-task learning (MTL) in combination with the switch-transformer shared encoder to enhance model adaptability and refine sentence representations. By integrating a mixture of experts (MoE) technique that breaks down the problem into smaller, more manageable sub-problems, the model becomes skilled in managing extended sequences and intricate input–output relationships, thereby benefiting both five-point and three-polarity Arabic sentiment analysis tasks. The proposed model effectively identifies sentiment in Arabic dialect sentences. The empirical results underscore its exceptional performance, with accuracy rates reaching 84.02% for the HARD dataset, 67.89% for the BRAD dataset, and 83.91% for the LABR dataset, as demonstrated by the evaluations conducted on these datasets. Full article
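
For context, the sketch below shows the switch-style (top-1) expert layer pattern: each token is routed to exactly one feed-forward expert, scaled by its gate probability. Expert count and sizes are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal switch-style (top-1) mixture-of-experts feed-forward layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    def __init__(self, dim: int = 256, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, dim)
        probs = F.softmax(self.router(tokens), dim=-1)
        gate, choice = probs.max(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = gate[mask, None] * expert(tokens[mask])
        return out

# usage: y = SwitchFFN()(torch.randn(100, 256))
```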

17 pages, 4681 KiB  
Article
CloudY-Net: A Deep Convolutional Neural Network Architecture for Joint Segmentation and Classification of Ground-Based Cloud Images
by Feiyang Hu, Beiping Hou, Wen Zhu, Yuzhen Zhu and Qinlong Zhang
Atmosphere 2023, 14(9), 1405; https://doi.org/10.3390/atmos14091405 - 6 Sep 2023
Cited by 3 | Viewed by 1955
Abstract
Ground-based cloud images contain a wealth of cloud information and are an important part of meteorological research. However, in practice, ground cloud images must be segmented and classified to obtain the cloud volume, cloud type and cloud coverage. Existing methods ignore the relationship between cloud segmentation and classification, and usually only one of these is studied. Accordingly, our paper proposes a novel method for the joint classification and segmentation of cloud images, called CloudY-Net. Compared to the basic Y-Net framework, which extracts feature maps from the central layer, we extract feature maps from four different layers to obtain more useful information to improve the classification accuracy. These feature maps are combined to produce a feature vector to train the classifier. Additionally, the multi-head self-attention mechanism is implemented during the fusion process to enhance the information interaction among features further. A new module called Cloud Mixture-of-Experts (C-MoE) is proposed to enable the weights of each feature layer to be automatically learned by the model, thus improving the quality of the fused feature representation. Correspondingly, experiments are conducted on the open multi-modal ground-based cloud dataset (MGCD). The results demonstrate that the proposed model significantly improves the classification accuracy compared to classical networks and state-of-the-art algorithms, with classification accuracy of 88.58%. In addition, we annotate 4000 images in the MGCD for cloud segmentation and produce a cloud segmentation dataset called MGCD-Seg. Then, we obtain a 96.55 mIoU on MGCD-Seg, validating the efficacy of our method in ground-based cloud imagery segmentation and classification. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
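
In the spirit of letting the model learn how much each tapped feature layer contributes, the sketch below mixes per-layer feature vectors with softmax-normalized learnable weights. The four-layer setup and vector-level fusion are illustrative assumptions, not the C-MoE module itself.

```python
# Learned softmax weighting over feature vectors taken from several layers.
import torch
import torch.nn as nn

class LayerWeightedFusion(nn.Module):
    def __init__(self, num_layers: int = 4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_layers))  # one weight per layer

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        # layer_feats: list of (batch, dim) vectors, one per tapped layer
        w = torch.softmax(self.logits, dim=0)
        stacked = torch.stack(layer_feats, dim=0)             # (layers, batch, dim)
        return (w[:, None, None] * stacked).sum(dim=0)        # fused (batch, dim)

# usage: fused = LayerWeightedFusion()([torch.randn(2, 128) for _ in range(4)])
```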

17 pages, 1378 KiB  
Article
Generalized Zero-Shot Image Classification via Partially-Shared Multi-Task Representation Learning
by Gerui Wang and Sheng Tang
Electronics 2023, 12(9), 2085; https://doi.org/10.3390/electronics12092085 - 3 May 2023
Viewed by 2109
Abstract
Generalized Zero-Shot Learning (GZSL) holds significant research importance as it enables the classification of samples from both seen and unseen classes. A prevailing approach for GZSL is learning transferable representations that can generalize well to both seen and unseen classes during testing. This approach encompasses two key concepts: discriminative representations and semantic-relevant representations. “Semantic-relevant” facilitates the transfer of semantic knowledge using pre-defined semantic descriptors, while “discriminative” is crucial for accurate category discrimination. However, these two concepts are arguably inherently conflicting, as semantic descriptors are not specifically designed for image classification. Existing methods often struggle with balancing these two aspects and neglect the conflict between them, leading to suboptimal representation generalization and transferability to unseen classes. To address this issue, we propose a novel partially-shared multi-task representation learning method, termed PS-GZSL, which jointly preserves complementary and sharable knowledge between these two concepts. Specifically, we first propose a novel perspective that treats the learning of discriminative and semantic-relevant representations as optimizing a discrimination task and a visual-semantic alignment task, respectively. Then, to learn more complete and generalizable representations, PS-GZSL explicitly factorizes visual features into task-shared and task-specific representations and introduces two advanced tasks: an instance-level contrastive discrimination task and a relation-based visual-semantic alignment task. Furthermore, PS-GZSL employs Mixture-of-Experts (MoE) with a dropout mechanism to prevent representation degeneration and integrates a conditional GAN (cGAN) to synthesize unseen features for estimating unseen visual features. Extensive experiments and more competitive results on five widely-used GZSL benchmark datasets validate the effectiveness of our PS-GZSL. Full article
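
As one common way to discourage representation degeneration in an MoE layer, the sketch below randomly masks experts during training and renormalizes the gate over the remaining ones. It is a generic expert-dropout illustration with assumed sizes and drop rate, not the PS-GZSL implementation.

```python
# Mixture-of-experts layer with expert dropout and gate renormalization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutMoE(nn.Module):
    def __init__(self, dim: int = 256, num_experts: int = 4, drop_p: float = 0.25):
        super().__init__()
        self.drop_p = drop_p
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) visual features
        logits = self.gate(x)                                     # (batch, experts)
        if self.training:
            keep = torch.rand(logits.shape[-1], device=x.device) > self.drop_p
            keep[torch.randint(len(keep), (1,))] = True           # never drop all experts
            logits = logits.masked_fill(~keep, float("-inf"))
        w = F.softmax(logits, dim=-1)                             # renormalized gates
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, experts, dim)
        return (w.unsqueeze(-1) * outs).sum(dim=1)

# usage: y = DropoutMoE()(torch.randn(8, 256))
```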
