Search Results (111)

Search Parameters:
Keywords = contextual cueing

16 pages, 4587 KiB  
Article
FAMNet: A Lightweight Stereo Matching Network for Real-Time Depth Estimation in Autonomous Driving
by Jingyuan Zhang, Qiang Tong, Na Yan and Xiulei Liu
Symmetry 2025, 17(8), 1214; https://doi.org/10.3390/sym17081214 - 1 Aug 2025
Abstract
Accurate and efficient stereo matching is fundamental to real-time depth estimation from symmetric stereo cameras in autonomous driving systems. However, existing high-accuracy stereo matching networks typically rely on computationally expensive 3D convolutions, which limit their practicality in real-world environments. In contrast, real-time methods often sacrifice accuracy or generalization capability. To address these challenges, we propose FAMNet (Fusion Attention Multi-Scale Network), a lightweight and generalizable stereo matching framework tailored for real-time depth estimation in autonomous driving applications. FAMNet consists of two novel modules: Fusion Attention-based Cost Volume (FACV) and Multi-scale Attention Aggregation (MAA). FACV constructs a compact yet expressive cost volume by integrating multi-scale correlation, attention-guided feature fusion, and channel reweighting, thereby reducing reliance on heavy 3D convolutions. MAA further enhances disparity estimation by fusing multi-scale contextual cues through pyramid-based aggregation and dual-path attention mechanisms. Extensive experiments on the KITTI 2012 and KITTI 2015 benchmarks demonstrate that FAMNet achieves a favorable trade-off between accuracy, efficiency, and generalization. On KITTI 2015, with the incorporation of FACV and MAA, the prediction accuracy of the baseline model is improved by 37% and 38%, respectively, and a total improvement of 42% is achieved by our final model. These results highlight FAMNet’s potential for practical deployment in resource-constrained autonomous driving systems requiring real-time and reliable depth perception.
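
The listing gives no code, but the multi-scale correlation underlying a cost volume such as FACV's is a standard operation. The PyTorch sketch below builds a plain correlation cost volume over a disparity range; the attention-guided fusion and channel reweighting described in the abstract are omitted, and all names and shapes are assumptions rather than the authors' implementation.

```python
import torch

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Plain correlation cost volume (illustrative; not the FACV module itself).

    feat_l, feat_r: [B, C, H, W] left/right feature maps.
    Returns: [B, max_disp, H, W], where slice d holds the per-pixel correlation
    between the left features and the right features shifted by d pixels.
    """
    b, c, h, w = feat_l.shape
    cost = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            cost[:, d, :, d:] = (feat_l[:, :, :, d:] * feat_r[:, :, :, :-d]).mean(dim=1)
    return cost

if __name__ == "__main__":
    fl, fr = torch.randn(1, 32, 64, 128), torch.randn(1, 32, 64, 128)
    print(correlation_cost_volume(fl, fr, max_disp=24).shape)  # torch.Size([1, 24, 64, 128])
```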

19 pages, 3397 KiB  
Article
FEMNet: A Feature-Enriched Mamba Network for Cloud Detection in Remote Sensing Imagery
by Weixing Liu, Bin Luo, Jun Liu, Han Nie and Xin Su
Remote Sens. 2025, 17(15), 2639; https://doi.org/10.3390/rs17152639 - 30 Jul 2025
Viewed by 77
Abstract
Accurate and efficient cloud detection is critical for maintaining the usability of optical remote sensing imagery, particularly in large-scale Earth observation systems. In this study, we propose FEMNet, a lightweight dual-branch network that combines state space modeling with convolutional encoding for multi-class cloud segmentation. The Mamba-based encoder captures long-range semantic dependencies with linear complexity, while a parallel CNN path preserves spatial detail. To address the semantic inconsistency across feature hierarchies and limited context perception in decoding, we introduce the following two targeted modules: a cross-stage semantic enhancement (CSSE) block that adaptively aligns low- and high-level features, and a multi-scale context aggregation (MSCA) block that integrates contextual cues at multiple resolutions. Extensive experiments on five benchmark datasets demonstrate that FEMNet achieves state-of-the-art performance across both binary and multi-class settings, while requiring only 4.4M parameters and 1.3G multiply–accumulate operations. These results highlight FEMNet’s suitability for resource-efficient deployment in real-world remote sensing applications.
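
As a loose illustration of what a multi-scale context aggregation block in the spirit of MSCA could look like, the sketch below fuses parallel dilated convolutions back into one feature map. Dilation rates, channel counts, and the residual fusion are assumptions, not the published module.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    """Aggregate context at several dilation rates and fuse with a 1x1 conv.
    A generic reading of multi-scale context aggregation, not FEMNet's MSCA."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [self.act(branch(x)) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1)) + x  # residual keeps spatial detail

block = MultiScaleContext(32)
print(block(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```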

35 pages, 7934 KiB  
Article
Analyzing Diagnostic Reasoning of Vision–Language Models via Zero-Shot Chain-of-Thought Prompting in Medical Visual Question Answering
by Fatema Tuj Johora Faria, Laith H. Baniata, Ahyoung Choi and Sangwoo Kang
Mathematics 2025, 13(14), 2322; https://doi.org/10.3390/math13142322 - 21 Jul 2025
Viewed by 621
Abstract
Medical Visual Question Answering (MedVQA) lies at the intersection of computer vision, natural language processing, and clinical decision-making, aiming to generate accurate responses from medical images paired with complex inquiries. Despite recent advances in vision–language models (VLMs), their use in healthcare remains limited by a lack of interpretability and a tendency to produce direct, unexplainable outputs. This opacity undermines their reliability in medical settings, where transparency and justification are critically important. To address this limitation, we propose a zero-shot chain-of-thought prompting framework that guides VLMs to perform multi-step reasoning before arriving at an answer. By encouraging the model to break down the problem, analyze both visual and contextual cues, and construct a stepwise explanation, the approach makes the reasoning process explicit and clinically meaningful. We evaluate the framework on the PMC-VQA benchmark, which includes authentic radiological images and expert-level prompts. In a comparative analysis of three leading VLMs, Gemini 2.5 Pro achieved the highest accuracy (72.48%), followed by Claude 3.5 Sonnet (69.00%) and GPT-4o Mini (67.33%). The results demonstrate that chain-of-thought prompting significantly improves both reasoning transparency and performance in MedVQA tasks.
(This article belongs to the Special Issue Mathematical Foundations in NLP: Applications and Challenges)
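
The abstract describes the prompting strategy but not the prompt text. A hypothetical zero-shot chain-of-thought prompt of the general kind described (stepwise reasoning over the image and question before a final option letter) might look like the sketch below; the wording and field names are illustrative assumptions, not the prompts actually used with Gemini 2.5 Pro, Claude 3.5 Sonnet, or GPT-4o Mini.

```python
def build_cot_prompt(question: str, options: list) -> str:
    """Assemble a zero-shot chain-of-thought prompt for a vision-language model.
    Illustrative only; the paper's exact prompt wording is not reproduced here."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return (
        "You are assisting with a medical visual question answering task.\n"
        f"Question: {question}\n"
        f"Options:\n{opts}\n\n"
        "Let's think step by step. First describe the relevant visual findings, "
        "then relate them to the question and to each option, and finally state "
        "your answer as a single option letter on the last line."
    )

print(build_cot_prompt("Which lobe shows the opacity?",
                       ["Right upper", "Right lower", "Left upper", "Left lower"]))
```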

22 pages, 1342 KiB  
Article
Multi-Scale Attention-Driven Hierarchical Learning for Fine-Grained Visual Categorization
by Zhihuai Hu, Rihito Kojima and Xian-Hua Han
Electronics 2025, 14(14), 2869; https://doi.org/10.3390/electronics14142869 - 18 Jul 2025
Viewed by 262
Abstract
Fine-grained visual categorization (FGVC) presents significant challenges due to subtle inter-class variation and substantial intra-class diversity, often leading to limited discriminative capacity in global representations. Existing methods inadequately capture localized, class-relevant features across multiple semantic levels, especially under complex spatial configurations. To address these challenges, we introduce a Multi-scale Attention-driven Hierarchical Learning (MAHL) framework that iteratively refines feature representations via scale-adaptive attention mechanisms. Specifically, fully connected (FC) classifiers are applied to spatially pooled feature maps at multiple network stages to capture global semantic context. The learned FC weights are then projected onto the original high-resolution feature maps to compute spatial contribution scores for the predicted class, serving as attention cues. These multi-scale attention maps guide the selection of discriminative regions, which are hierarchically integrated into successive training iterations to reinforce both global and local contextual dependencies. Moreover, we explore a generalized pooling operation that parametrically fuses average and max pooling, enabling richer contextual retention in the encoded features. Comprehensive evaluations on benchmark FGVC datasets demonstrate that MAHL consistently outperforms state-of-the-art methods, validating its efficacy in learning robust, class-discriminative, high-resolution representations through attention-guided hierarchical refinement.
(This article belongs to the Special Issue Advances in Machine Learning for Image Classification)
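
The "generalized pooling operation that parametrically fuses average and max pooling" admits a very simple reading: a learnable convex mix of the two pooled maps. A hedged PyTorch sketch of that reading follows; the parameterization actually used in MAHL is not spelled out in the abstract, so the module name and the sigmoid gate below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPool2d(nn.Module):
    """Learnable blend of average and max pooling: alpha*avg + (1-alpha)*max.
    One possible reading of the generalized pooling described in the abstract."""
    def __init__(self, kernel_size=2, stride=2):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.logit = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5, an even mix at init

    def forward(self, x):
        alpha = torch.sigmoid(self.logit)
        avg = F.avg_pool2d(x, self.kernel_size, self.stride)
        mx = F.max_pool2d(x, self.kernel_size, self.stride)
        return alpha * avg + (1 - alpha) * mx

pool = MixedPool2d()
print(pool(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 16, 16])
```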

20 pages, 3941 KiB  
Article
AΚtransU-Net: Transformer-Equipped U-Net Model for Improved Actinic Keratosis Detection in Clinical Photography
by Panagiotis Derekas, Charalampos Theodoridis, Aristidis Likas, Ioannis Bassukas, Georgios Gaitanis, Athanasia Zampeta, Despina Exadaktylou and Panagiota Spyridonos
Diagnostics 2025, 15(14), 1752; https://doi.org/10.3390/diagnostics15141752 - 10 Jul 2025
Viewed by 396
Abstract
Background: Integrating artificial intelligence into clinical photography offers great potential for monitoring skin conditions such as actinic keratosis (AK) and skin field cancerization. Identifying the extent of AK lesions often requires more than analyzing lesion morphology—it also depends on contextual cues, such as surrounding photodamage. This highlights the need for models that can combine fine-grained local features with a comprehensive global view. Methods: To address this challenge, we propose AKTransU-net, a hybrid U-net-based architecture. The model incorporates Transformer blocks to enrich feature representations, which are passed through ConvLSTM modules within the skip connections. This configuration allows the network to maintain semantic coherence and spatial continuity in AK detection. This global awareness is critical when applying the model to whole-image detection via tile-based processing, where continuity across tile boundaries is essential for accurate and reliable lesion segmentation. Results: The effectiveness of AKTransU-net was demonstrated through comparative evaluations with state-of-the-art segmentation models. A proprietary annotated dataset of 569 clinical photographs from 115 patients with actinic keratosis was used to train and evaluate the models. From each photograph, crops of 512 × 512 pixels were extracted using translation lesion boxes that encompassed lesions in different positions and captured different contexts. AKTransU-net exhibited a more robust context awareness and achieved a median Dice score of 65.13%, demonstrating significant progress in whole-image assessments. Conclusions: Transformer-driven context modeling offers a promising approach for robust AK lesion monitoring, supporting its application in real-world clinical settings where accurate, context-aware analysis is crucial for managing skin field cancerization.
(This article belongs to the Special Issue Artificial Intelligence in Dermatology)
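
Whole-image detection via tile-based processing, as mentioned in the abstract, can be illustrated with a generic overlapping sliding-window loop that averages per-tile probabilities where tiles overlap. This is one common way to keep predictions continuous across tile boundaries; the tile size, stride, and stitching rule below are assumptions and say nothing about AKTransU-net's actual pipeline.

```python
import numpy as np

def _positions(length, tile, stride):
    """Window start positions covering [0, length], including a final window
    anchored at the image border."""
    last = max(length - tile, 0)
    pos = list(range(0, last + 1, stride))
    if pos[-1] != last:
        pos.append(last)
    return pos

def tiled_inference(image, predict_tile, tile=512, stride=384):
    """Run a tile-wise segmentation model over a whole image (at least tile x tile).

    image: [H, W, 3] array; predict_tile maps a [tile, tile, 3] crop to a
    [tile, tile] probability map. Overlapping predictions are averaged.
    """
    h, w = image.shape[:2]
    prob = np.zeros((h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for y in _positions(h, tile, stride):
        for x in _positions(w, tile, stride):
            prob[y:y + tile, x:x + tile] += predict_tile(image[y:y + tile, x:x + tile])
            count[y:y + tile, x:x + tile] += 1
    return prob / count

dummy = lambda crop: np.full(crop.shape[:2], 0.5, dtype=np.float32)  # stand-in model
print(tiled_inference(np.zeros((1024, 1536, 3), np.uint8), dummy).shape)  # (1024, 1536)
```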

30 pages, 34072 KiB  
Article
ARE-PaLED: Augmented Reality-Enhanced Patch-Level Explainable Deep Learning System for Alzheimer’s Disease Diagnosis from 3D Brain sMRI
by Chitrakala S and Bharathi U
Symmetry 2025, 17(7), 1108; https://doi.org/10.3390/sym17071108 - 10 Jul 2025
Viewed by 382
Abstract
Structural magnetic resonance imaging (sMRI) is a vital tool for diagnosing neurological brain diseases. However, sMRI scans often show significant structural changes only in limited brain regions due to localised atrophy, making the identification of discriminative features a key challenge. Importantly, the human brain exhibits inherent bilateral symmetry, and deviations from this symmetry—such as asymmetric atrophy—are strong indicators of early Alzheimer’s disease (AD). Patch-based methods help capture local brain changes for early AD diagnosis, but they often struggle with fixed-size limitations, potentially missing subtle asymmetries or broader contextual cues. To address these limitations, we propose a novel augmented reality (AR)-enhanced patch-level explainable deep learning (ARE-PaLED) system. It includes an adaptive multi-scale patch extraction network (AMPEN) to adjust patch sizes based on anatomical characteristics and spatial context, as well as an informative patch selection algorithm (IPSA) to identify discriminative patches, including those reflecting asymmetry patterns associated with AD; additionally, an AR module is proposed for future immersive explainability, complementing the patch-level interpretation framework. Evaluated on 1862 subjects from the ADNI and AIBL datasets, the framework achieved an accuracy of 92.5% (AD vs. NC) and 85.9% (AD vs. MCI). The proposed ARE-PaLED demonstrates potential as an interpretable and immersive diagnostic aid for sMRI-based AD diagnosis, supporting the interpretation of model predictions for AD diagnosis.
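
Multi-scale patch extraction from a 3-D sMRI volume, in the spirit of AMPEN's adaptive patches, can be illustrated with a routine that cuts cubes of several sizes around candidate coordinates; the sizes, coordinates, and border handling below are assumptions rather than the published AMPEN/IPSA algorithms.

```python
import numpy as np

def extract_patches(volume, centers, sizes=(16, 24, 32)):
    """Cut cubic patches of several sizes around each center voxel.

    volume: [D, H, W] sMRI array; centers: list of (z, y, x) voxel coordinates.
    Patches that would cross a border are shifted to stay inside the volume.
    """
    patches = []
    for (z, y, x) in centers:
        for s in sizes:
            half = s // 2
            z0 = min(max(z - half, 0), volume.shape[0] - s)
            y0 = min(max(y - half, 0), volume.shape[1] - s)
            x0 = min(max(x - half, 0), volume.shape[2] - s)
            patches.append(volume[z0:z0 + s, y0:y0 + s, x0:x0 + s])
    return patches

vol = np.random.rand(128, 160, 160).astype(np.float32)
out = extract_patches(vol, centers=[(64, 80, 80), (10, 5, 150)])
print(len(out), out[0].shape)  # 6 (16, 16, 16)
```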

23 pages, 988 KiB  
Article
The Influence of Spatial Distance and Trade-Off Salience on Ethical Decision-Making: An Eye-Tracking Study Based on Embodied Cognition
by Yu Yang, Yirui Li, Qingsong Lin and Xuejun Bai
Behav. Sci. 2025, 15(7), 911; https://doi.org/10.3390/bs15070911 - 4 Jul 2025
Viewed by 354
Abstract
Research based on the theory of embodied cognition has revealed that the vertical position of target information in space influences individuals’ construal level, which in turn affects their ethical decision-making. However, previous studies have shown inconsistent effects of construal level on ethical decision-making, which may be moderated by factors such as the manipulation methods of construal level and the salience of trade-offs. This study examines how manipulating the vertical position (high/low) of target information in space—thereby altering perceived spatial distance—impacts ethical decision-making through the lens of embodied cognition, using eye-tracking technology. Experiment 1 isolated the effect of target verticality, while Experiment 2 introduced trade-off salience as an additional factor. Eye-tracking metrics in Experiment 1 revealed that lower target positions significantly increased late-stage cognitive processing difficulty. Experiment 2 demonstrated an interaction between target position and trade-off salience in ethical decision-making. These findings suggest that spatial positioning influences cognitive processing via construal level, with its effects on ethical decision-making moderated by trade-off cues. In summary, this study reveals the significant influence of trade-off salience as a contextual cue in individuals’ ethical decision-making while also providing an embodied cognition perspective to inform decision behavior in human–computer interaction contexts.
(This article belongs to the Section Cognition)

28 pages, 642 KiB  
Article
Contextual Emotions in Organizations: A Latent Profile Analysis of Their Co-Occurrence and Their Effects on Employee Well-Being
by Laura Petitta, Lixin Jiang and Valerio Ghezzi
Eur. J. Investig. Health Psychol. Educ. 2025, 15(7), 122; https://doi.org/10.3390/ejihpe15070122 - 2 Jul 2025
Viewed by 354
Abstract
Workplace contextual emotions are structured ways of emotionally thinking about specific cues in the context that employees share within their organization. These dynamics reflect how employees emotionally interpret and respond to organizational environments. Contextual emotions may shape working relationships into different types of toxic emotional dynamics (e.g., claiming, controlling, distrusting, provoking) or, conversely, positive emotional dynamics (i.e., exchanging), thus setting the emotional tone that affects employees’ actions and their level of comfort/discomfort. The present study uses latent profile analysis (LPA) to identify subpopulations of employees who may experience differing levels of both positive and negative emotional dynamics (i.e., different configurations of emotional patterns of workplace behavior). Moreover, it examines whether the emergent profiles predict work-related (i.e., job satisfaction, burnout) and health-related outcomes (i.e., sleep disturbances, physical and mental health). Using data from 801 Italian employees, we identified four latent profiles: “functional dynamics” (low toxic emotions and high exchange), “dialectical dynamics” (co-existence of medium toxic emotions and medium exchange), “mild dysfunctional dynamics” (moderately high toxic emotions and low exchange), and “highly dysfunctional dynamics” (extremely high toxic emotions and extremely low exchange). Moreover, employees in the dialectical, mild dysfunctional, and highly dysfunctional groups reported progressively higher levels of poor health outcomes and progressively lower levels of satisfaction, whereas the functional group was at low risk of stress and was the most satisfied group. The theoretical and practical implications of the LPA-classified emotional patterns of workplace behavior are discussed in light of the relevance of identifying vulnerable subpopulations of employees diversely exposed to toxic configurations of emotional/relational ambience.
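
Latent profile analysis is usually fitted with dedicated mixture-modeling software (e.g., Mplus or R's tidyLPA). As a rough Python analogue under the assumption of continuous indicators, a Gaussian mixture with BIC-based selection of the number of profiles can be sketched as below; the indicator set and the four-profile solution reported in the abstract are not reproduced here, and the data are random placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder data: 801 respondents x 5 emotional-dynamics scores
# (claiming, controlling, distrusting, provoking, exchanging).
X = rng.normal(size=(801, 5))

fits = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=5, random_state=0).fit(X)
    fits[k] = (gm.bic(X), gm)

best_k = min(fits, key=lambda k: fits[k][0])   # lower BIC is better
profiles = fits[best_k][1].predict(X)          # profile membership per respondent
print(best_k, np.bincount(profiles))
```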

24 pages, 595 KiB  
Article
An Empirical Comparison of Machine Learning and Deep Learning Models for Automated Fake News Detection
by Yexin Tian, Shuo Xu, Yuchen Cao, Zhongyan Wang and Zijing Wei
Mathematics 2025, 13(13), 2086; https://doi.org/10.3390/math13132086 - 25 Jun 2025
Viewed by 497
Abstract
Detecting fake news is a critical challenge in natural language processing (NLP), demanding solutions that balance accuracy, interpretability, and computational efficiency. Despite advances in NLP, systematic empirical benchmarks that directly compare both classical and deep models—across varying input richness and with careful attention to interpretability and computational tradeoffs—remain underexplored. In this study, we systematically evaluate the mathematical foundations and empirical performance of five representative models for automated fake news classification: three classical machine learning algorithms (Logistic Regression, Random Forest, and Light Gradient Boosting Machine) and two state-of-the-art deep learning architectures (A Lite Bidirectional Encoder Representations from Transformers—ALBERT and Gated Recurrent Units—GRUs). Leveraging the large-scale WELFake dataset, we conduct rigorous experiments under both headline-only and headline-plus-content input scenarios, providing a comprehensive assessment of each model’s capability to capture linguistic, contextual, and semantic cues. We analyze each model’s optimization framework, decision boundaries, and feature importance mechanisms, highlighting the empirical tradeoffs between representational capacity, generalization, and interpretability. Our results show that transformer-based models, especially ALBERT, achieve state-of-the-art performance (macro F1 up to 0.99) with rich context, while classical ensembles remain viable for constrained settings. These findings directly inform practical fake news detection.
(This article belongs to the Special Issue Mathematical Foundations in NLP: Applications and Challenges)
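
As a point of reference for the classical side of the comparison, a TF-IDF plus Logistic Regression pipeline of the kind benchmarked can be assembled in a few lines of scikit-learn; the toy texts, labels, and hyperparameters below are placeholders, not the paper's WELFake setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data: in the paper the inputs are headlines only or headline + content.
texts = ["breaking: miracle cure discovered", "senate passes annual budget bill"] * 200
labels = [1, 0] * 200  # 1 = fake, 0 = real (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.2, random_state=0)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```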

16 pages, 1058 KiB  
Article
Multi-Scale Context Enhancement Network with Local–Global Synergy Modeling Strategy for Semantic Segmentation on Remote Sensing Images
by Qibing Ma, Hongning Liu, Yifan Jin and Xinyue Liu
Electronics 2025, 14(13), 2526; https://doi.org/10.3390/electronics14132526 - 21 Jun 2025
Viewed by 310
Abstract
Semantic segmentation of remote sensing images is a fundamental task in geospatial analysis and Earth observation research, and has a wide range of applications in urban planning, land cover classification, and ecological monitoring. In complex geographic scenes, low target-background discriminability in overhead views (e.g., indistinct boundaries, ambiguous textures, and low contrast) significantly complicates local–global information modeling and results in blurred boundaries and classification errors in model predictions. To address this issue, in this paper we propose a novel Multi-Scale Local–Global Mamba Feature Pyramid Network (MLMFPN) built around a local–global information synergy modeling strategy, which guides and enhances cross-scale contextual information interaction during feature fusion to obtain high-quality semantic features that serve as cues for precise semantic reasoning. The proposed MLMFPN comprises two core components: Local–Global Align Mamba Fusion (LGAMF) and the Context-Aware Cross-attention Interaction Module (CCIM). Specifically, LGAMF performs local-enhanced global information modeling through asymmetric convolution, covering the vertical and horizontal receptive fields synergistically, and further introduces the Vision Mamba structure to facilitate local–global information fusion. CCIM introduces positional encoding and cross-attention mechanisms to enrich the global-spatial semantic representation during multi-scale context information interaction, thereby achieving refined segmentation. The proposed method is evaluated on the ISPRS Potsdam and Vaihingen datasets, and its superior results verify its effectiveness.
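
The "local-enhanced global information modeling through asymmetric convolution" in LGAMF pairs vertical and horizontal receptive fields. A generic asymmetric-convolution block in that spirit is sketched below; kernel sizes, channel counts, and the omission of the Vision Mamba branch are assumptions, not the published design.

```python
import torch
import torch.nn as nn

class AsymmetricConvBlock(nn.Module):
    """Parallel k x 1 and 1 x k convolutions cover the vertical and horizontal
    directions separately and are fused with a square-kernel path.
    Illustrative only; not the LGAMF module as published."""
    def __init__(self, channels, k=7):
        super().__init__()
        pad = k // 2
        self.vert = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0))
        self.horz = nn.Conv2d(channels, channels, (1, k), padding=(0, pad))
        self.square = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(self.vert(x) + self.horz(x) + self.square(x)) + x

block = AsymmetricConvBlock(64)
print(block(torch.randn(1, 64, 128, 128)).shape)  # torch.Size([1, 64, 128, 128])
```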

20 pages, 481 KiB  
Article
Understanding Ecotourism Decisions Through Dual-Process Theory: A Feature-Based Model from a Rural Region of Türkiye
by Kübra Karaman
Sustainability 2025, 17(13), 5701; https://doi.org/10.3390/su17135701 - 20 Jun 2025
Viewed by 350
Abstract
Grounded in information processing theory, this study explores how ecotourism decisions were formed within the rural district of Akdağmadeni (Türkiye), integrating both heuristic and systematic decision-making processes. The research adopts a two-phase mixed-methods design: First, it employs a survey-based factorial analysis involving 383 participants to examine preferences for nature-based activities such as trekking, cycling, and cultural tourism. Second, it uses in-depth interviews to investigate participants’ strategic evaluations of local landscape and heritage assets. The results reveal that individuals flexibly switch between intuitive and analytical judgments based on contextual factors. Key decision drivers identified include alignment with local development, ecological integrity, and socioeconomic contribution. This dual-process insight is operationalized through a novel “feature-based evaluation model” that synthesizes landscape identity values with cognitive-perceptual cues, providing a new lens for assessing geoheritage-based tourism behavior. It was determined that participants used both intuitive and systematic information processing strategies in their decision-making processes, and factors such as harmony with nature, economic contribution, and local identity were found to affect preferences. The study draws attention to the need to develop sustainable tourism policies, raise public awareness, and support infrastructure investments, and provides a road map for the effective use of the region’s ecotourism potential.

23 pages, 3843 KiB  
Article
ApoE Isoform-Dependent Effects on Extinction of Contextual Fear Memory and Passive Avoidance Memory
by Elizabeth Saltonstall, Alexandra Pederson, Abigail O’Niel, Sarah Holden, Kat Kessler, Eileen Ruth Samson Torres and Jacob Raber
Int. J. Mol. Sci. 2025, 26(12), 5820; https://doi.org/10.3390/ijms26125820 - 17 Jun 2025
Viewed by 433
Abstract
Following exposure to trauma, avoidance behavior can be protective but also contribute to severe symptoms and interfere with exposure-based therapy. Extinction of fear conditioning by exposure to the same environment or environmental cues that were present during the initial traumatic event but without including the aversive stimulus or stimuli is often used to study post-traumatic stress disorder (PTSD), a condition characterized by an inability to suppress conditioned fear responses. A limitation of this paradigm is that one cannot avoid the context or cues associated with the initial traumatic event. In contrast, in the passive avoidance test, one can escape the environment associated with the aversive stimulus. Genetic factors might modulate the ability to extinguish fear memory. In this study, we compared the effects of distinct human apoE isoforms on the extinction of contextual fear and passive avoidance memory, as well as on subsequent activity levels, depressive-like behavior, and hippocampal levels of tau, in targeted replacement mice.
(This article belongs to the Special Issue Molecular Advances in Mental Health and Disorders)

20 pages, 2799 KiB  
Article
Section Recommendation of Online Medical Platform Based on Keyword Expansion with Self-Adaptive-Attention-Prompt-BERT-RCNN Modeling
by Tianbao Xie, Yuqi Han, Ganglong Duan, Siyu Yang, Shaoyang Zhang and Yongcheng Shao
Appl. Sci. 2025, 15(12), 6746; https://doi.org/10.3390/app15126746 - 16 Jun 2025
Viewed by 320
Abstract
Background: Implementing automatic classification of short texts in online healthcare platforms is crucial to increase the efficiency of their services and improve the user experience. A short text classification method combining the keyword expansion technique and a deep learning model is constructed to solve the problems of feature sparsity and semantic ambiguity in short text classification. Methods: First, we use web crawlers to obtain patient data from the online medical platform “Good Doctor”; then, we use TF-IWF to weight keyword importance and Word2vec to compute keyword similarity in order to expand the short-text features; finally, we integrate prompt (cue) learning with deep learning models to construct the self-adaptive-attention-Prompt-BERT-RCNN model, which addresses the sparse features and unclear semantics of short texts and realizes effective classification of medical short texts. Results: Empirical studies show that the classification effect after keyword expansion is significantly higher than that before expansion, the accuracy of the model in classifying medical short texts after expansion is as high as 97.84%, and the model performs well across different categories of medical short texts. Conclusions: The short-text expansion methods of TF-IWF and Word2vec compensate for neglecting keyword rarity and the contextual information of subwords, and the model achieves effective classification of medical short texts by combining them. The model further improves short-text classification accuracy by integrating Prompt’s bootstrapping, self-adaptive attention’s keyword weighting, BERT’s deep semantic understanding, and RCNN’s region awareness and feature extraction; however, the model’s accuracy on individual topics still needs to be improved. The results show that the recommender system can effectively improve the efficiency of patient consultation and support the development of online healthcare.
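
The Word2vec half of the keyword-expansion step (appending near-synonyms of important keywords to a short text) can be illustrated with gensim; the toy corpus, vector size, and expansion rule below are placeholders, and the TF-IWF weighting that selects which keywords to expand is omitted.

```python
from gensim.models import Word2Vec

# Toy tokenized consultation texts standing in for the crawled "Good Doctor" corpus.
corpus = [["chest", "pain", "shortness", "breath"],
          ["skin", "rash", "itching", "allergy"],
          ["chest", "tightness", "breath", "difficulty"]] * 50

model = Word2Vec(corpus, vector_size=64, window=3, min_count=1, epochs=20, seed=1)

def expand(tokens, topn=2):
    """Append the most similar vocabulary words to each keyword (illustrative)."""
    extra = []
    for t in tokens:
        if t in model.wv:
            extra += [w for w, _ in model.wv.most_similar(t, topn=topn)]
    return tokens + extra

print(expand(["chest", "pain"]))
```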

37 pages, 2359 KiB  
Article
CAG-MoE: Multimodal Emotion Recognition with Cross-Attention Gated Mixture of Experts
by Axel Gedeon Mengara Mengara and Yeon-kug Moon
Mathematics 2025, 13(12), 1907; https://doi.org/10.3390/math13121907 - 7 Jun 2025
Cited by 1 | Viewed by 1016
Abstract
Multimodal emotion recognition faces substantial challenges due to the inherent heterogeneity of data sources, each with its own temporal resolution, noise characteristics, and potential for incompleteness. For example, physiological signals, audio features, and textual data capture complementary yet distinct aspects of emotion, requiring specialized processing to extract meaningful cues. These challenges include aligning disparate modalities, handling varying levels of noise and missing data, and effectively fusing features without diluting critical contextual information. In this work, we propose a novel Mixture of Experts (MoE) framework that addresses these challenges by integrating specialized transformer-based sub-expert networks, a dynamic gating mechanism with sparse Top-k activation, and a cross-modal attention module. Each modality is processed by multiple dedicated sub-experts designed to capture intricate temporal and contextual patterns, while the dynamic gating network selectively weights the contributions of the most relevant experts. Our cross-modal attention module further enhances the integration by facilitating precise exchange of information among modalities, thereby reinforcing robustness in the presence of noisy or incomplete data. Additionally, an auxiliary diversity loss encourages expert specialization, ensuring the fused representation remains highly discriminative. Extensive theoretical analysis and rigorous experiments on benchmark datasets—the Korean Emotion Multimodal Database (KEMDy20) and the ASCERTAIN dataset—demonstrate that our approach significantly outperforms state-of-the-art methods in emotion recognition, setting new performance baselines in affective computing.
(This article belongs to the Section E1: Mathematics and Computer Science)
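
The "dynamic gating mechanism with sparse Top-k activation" is a standard mixture-of-experts ingredient. A compact PyTorch sketch of Top-k gating over a pool of experts is given below; the expert architectures, cross-modal attention, and diversity loss from the paper are omitted, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Route each input to its top-k experts and combine their outputs with
    renormalized gate weights (illustrative sketch, not CAG-MoE itself)."""
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts))

    def forward(self, x):                       # x: [B, dim]
        scores = self.gate(x)                   # [B, num_experts]
        top_val, top_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(top_val, dim=-1)    # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            expert_out = torch.stack(
                [self.experts[int(e)](x[b]) for b, e in enumerate(top_idx[:, slot])])
            out = out + weights[:, slot:slot + 1] * expert_out
        return out

moe = TopKMoE(dim=32)
print(moe(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```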

24 pages, 6314 KiB  
Article
CDFAN: Cross-Domain Fusion Attention Network for Pansharpening
by Jinting Ding, Honghui Xu and Shengjun Zhou
Entropy 2025, 27(6), 567; https://doi.org/10.3390/e27060567 - 27 May 2025
Viewed by 472
Abstract
Pansharpening provides a computational solution to the resolution limitations of imaging hardware by enhancing the spatial quality of low-resolution multispectral (LRMS) images using high-resolution panchromatic (PAN) guidance. From an information-theoretic perspective, the task involves maximizing the mutual information between PAN and LRMS inputs while minimizing spectral distortion and redundancy in the fused output. However, traditional spatial-domain methods often fail to preserve high-frequency texture details, leading to entropy degradation in the resulting images. On the other hand, frequency-based approaches struggle to effectively integrate spatial and spectral cues, often neglecting the underlying information content distributions across domains. To address these shortcomings, we introduce a novel architecture, termed the Cross-Domain Fusion Attention Network (CDFAN), specifically designed for the pansharpening task. CDFAN is composed of two core modules: the Multi-Domain Interactive Attention (MDIA) module and the Spatial Multi-Scale Enhancement (SMCE) module. The MDIA module utilizes discrete wavelet transform (DWT) to decompose the PAN image into frequency sub-bands, which are then employed to construct attention mechanisms across both wavelet and spatial domains. Specifically, wavelet-domain features are used to formulate query vectors, while key features are derived from the spatial domain, allowing attention weights to be computed over multi-domain representations. This design facilitates more effective fusion of spectral and spatial cues, contributing to superior reconstruction of high-resolution multispectral (HRMS) images. Complementing this, the SMCE module integrates multi-scale convolutional pathways to reinforce spatial detail extraction at varying receptive fields. Additionally, an Expert Feature Compensator is introduced to adaptively balance contributions from different scales, thereby optimizing the trade-off between local detail preservation and global contextual understanding. Comprehensive experiments conducted on standard benchmark datasets demonstrate that CDFAN achieves notable improvements over existing state-of-the-art pansharpening methods, delivering enhanced spectral–spatial fidelity and producing images with higher perceptual quality.
(This article belongs to the Section Signal and Data Analysis)
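
The wavelet-domain decomposition feeding MDIA's attention queries can be reproduced generically with PyWavelets: a single-level 2-D DWT splits the PAN image into one approximation band and three high-frequency detail bands. The wavelet choice and the way the sub-bands are stacked below are assumptions, not the published module.

```python
import numpy as np
import pywt

# Placeholder panchromatic tile standing in for a real PAN image.
pan = np.random.rand(256, 256).astype(np.float32)

# Single-level 2-D discrete wavelet transform: cA is the low-frequency
# approximation; cH, cV, cD carry horizontal/vertical/diagonal detail.
cA, (cH, cV, cD) = pywt.dwt2(pan, wavelet="haar")
print(cA.shape, cH.shape)  # (128, 128) (128, 128)

# One simple wavelet-domain feature stack that attention queries could be
# computed from (illustrative; not the MDIA module).
wavelet_features = np.stack([cA, cH, cV, cD], axis=0)  # shape (4, 128, 128)
print(wavelet_features.shape)
```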
