Search Results (141)

Search Parameters:
Keywords = multi-modal graph neural network

24 pages, 3636 KB  
Article
VSGN: Visual–Semantic Guided Interaction Network for Multimodal Named Entity Recognition
by Jianjun Yao, Zhikun Zhou, Ruisheng Li, Jiaming Zhang and Zhiwei Qi
Symmetry 2026, 18(5), 769; https://doi.org/10.3390/sym18050769 - 29 Apr 2026
Abstract
Multimodal Named Entity Recognition (MNER) aims to integrate textual and visual information to identify entities with specific semantic categories. However, existing methods often suffer from insufficient intra-modal semantic modeling, coarse cross-modal alignment, and vulnerability to noisy or ambiguous expressions in social media. To address these challenges, we propose a Visual–Semantic Guided Interaction Network (VSGN), which improves multimodal representation learning from both semantic and structural perspectives. Specifically, we first design an adaptive visual–semantic fusion module that incorporates visual descriptions as semantic guidance, enabling more informative cross-modal interactions. To further enhance feature quality, we introduce a deviation-aware channel-wise inhibitory routing (CIR) mechanism, which jointly models channel importance and distributional deviation to suppress noisy or redundant visual signals. In addition, we propose a visual–semantic guided graph structure learning module (VSG), which explicitly captures structural dependencies across modalities. By enforcing distribution-level alignment between textual and visual graph representations, the model achieves structure-aware cross-modal interaction and reduces modality inconsistency. Extensive experiments on the Twitter-2015 and Twitter-2017 datasets demonstrate the effectiveness of the proposed method, achieving F1 scores of 76.72% and 87.86%, respectively. The results show that jointly modeling semantic enhancement and structural alignment leads to more robust and discriminative multimodal representations. Full article
(This article belongs to the Section Computer)
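The deviation-aware channel-wise inhibitory routing described in the VSGN abstract — weighting channels by importance while suppressing those with large distributional deviation — can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation; the gating formula and function name are hypothetical.

```python
import numpy as np

def channel_inhibitory_gate(feats: np.ndarray) -> np.ndarray:
    """Toy sketch of deviation-aware channel gating (hypothetical, not the paper's code).

    feats: (batch, channels) visual features. High-magnitude channels are
    kept; channels whose activations deviate strongly across the batch are
    inhibited.
    """
    importance = np.abs(feats).mean(axis=0)   # per-channel importance
    deviation = feats.std(axis=0)             # per-channel distributional deviation
    # sigmoid gate: importance opens the gate, deviation inhibits it
    gate = 1.0 / (1.0 + np.exp(-(importance - deviation)))
    return feats * gate                       # gate broadcasts over the batch

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = channel_inhibitory_gate(x)
```

Because the gate lies in (0, 1), every channel is attenuated rather than amplified, which matches the "suppress noisy or redundant visual signals" intent.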

30 pages, 4542 KB  
Article
A Multi-Task Multimodal Attention Graph Convolutional Network for Acoustic–Vibration Fusion-Based Rolling Bearing Fault Diagnosis
by Tong Wang, Yuanyuan Tang, Yibo He and Yinghao Li
Appl. Sci. 2026, 16(9), 4310; https://doi.org/10.3390/app16094310 - 28 Apr 2026
Abstract
Single-sensor-based fault diagnosis of rolling bearings often suffers from noise sensitivity, installation-dependent performance, and incomplete fault characterization. To address these limitations, this paper proposes a multi-task multimodal attention graph convolutional network (MTMAGNet) that integrates acoustic and vibration signals for bearing fault diagnosis. First, one-dimensional convolutional neural networks are used to extract modality-specific features. These features are then fused through a multi-modal attention mechanism to exploit the complementary information contained in the two signal sources. Based on the fused representations, a dynamic k-nearest neighbor graph is constructed to model relationships among samples, and a graph convolutional network is employed to learn discriminative structural features. Moreover, a multi-task learning scheme is introduced, in which fault classification serves as the primary task and modal classification is used as an auxiliary task to enhance feature learning and improve model generalization. Experimental results on a self-built acoustic–vibration test bench collected under three rotational speeds (1800 rpm, 2400 rpm, and 3000 rpm) demonstrate that the proposed method achieves high diagnostic accuracy and strong generalization performance under different fault conditions. Full article
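The dynamic k-nearest-neighbor graph over samples that MTMAGNet builds from fused features can be sketched as follows. This is a minimal illustration under assumed choices (Euclidean distance, fixed k); the paper's actual construction may differ.

```python
import numpy as np

def knn_adjacency(features: np.ndarray, k: int = 3) -> np.ndarray:
    """Build a symmetric k-NN adjacency matrix from fused sample features.

    Minimal sketch of the dynamic graph step described in the abstract;
    Euclidean distance and a fixed k are illustrative assumptions.
    """
    n = features.shape[0]
    # pairwise squared Euclidean distances between samples
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-loops
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]      # indices of the k nearest samples
        adj[i, nbrs] = 1.0
    return np.maximum(adj, adj.T)         # symmetrize for undirected GCN use

A = knn_adjacency(np.random.default_rng(1).normal(size=(10, 5)), k=3)
```

A graph convolution layer would then propagate the fused acoustic–vibration features along `A` to learn structural relationships among samples.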

15 pages, 287 KB  
Article
Sex-Stratified Machine Learning for the Prediction of Post-COVID Condition: A Longitudinal Cohort Study
by Mikhail I. Krivonosov, Ekaterina Pazukhina, Mikhail Rumyantsev, Elina Abdeeva, Dina Baimukhambetova, Polina Bobkova, Yasmin El-Taravi, Maria Pikuza, Anastasia Trefilova, Aleksandr Zolotarev, Margarita Andreeva, Ekaterina Iakovleva, Nikolay Bulanov, Sergey Avdeev, Alexey Zaikin, Valentina Kapustina, Victor Fomin, Andrey A. Svistunov, Peter Timashev, Janna G. Oganezova, Nina Avdeenko, Yulia Ivanova, Lyudmila Fedorova, Elena Kondrikova, Irina Turina, Petr Glybochko, Denis Butnaru, Oleg Blyuss, Daniel Munblit and Sechenov StopCOVID Research Team
J. Clin. Med. 2026, 15(9), 3367; https://doi.org/10.3390/jcm15093367 - 28 Apr 2026
Abstract
Background: Post-COVID-19 condition (PCC) affects many survivors, with evidence of sex-specific differences in prevalence and symptom profiles. However, few prediction studies have examined whether sex-stratified models improve prediction or generalize across sexes. This study aimed primarily to develop and compare sex-stratified machine learning models for PCC prediction using routinely available baseline variables, and secondarily to assess cross-sex generalizability and adversarial robustness. Methods: We analyzed a prospective longitudinal cohort of 1006 adults hospitalized with COVID-19 at Sechenov University Hospital Network (Moscow, Russia). Demographics, smoking status, and pre-existing comorbidities were extracted from medical records, and PCC status was assessed at 6-month follow-up. Machine learning models—including classical algorithms and graph-based neural networks—were trained separately for males and females. Cross-sex validation evaluated generalizability, variable importance aided interpretation, and adversarial perturbations assessed model robustness. Results: PCC prevalence was higher in females (53.9%) than males (39.1%). Overall predictive performance was modest across all models, with AUC values ranging from approximately 0.50 to 0.61. Graph-based models achieved the highest discrimination, with the best AUC reaching approximately 0.61, while classical approaches provided limited predictive value. Cross-sex validation showed minor asymmetry: models trained on male data performed slightly better on female cases than vice versa. Adversarial testing revealed sensitivity of all models to input perturbations. Conclusions: Demographics and comorbidities alone provide insufficient information for reliable PCC prediction. Modest sex-specific differences in model generalizability suggest distinct, sex-associated PCC phenotypes, but richer multimodal data—including clinical biomarkers, wearable-derived measures, and patient-reported outcomes—will be required to develop clinically useful and equitable predictive models. Sex-stratified approaches should be considered in future post-viral syndrome prediction studies. Full article
(This article belongs to the Special Issue Sequelae of COVID-19: Clinical to Prognostic Follow-Up)
26 pages, 1312 KB  
Article
Structure-Aware Generative Information Extraction via Feature Space Alignment
by Yuanqing Li, Chen Tao, Baoyu Zhang and Weishan Zhang
Information 2026, 17(5), 409; https://doi.org/10.3390/info17050409 - 24 Apr 2026
Abstract
Large language models (LLMs) face difficulties in leveraging the syntactic structures and entity relations embedded in text for long-document information extraction. To address this issue, this paper proposes a generative extraction method integrating heterogeneous topology awareness and spatial alignment. The method first extracts syntactic and coreference information to construct a heterogeneous document graph and employs a mixture-of-experts network to decouple and encode multi-type topological features. A component orthogonal projection mechanism and a graph-text contrastive learning strategy are then utilized to align the extracted graph features to the underlying semantic space of the language model with high fidelity. Furthermore, a Topology-Aware Encoder compresses the global features into fixed-length structural prompts to guide text generation. Experiments on the ACE2005, WikiEvents, and DuEE datasets demonstrated that the proposed method achieved state-of-the-art performance on information extraction tasks. Consequently, these results suggest that the proposed framework is a promising approach for complex information extraction across base LLMs of different scales. Full article
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing, 2nd Edition)
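The graph-text contrastive alignment this abstract mentions typically takes the form of a symmetric InfoNCE objective over paired embeddings. A minimal numpy sketch, assuming row i of each matrix corresponds to the same document (the function name and temperature are illustrative, not the paper's):

```python
import numpy as np

def info_nce(graph_emb: np.ndarray, text_emb: np.ndarray, tau: float = 0.1) -> float:
    """Symmetric InfoNCE loss aligning paired graph/text embeddings (sketch)."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / tau                     # temperature-scaled cosine similarities
    labels = np.arange(len(g))                 # matched pairs sit on the diagonal

    def ce(lg: np.ndarray) -> float:
        # numerically stable cross-entropy against the diagonal labels
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (ce(logits) + ce(logits.T))   # graph->text and text->graph
```

Minimizing this loss pulls each document's graph embedding toward its own text embedding and away from the other documents in the batch.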

45 pages, 2083 KB  
Systematic Review
AI-Driven Breast Cancer Diagnosis: A Systematic Review of Imaging Modalities, Deep Learning, and Explainability
by Margo Sabry, Hossam Magdy Balaha, Khadiga M. Ali, Ali Mahmoud, Dibson Gondim, Mohammed Ghazal, Tayseer Hassan A. Soliman and Ayman El-Baz
Cancers 2026, 18(8), 1305; https://doi.org/10.3390/cancers18081305 - 20 Apr 2026
Abstract
Background: This article provides a comprehensive overview of recent advancements in artificial intelligence (AI) and deep-learning technologies for breast cancer (BC) diagnosis across various imaging modalities. Methods: A systematic review was conducted in strict adherence to the PRISMA guidelines, incorporating a comparative analysis of 65 peer-reviewed studies published between 2018 and 2024. The evaluation focused on diagnostic performance, architectural developments, and clinical integration strategies. Results: The review synthesizes primary findings on convolutional neural networks (CNNs), emerging architectures including graph neural networks, and hybrid models, with diagnostic accuracy, risk prediction, and personalized screening strategies identified as the leading research domains. Notable achievements include CNNs attaining up to 98.5% accuracy in mammography and Vision Transformers reaching 96% in histopathological analysis. Furthermore, the implementation of explainable AI methodologies, such as SHAP, LIME, and Grad-CAM, is emphasized for maintaining transparency, trust, and accountability in clinical decision-making. Conclusions: AI constitutes a pivotal factor in facilitating early BC diagnosis and optimizing treatment outcomes. Nevertheless, significant challenges persist, including dataset heterogeneity, model generalizability, standardization of imaging protocols, computational resource limitations, and the seamless integration of these technologies into established clinical workflows. Future research must prioritize robust multi-dataset validation and standardized implementation frameworks to overcome existing limitations and advance successful BC diagnostic practices. Full article
(This article belongs to the Section Methods and Technologies Development)

25 pages, 3630 KB  
Article
Modality-Specific Sparse Autoencoders for Efficient Multimodal ICU Alignment: A Symmetry–Asymmetry Learning Framework
by Hashim Ali and Muhammad Tahir Akhtar
Symmetry 2026, 18(4), 677; https://doi.org/10.3390/sym18040677 - 18 Apr 2026
Abstract
Intensive care units (ICUs) generate heterogeneous data streams, including structured electronic health records, physiological time series, and medical imaging, that describe the same patient state through different observational forms. Effective multimodal learning in this setting requires a principled balance between representation-level symmetry and architectural asymmetry. Clinically corresponding patient states should exhibit cross-modal representational symmetry, whereas each modality retains intrinsic asymmetry in dimensionality, temporal resolution, noise characteristics, and missingness. This study proposes a modality-specific sparse autoencoder framework for efficient multimodal ICU representation learning under this symmetry–asymmetry principle. Separate sparse encoders are assigned to each modality to preserve the modality-dependent structure while suppressing redundant latent activity through adaptive gating. Representation-level symmetry is encouraged through a sparsity-aware contrastive objective that aligns paired latent embeddings across modalities only on active informative dimensions. To further model inter-patient dependencies, the framework incorporates a graph neural network (GNN) whose message-passing operations respect modality-specific sparsity patterns. Experimental results indicate that the proposed framework improves predictive performance and computational efficiency relative to conventional multimodal baselines, while also exhibiting stronger robustness under missing-modality conditions and more selective latent representations. Overall, the method provides an effective and clinically relevant multimodal learning strategy for ICU decision support while offering a measurable symmetry-aware and asymmetry-preserving formulation for heterogeneous medical data. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Machine Learning and Data Mining)
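The idea of aligning paired latent embeddings "only on active informative dimensions" can be sketched by masking the similarity to dimensions that are non-zero in both modalities' sparse codes. A toy illustration under assumed conventions (the threshold and function name are hypothetical):

```python
import numpy as np

def masked_alignment(z_a: np.ndarray, z_b: np.ndarray, eps: float = 1e-6) -> float:
    """Cosine alignment restricted to jointly active latent dimensions (sketch).

    Dimensions that are (near-)zero in either modality's sparse code are
    excluded, so alignment pressure never forces inactive units to fire.
    """
    mask = (np.abs(z_a) > eps) & (np.abs(z_b) > eps)   # jointly active dims
    if not mask.any():
        return 0.0                                      # nothing to align
    a, b = z_a[mask], z_b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A contrastive objective would maximize this score for paired patient states and minimize it for mismatched pairs, preserving each modality's sparsity pattern (the asymmetry) while enforcing representational symmetry where both modalities are informative.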

21 pages, 4648 KB  
Article
M-GNN: A Topology-Enhanced Multi-Modal Graph Neural Network for Cancer Driver Gene Prediction
by Lu Qin, Wen Zhu, Xinyi Liao and Yujing Zhang
Metabolites 2026, 16(4), 268; https://doi.org/10.3390/metabo16040268 - 16 Apr 2026
Abstract
Background: Accurate identification of cancer driver genes is essential for understanding tumorigenesis and developing targeted therapies. Although graph neural networks (GNNs) have advanced multi-omics integration, existing methods often simply concatenate omics features and underutilize the topological information of biological networks. Methods: We propose M-GNN, a multi-modal GNN framework for cancer driver gene prediction. It employs separate Graph Convolutional Network (GCN) encoders to process four types of omics data (mutation, expression, methylation, copy number variation (CNV)), each represented as a 16-dimensional vector. We incorporate knowledge distillation by using soft labels from a pre-trained teacher model to enhance feature representation. An attention mechanism adaptively fuses the encoded omics features, and a dual-path classifier combining a GCN and a Multilayer Perceptron (MLP) preserves both intrinsic gene properties and network topology. Results: Experiments on three public protein–protein interaction (PPI) networks show that M-GNN consistently achieves the highest or second-highest AUPRC compared to five state-of-the-art methods. Ablation studies confirm the contribution of each module, and biological interpretability analysis—including analysis of GO enrichment and drug sensitivity—validates the reliability of the predicted genes. Conclusions: M-GNN provides a robust and interpretable computational tool for systematic cancer driver gene identification, effectively integrating multi-omics and network data. Full article
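The attention mechanism that adaptively fuses the four encoded omics features can be sketched as a per-gene softmax over modality scores. This is a hypothetical minimal form (shared scoring vector, fixed seed), not M-GNN's actual fusion layer:

```python
import numpy as np

def attention_fuse(omics: list) -> np.ndarray:
    """Softmax-attention fusion of per-omics gene embeddings (sketch).

    omics: list of (genes, dim) arrays, e.g. mutation, expression,
    methylation, and CNV encodings (16-dimensional in the abstract).
    Each modality gets a scalar score per gene from a shared scoring
    vector; scores are softmaxed into fusion weights.
    """
    rng = np.random.default_rng(42)
    w = rng.normal(size=omics[0].shape[1])      # shared scoring vector (stand-in
                                                # for a learned parameter)
    scores = np.array([e @ w for e in omics])   # (modalities, genes)
    alpha = np.exp(scores - scores.max(axis=0))
    alpha = alpha / alpha.sum(axis=0)           # per-gene softmax over modalities
    return sum(a[:, None] * e for a, e in zip(alpha, omics))
```

Because the weights sum to one per gene, the fused embedding is a convex combination of the modality embeddings, so no single omics layer can be silently discarded.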

40 pages, 7468 KB  
Review
Traffic Flow Prediction in Intelligent Transportation Systems: A Comprehensive Review of Graph Neural Networks and Hybrid Deep Learning Methods
by Zhenhua Wang, Xinmeng Wang, Lijun Wang, Zheng Wu, Jiangang Hu, Fujiang Yuan and Zhen Tian
Algorithms 2026, 19(4), 310; https://doi.org/10.3390/a19040310 - 16 Apr 2026
Abstract
Traffic flow prediction is a key component of Intelligent Transportation Systems (ITS), crucial for alleviating urban congestion, optimizing traffic management, and improving the overall efficiency of road networks. With the rapid growth in vehicle numbers and the increasing complexity of urban traffic patterns, accurate short-term traffic flow prediction has become increasingly important. This paper comprehensively reviews the latest advancements in traffic flow prediction methods, focusing on graph neural network (GNN)-based approaches and hybrid deep learning frameworks. First, we introduce the fundamental theoretical foundations, including graph neural networks, deep learning algorithms, heuristic optimization methods, and attention mechanisms. Subsequently, we summarize GNN-based prediction methods into four paradigms: (1) federated learning and privacy-preserving methods, enabling cross-regional collaboration while protecting sensitive data; (2) dynamically adaptive graph structure methods, capturing time-varying spatial dependencies; (3) multi-graph fusion and attention mechanism methods, enhancing feature representations from multiple perspectives; and (4) cross-domain technology integration methods, fusing novel architectures and interdisciplinary technologies. Furthermore, we investigate hybrid methods combining signal decomposition, heuristic optimization, and attention mechanisms with LSTM networks to address challenges related to non-stationarity and model optimization. For each category, we analyze representative works and summarize their core innovations, strengths, and limitations in a systematic comparative table. Finally, we discuss current challenges, including computational complexity, model interpretability, and generalization ability, and outline future research directions such as lightweight model design, uncertainty quantification, multimodal data fusion, and integration with traffic control systems. This review provides researchers and practitioners with a systematic understanding of the latest advances in traffic flow prediction and offers guidance for methodological selection and future research. Full article

22 pages, 812 KB  
Review
AI-Driven BCR Modeling for Precision Immunology
by Tao Liu, Xusheng Zhao and Fan Yang
Int. J. Mol. Sci. 2026, 27(7), 3296; https://doi.org/10.3390/ijms27073296 - 5 Apr 2026
Abstract
The B cell receptor (BCR) repertoire captures an individual’s immunological history and antigen-driven evolution within a vast, high-dimensional sequence space. Although bulk and single-cell adaptive immune receptor repertoire sequencing (AIRR-seq) now enables deep profiling of BCR diversity, interpreting these datasets remains challenging due to strong inter-individual heterogeneity, nonlinear sequence–structure–function relationships, dynamic clonal evolution, and the rarity of functionally relevant clones. Artificial intelligence (AI) provides a conceptual and computational framework for addressing these challenges. Here, we summarize how advanced deep learning architectures, including antibody-specific language models, graph neural networks (GNNs), and generative frameworks, uncover clonal topology, structural features, and antigen-binding semantics. We further highlight applications in cancer, infectious disease, and autoimmunity. Finally, we propose a closed-loop framework that integrates multimodal datasets, interpretable AI, and iterative experimental validation to advance predictive immunology and accelerate therapeutic antibody discovery. Full article
(This article belongs to the Special Issue Molecular Mechanism of Immune Response)

26 pages, 2889 KB  
Article
A Contrastive-Learning-Based Pre-Training Framework for Optical Property Prediction of Low-Data Rhodamines with Interpretable Multitask Graph Neural Networks
by Jiangguo Qiu, Yanling Wu, Hong Zhang, Menglong Li, Xuemei Pu and Yanzhi Guo
Molecules 2026, 31(7), 1149; https://doi.org/10.3390/molecules31071149 - 31 Mar 2026
Abstract
Accurate prediction of the maximum absorption (λabs) and emission (λemi) wavelengths is essential for the design of high-performance rhodamine probes. However, available rhodamine optical data are extremely limited and heterogeneous, posing challenges for deep learning models. Here, we developed a contrastive-learning-based multitask graph neural network framework to predict the λabs and λemi of rhodamine derivatives, using multi-modal features that integrate atom–bond-level graph representations with solvent descriptors. The model is first pre-trained on 48,148 xanthene-derived molecules with a self-supervised contrastive strategy and then fine-tuned on a curated rhodamine dataset containing 390 molecule–solvent pair samples. It yields excellent performance, with R2 values of 0.923 for λabs and 0.913 for λemi, outperforming machine learning, single-task, and no-pre-training GNN baselines. External dataset tests and comparisons with theoretical calculations confirm the superiority of the proposed model. Attention-based interpretability identifies chemically meaningful regions, including the conjugated backbone and amino substituents, consistent with known photophysical mechanisms. Finally, we designed three new rhodamine derivatives exhibiting high Stokes shifts, with a minimum deviation of 9 nm between predicted and experimental values. These findings demonstrate that this framework enables accurate fluorescence property prediction and mechanism-informed molecular design, offering promising theoretical guidance for designing next-generation probes. Full article

20 pages, 4119 KB  
Article
Multimodal Contrast-Enhanced Molecular Representation Learning and Property Prediction
by Hong Luo, Jie He, Zhichao Liu and Chen Zeng
Biophysica 2026, 6(2), 24; https://doi.org/10.3390/biophysica6020024 - 27 Mar 2026
Abstract
Molecular representation learning (MRL) has garnered significant attention due to its pivotal role in downstream applications such as molecular property prediction and drug discovery. In most MRL approaches, molecules are encoded into 2D topological graphs via graph neural networks (GNNs), which suffer from over-smoothing issues and limited receptive fields. Furthermore, most GNN models fail to utilize the 3D spatial structural information that determines molecular physicochemical properties and biological activity. To this end, here we propose multimodal contrast-enhanced molecular representation learning (MCMRL). This approach utilizes both the 2D topological information and 3D structural information of molecules for contrastive learning to enhance molecular graph representations. Further, it integrates additional molecular fingerprint information and feature fusion techniques to incorporate multimodal knowledge, yielding more reliable and generalizable molecular representations. MCMRL is pre-trained on ~10 million unlabeled molecules from PubChem, followed by various downstream benchmark tasks. Experimental results demonstrate that MCMRL achieves superior performance in 9 out of 13 benchmark tests for molecular property prediction, validating its effectiveness in molecular representation learning. Furthermore, potential molecular drugs binding to the biological target protein DRD2 screened by the MCMRL representation show promising affinity scores, which also demonstrates the efficacy of the proposed method. Full article
(This article belongs to the Special Issue Latest Advances in Molecular Docking Involved in Biophysics)

28 pages, 7008 KB  
Article
Multimodal Deep Learning Framework for Profiling Socio-Economic Indicators and Public Health Determinants in Urban Environments
by Esaie Dufitimana, Jean Pierre Bizimana, Ernest Uwayezu, Paterne Gahungu and Emmy Mugisha
Urban Sci. 2026, 10(4), 177; https://doi.org/10.3390/urbansci10040177 - 25 Mar 2026
Abstract
Urbanization significantly enhances socio-economic conditions, health, and well-being for many by improving access to services, education, and economic opportunities. However, socio-economic and public health disparities are also being exacerbated by urbanization. The reliable data required to monitor these conditions are often unavailable, outdated, or inconsistent. This study introduces a multimodal deep learning framework that integrates satellite imagery with street network datasets to predict urban socio-economic indicators and public health determinants at the sector level, the administrative unit used for public health planning in Rwanda. We extracted latent visual and topological embeddings of the urban built environment using a Convolutional Neural Network (CNN) and a Graph Neural Network (GNN). These embeddings were fused through an attention mechanism to train a multi-task regression model that simultaneously predicts multiple socio-economic indicators and public health determinants. This framework was applied to the City of Kigali in Rwanda. Overall, the multimodal fusion model achieved the best average performance across targets, with an average correlation of 0.68 and MAE of 1.26 for socio-economic indicators, and 0.68 and 1.46 for public health determinants, demonstrating the benefit of integrating visual and topological information. The learned fused embedding space arranges socio-economic indicator and public health determinant deciles along a continuous morphological gradient from sparsely built rural settings to dense urban settings, demonstrating that urban form encodes latent signals that capture socio-economic indicators and health determinants. Moreover, the study reveals a strong relationship between socio-economic indicators and the public health index, with education, cooking materials, and floor materials exhibiting correlations above 0.96. This work demonstrates the utility of an integrated framework for socio-economic indicator profiling and public health planning in data-scarce urban contexts, offering a scalable approach for monitoring the indicators of Sustainable Development Goals in rapidly changing urban environments. Full article
(This article belongs to the Topic Geospatial AI: Systems, Model, Methods, and Applications)

34 pages, 4142 KB  
Article
Subject-Independent Multimodal Interaction Modeling for Joint Emotion and Immersion Estimation in Virtual Reality
by Haibing Wang and Mujiangshan Wang
Symmetry 2026, 18(3), 451; https://doi.org/10.3390/sym18030451 - 6 Mar 2026
Abstract
Virtual Reality (VR) has emerged as a powerful medium for immersive human–computer interaction, where users' emotional and experiential states play a pivotal role in shaping engagement and perception. However, existing affective computing approaches often model emotion recognition and immersion estimation as independent problems, overlooking their intrinsic coupling and the structured relationships underlying multimodal physiological signals. In this work, we propose a modality-aware multi-task learning framework that jointly models emotion recognition and immersion estimation from a graph-structured and symmetry-aware interaction perspective. Specifically, heterogeneous physiological and behavioral modalities—including eye-tracking, electrocardiogram (ECG), and galvanic skin response (GSR)—are treated as relational components with structurally symmetric encoding and fusion mechanisms, while their cross-modality dependencies are adaptively aggregated to preserve interaction symmetry at the representation level and introduce controlled asymmetry at the task-optimization level through weighted multi-task learning, without introducing explicit graph neural network architectures. To support reproducible evaluation, the VREED dataset is further extended with quantitative immersion annotations derived from presence-related self-reports via weighted aggregation and factor analysis. Extensive experiments demonstrate that the proposed framework consistently outperforms recurrent, convolutional, and Transformer-based baselines. Compared with the strongest Transformer baseline, the proposed framework yields consistent relative performance gains of approximately 3–7% for emotion recognition metrics and reduces immersion estimation errors by nearly 9%. Beyond empirical improvements, this study provides a structured interpretation of multimodal affective modeling that highlights symmetry, coupling, and controlled symmetry breaking in multi-task learning, offering a principled foundation for adaptive VR systems, emotion-driven personalization, and dynamic user experience optimization. Full article
(This article belongs to the Section Computer)
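The weighted multi-task objective described in the abstract, symmetric per-task terms combined under asymmetric task weights, can be sketched as follows. This is a minimal illustration only: the choice of cross-entropy for emotion, MSE for immersion, and the weight values are assumptions, since the abstract does not specify the loss components.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(emotion_logits, emotion_labels,
                   immersion_pred, immersion_true,
                   w_emotion=1.0, w_immersion=0.5):
    """Weighted sum of a classification and a regression objective.
    Unequal task weights realize the 'controlled asymmetry' at the
    optimization level; the per-task terms themselves are symmetric
    in form (one scalar loss per task)."""
    probs = softmax(emotion_logits)
    n = len(emotion_labels)
    # Cross-entropy over the emotion classes (hypothetical choice).
    ce = -np.mean(np.log(probs[np.arange(n), emotion_labels] + 1e-12))
    # Mean squared error for the continuous immersion score.
    mse = np.mean((immersion_pred - immersion_true) ** 2)
    return w_emotion * ce + w_immersion * mse
```

In practice the two heads would share the fused multimodal representation, and the weights could be tuned or learned rather than fixed.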
59 pages, 5629 KB  
Article
Adaptive Neural Network Method for Detecting Crimes in the Digital Environment to Ensure Human Rights and Support Forensic Investigations
by Serhii Vladov, Oksana Mulesa, Petro Horvat, Yevhen Kobko, Victoria Vysotska, Vasyl Kikinchuk, Serhii Khursenko, Kostiantyn Karaman and Oksana Kochan
Data 2026, 11(3), 49; https://doi.org/10.3390/data11030049 - 2 Mar 2026
Abstract
This article presents an adaptive neural network method for the automated detection, reconstruction, and prioritisation of multi-stage criminal operations in the digital environment, aiming to protect human rights and ensure the legal security of digital evidence. The developed method combines multimodal temporal encoders, a graph module based on GNN for entity correlation, and a correlation head with a link-prediction mechanism and differentiable path recovery. Sliding time windows, logarithmic transformation of volumetric features, and pseudonymization of identifiers with the ability to utilise privacy-preserving procedures (federated learning, differential privacy) are used for data aggregation and normalisation. Unique features of the developed method include an integrated risk function combining an anomaly component and graph significance, a module for automated forensic packet generation with chain of custody recording, and a mechanism for incremental model updates. Experimental results demonstrate high diagnostic metric values (AUC ≈ 0.97, F1 ≈ 0.99 on the test dataset after balancing), robust recovery of priority paths (“path_probability” > 0.7 for top operations), and pipeline performance in PII leak prioritisation and human trafficking reconstruction scenarios. The study’s contribution lies in a practice-oriented neural network method that integrates detection, correlation, and the collection of legally applicable evidence. Full article
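The integrated risk function combining an anomaly component with graph significance could look roughly like the sketch below. Degree centrality stands in for the unspecified GNN-derived entity importance, and the mixing weights `alpha`/`beta` are hypothetical; the paper's actual formulation is not given in the abstract.

```python
import numpy as np

def integrated_risk(anomaly_scores, adjacency, alpha=0.6, beta=0.4):
    """Hypothetical per-entity risk: a convex combination of a
    normalized anomaly score and a simple graph-significance term
    (degree centrality here, as a stand-in for GNN importance)."""
    degree = adjacency.sum(axis=1)
    centrality = degree / degree.max() if degree.max() > 0 else degree
    # Min-max normalize anomaly scores so both terms share a [0, 1] scale.
    a = anomaly_scores - anomaly_scores.min()
    rng = a.max() if a.max() > 0 else 1.0
    a = a / rng
    return alpha * a + beta * centrality
```

Entities can then be ranked by this score to prioritise which multi-stage operations are reconstructed and packaged as evidence first.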
32 pages, 9401 KB  
Article
A Leakage-Aware Multimodal Machine Learning Framework for Nutrition Supply–Demand Forecasting Using Temporal and Spatial Data Fusion
by Abdullah, Muhammad Ateeb Ather, Jose Luis Oropeza Rodriguez, Carlos Guzmán Sánchez-Mejorada, Miguel Jesús Torres Ruiz and Rolando Quintero Tellez
Computers 2026, 15(3), 156; https://doi.org/10.3390/computers15030156 - 2 Mar 2026
Abstract
Accurate forecasting of nutrition supply–demand dynamics is essential for reducing resource wastage and improving equitable allocation. However, this task remains challenging due to heterogeneous data sources, cold-start regions, and the risk of information leakage in spatiotemporal modeling. This study presents a leakage-aware multimodal machine learning framework for nutrition supply–demand forecasting. The framework integrates temporal, spatial, and contextual information within a unified architecture. It combines self-supervised temporal representation learning, causal time-lag modeling, and few-shot adaptation to improve generalization under limited or previously unseen data conditions. Heterogeneous inputs include epidemiological, environmental, demographic, sentiment, and biologically derived indicators. These signals are encoded using a PatchTST-inspired temporal backbone coupled with a feature-token transformer employing cross-modal attention. Spatial dependencies are explicitly modeled using graph neural networks. Hierarchical decoding enables multi-horizon forecasting with calibrated uncertainty estimates. Model evaluation is conducted under strict spatiotemporal hold-out protocols with explicit leakage detection. All synthetic signals are excluded from testing. Across geographically and temporally disjoint datasets, the proposed framework consistently outperforms strong unimodal and multimodal baselines. It achieves macro-F1 scores above 99.5% and stable early-warning lead times of approximately 9 days under distribution shift. Ablation studies indicate that causal time-lag enforcement and few-shot adaptation contribute most strongly to performance robustness. Closed-loop simulation experiments suggest potential reductions of approximately 38% in nutrient wastage, 19% in response latency, and 16% in operational costs when deployed as a decision-support tool.
External validation on fully unseen regions confirms the generalizability of the framework under realistic forecasting constraints. Full article
(This article belongs to the Special Issue AI in Bioinformatics)
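A strict spatiotemporal hold-out of the kind described, where test data is disjoint from training data in both region and time, can be sketched as below. The field names `region` and `t` are placeholders for illustration, not the framework's actual data schema.

```python
def spatiotemporal_holdout(records, cutoff_time, test_regions):
    """Leakage-aware split: test samples must come from held-out regions
    AND occur at or after the cutoff time; training samples must come
    from other regions AND occur before the cutoff. Records that mix
    the two (held-out region before the cutoff, or training region
    after it) are discarded entirely, so no spatial or future
    information can leak into training."""
    train = [r for r in records
             if r["region"] not in test_regions and r["t"] < cutoff_time]
    test = [r for r in records
            if r["region"] in test_regions and r["t"] >= cutoff_time]
    return train, test
```

Discarding the mixed records is the conservative choice: it sacrifices some data volume to guarantee that the evaluation measures genuine generalization to unseen regions and future periods.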
