Techniques and Applications of Multimodal Data Fusion

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 March 2026

Special Issue Editors

Dr. Shiyu Hu, Guest Editor
School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 639798, Singapore
Interests: computer vision; multimodal learning; AI4Science; evaluation technology

Dr. Wenjie Yang, Guest Editor
College of Computer and Data Science (College of Software), College of Artificial Intelligence, Fuzhou University, Fuzhou 350116, China
Interests: large-scale multimodal learning; person re-identification; multi-label classification

Dr. Zhaorui Zhang, Guest Editor
Department of Computer Science, The Hong Kong Polytechnic University, Hong Kong 999077, China
Interests: pre-trained language modeling (PLM); automated machine learning (AutoML); multimodal (vision–language) learning

Dr. Shengli Wu, Guest Editor
Faculty of Computing, Engineering and Built Environment, Ulster University, Belfast BT52 1SA, UK
Interests: information retrieval; data fusion

Special Issue Information

Dear Colleagues,

Multimodal data fusion is a transformative area of artificial intelligence research, dedicated to integrating diverse data modalities—such as visual, linguistic, auditory, and sensory inputs—into unified frameworks. This integration enables systems to achieve more comprehensive understanding and robust decision making across complex, dynamic scenarios. Advances in machine learning, particularly in deep neural networks, have paved the way for innovative algorithms and methodologies, pushing the boundaries of multimodal representation learning, cross-modal reasoning, and data alignment techniques.
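
At its simplest, fusion of this kind gives each modality its own encoder and combines the resulting embeddings into a joint representation. The minimal PyTorch sketch below illustrates that pattern; all module names, dimensions, and the concatenation strategy are illustrative assumptions, not methods prescribed by this Special Issue.

```python
# Minimal intermediate-fusion sketch: each modality gets its own encoder,
# and the resulting embeddings are concatenated before a joint head.
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleFusionModel(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, hidden=256, n_classes=10):
        super().__init__()
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.txt_encoder = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU())
        # The joint head operates on the concatenated multimodal representation.
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, img_feat, txt_feat):
        z_img = self.img_encoder(img_feat)     # (batch, hidden)
        z_txt = self.txt_encoder(txt_feat)     # (batch, hidden)
        z = torch.cat([z_img, z_txt], dim=-1)  # simple feature-level fusion
        return self.classifier(z)

model = SimpleFusionModel()
logits = model(torch.randn(4, 512), torch.randn(4, 300))
print(logits.shape)  # torch.Size([4, 10])
```

More sophisticated alternatives (cross-attention, gated fusion, contrastive alignment) follow the same basic contract: per-modality encoders feeding a shared downstream objective.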

Despite these advancements, significant challenges persist. Issues such as data heterogeneity, noisy inputs, semantic misalignment, and the scalability of fusion methods limit the applicability of current approaches in real-world environments. Furthermore, evaluating the performance and robustness of multimodal systems remains a critical research area, requiring novel benchmarks and metrics.

This Special Issue aims to provide a platform for exploring state-of-the-art techniques, novel theoretical contributions, and diverse applications in multimodal data fusion. Suggested topics include, but are not limited to, the following:

  • Advanced algorithms for feature extraction, multimodal representation learning, and cross-modal alignment.
  • Scalable and robust fusion techniques capable of handling noisy, sparse, and high-dimensional data.
  • Integration of multimodal fusion in diverse applications, such as autonomous systems, personalized healthcare, education, and urban infrastructure.
  • New frameworks for evaluating multimodal systems, emphasizing robustness, interpretability, and computational efficiency.
  • Exploration of emergent fields, such as social science analytics, human-centric AI, and creative AI, through the lens of multimodal fusion.

While these topics highlight promising directions, we also encourage submissions that go beyond this scope and explore the field from broader angles. Contributions that bridge disciplines or present disruptive perspectives are particularly welcome.

The interdisciplinary nature of multimodal data fusion fosters collaboration between computer science, engineering, social sciences, and even the arts. This cross-disciplinary approach holds the potential to redefine how intelligent systems interact with and understand the world. From enabling advanced vision–language interfaces to facilitating sustainable smart cities, the applications are vast and impactful.

We invite researchers from diverse backgrounds to contribute original research and reviews that push the boundaries of this exciting field. Through this Special Issue, we aim to create a platform for collaboration, innovation, and exploration, driving forward the development of multimodal data fusion to address both theoretical and practical challenges in the years to come.

Dr. Shiyu Hu
Dr. Wenjie Yang
Dr. Zhaorui Zhang
Dr. Shengli Wu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimodal data fusion
  • multimodal representation learning
  • robust fusion techniques
  • feature alignment
  • cross-modal reasoning
  • heterogeneous data integration

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)

Research

22 pages, 14929 KB  
Article
Educational Evaluation with MLLMs: Framework, Dataset, and Comprehensive Assessment
by Yuqing Chen, Yixin Li, Yupei Ren, Yixin Liu and Yiping Ma
Electronics 2025, 14(18), 3713; https://doi.org/10.3390/electronics14183713 - 19 Sep 2025
Abstract
With the rapid development of Multimodal Large Language Models (MLLMs) in education, their applications have mainly focused on content generation tasks such as text writing and courseware production. However, automated assessment of non-exam learning outcomes remains underexplored. This study shifts the application of MLLMs from content generation to content evaluation and designs a lightweight and extensible framework to enable automated assessment of students' multimodal work. We constructed a multimodal dataset comprising student essays, slide decks, and presentation videos from university students, annotated by experts across five educational dimensions. Based on horizontal educational evaluation dimensions (Format Compliance, Content Quality, Slide Design, Verbal Expression, and Nonverbal Performance) and vertical model capability dimensions (consistency, stability, and interpretability), we systematically evaluated four leading multimodal large models (GPT-4o, Gemini 2.5, Doubao 1.6, and Kimi 1.5) in assessing non-exam learning outcomes. The results indicate that MLLMs demonstrate good consistency with human evaluations across various assessment dimensions, with each model exhibiting its own strengths. They also possess high explainability and perform better in text-based tasks than in visual tasks, but their scoring stability still requires improvement. This study demonstrates the potential of MLLMs for non-exam learning assessment and provides a reference for advancing their applications in education.
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
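
The evaluation protocol in this abstract rests on measurable notions of consistency (agreement with expert scores) and stability (score spread across repeated runs). Below is a minimal sketch of how such metrics might be computed; the synthetic scores, the choice of Pearson correlation, and the five-run setup are assumptions for illustration, not the paper's exact protocol.

```python
# Hedged sketch: computing consistency (correlation with expert scores) and
# stability (spread across repeated model runs) for an MLLM-based scorer.
# The data below is synthetic; the paper's actual protocol may differ.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
human = rng.uniform(1, 5, size=30)               # expert scores for 30 submissions
runs = human + rng.normal(0, 0.4, size=(5, 30))  # 5 repeated MLLM scoring runs

model_mean = runs.mean(axis=0)
consistency, _ = pearsonr(model_mean, human)     # agreement with human raters
stability = runs.std(axis=0).mean()              # average per-item std across runs

print(f"consistency (Pearson r): {consistency:.3f}")
print(f"stability (mean per-item std): {stability:.3f}")
```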

23 pages, 2406 KB  
Article
Research on Driving Fatigue Assessment Based on Physiological and Behavioral Data
by Ge Zhang, Zhangyu Song, Xiu-Li Li, Wenqing Li and Kuai Liang
Electronics 2025, 14(17), 3469; https://doi.org/10.3390/electronics14173469 - 29 Aug 2025
Abstract
Driving fatigue is a crucial factor affecting road traffic safety, and accurately assessing a driver's fatigue state is critical for accident prevention. This paper explores methods for assessing driving fatigue under different conditions based on multimodal physiological and behavioral data. Physiological data such as heart rate, EEG, electromyography, and pupil diameter were collected through experiments, together with behavioral data such as posture changes, vehicle acceleration, and throttle usage. The results show that both physiological and behavioral indicators are significantly sensitive to driving fatigue, and that fusing multimodal data can effectively improve the accuracy of fatigue detection. On this basis, a comprehensive driving fatigue assessment model was constructed, and its applicability and reliability in different driving scenarios were verified. This study provides a theoretical basis for the development and application of driver fatigue monitoring systems, helping to enable real-time fatigue warnings and protection and thereby improving driving safety.
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
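
A simple way to realize the multimodal fusion this abstract describes is early (feature-level) fusion: concatenate physiological and behavioral feature vectors and train a single classifier. The sketch below illustrates that idea on synthetic data; the feature set, the Random Forest classifier, and the labels are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: feature-level fusion of physiological and behavioral signals
# for binary fatigue classification. Features and labels here are synthetic;
# the paper's actual model and features may differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 200
physio = rng.normal(size=(n, 4))    # e.g., heart rate, EEG power, EMG, pupil diameter
behavior = rng.normal(size=(n, 3))  # e.g., posture change, acceleration, throttle use
X = np.hstack([physio, behavior])   # early fusion: concatenate modality features
y = (physio[:, 0] + behavior[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```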

23 pages, 978 KB  
Article
Emotional Analysis in a Morphologically Rich Language: Enhancing Machine Learning with Psychological Feature Lexicons
by Ron Keinan, Efraim Margalit and Dan Bouhnik
Electronics 2025, 14(15), 3067; https://doi.org/10.3390/electronics14153067 - 31 Jul 2025
Abstract
This paper explores emotional analysis in Hebrew texts, focusing on improving machine learning techniques for depression detection by integrating psychological feature lexicons. Hebrew's complex morphology makes emotional analysis challenging, and this study addresses that challenge by combining traditional machine learning methods with sentiment lexicons. The dataset consists of over 350,000 posts from 25,000 users on the health-focused social network "Camoni" from 2010 to 2021. Various machine learning models (SVM, Random Forest, Logistic Regression, and Multi-Layer Perceptron) were used, alongside ensemble techniques such as Bagging, Boosting, and Stacking. TF-IDF was applied for feature selection, with word and character n-grams, and pre-processing steps such as punctuation removal, stop word elimination, and lemmatization were performed to handle Hebrew's linguistic complexity. The models were enriched with sentiment lexicons curated by professional psychologists. The study demonstrates that integrating sentiment lexicons significantly improves classification accuracy; lexicons for negative and positive emojis, hostile words, anxiety words, and no-trust words were particularly effective in enhancing model performance. Our best model classified depression with an accuracy of 84.1%. These findings suggest that practitioners in mental health and social work can improve their machine learning models for detecting depression in online discourse by incorporating emotion-based lexicons. The societal impact of this work lies in its potential to improve the detection of depression in online Hebrew discourse, offering more accurate and efficient methods for mental health interventions in online communities.
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
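
The enrichment strategy described here, TF-IDF n-grams combined with lexicon-derived features, maps naturally onto a scikit-learn FeatureUnion. The sketch below shows one such combination; the toy lexicon, texts, and labels are hypothetical stand-ins for the study's curated Hebrew resources.

```python
# Hedged sketch: combining TF-IDF n-gram features with lexicon-based counts,
# mirroring the enrichment strategy described in the abstract. The lexicon,
# texts, and labels below are toy stand-ins, not the study's resources.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

ANXIETY_LEXICON = {"worried", "afraid", "panic"}  # hypothetical mini-lexicon

class LexiconCounter(BaseEstimator, TransformerMixin):
    """Counts how many tokens of each text appear in a fixed lexicon."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.array([[sum(tok in ANXIETY_LEXICON for tok in text.lower().split())]
                         for text in X])

texts = ["i feel worried and afraid", "great day outside", "panic again", "all is fine"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # word uni- and bi-grams
        ("lexicon", LexiconCounter()),                   # psychological lexicon count
    ])),
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)
print(model.predict(["so worried today"]))
```

The same pattern extends to several lexicons at once: add one transformer per lexicon to the FeatureUnion, and the classifier sees the n-gram and lexicon features side by side.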
