- Impact Factor: 4.4
- CiteScore: 9.8
- Time to First Decision: 24 days
Multimodal Deep Learning and Its Applications
Special Issue Information
Dear Colleagues,
The rapid evolution of artificial intelligence, coupled with the widespread availability of heterogeneous data sources, has brought multimodal deep learning to the forefront of modern research. Real-world data are no longer limited to a single modality; instead, they consist of complex combinations of text, images, videos, audio signals, graphs, and structured records. Effectively modeling and reasoning over such diverse information has become essential in a wide range of domains, including intelligent recommendation systems, video understanding, information retrieval, healthcare analytics, and decision support systems.
Recent advances in multimodal deep learning—particularly multimodal large language models—have demonstrated remarkable capabilities in integrating vision, language, and other modalities for reasoning, generation, and interaction. However, significant challenges remain in scalability, robustness, interpretability, and real-world deployment. Tasks such as video anomaly detection, cross-modal retrieval, and multimodal recommendation require models that can capture fine-grained temporal dynamics, semantic alignment across modalities, and complex relational structures. Moreover, emerging paradigms such as generative recommender systems and graph-based multimodal representation learning demand novel architectures and learning strategies that go beyond traditional fusion techniques.
This Special Issue, entitled “Multimodal Deep Learning and Its Applications,” aims to bring together cutting-edge research that advances the foundations, methodologies, and applications of multimodal deep learning. The goal is to highlight innovative models and systems that effectively combine multiple modalities, leverage large-scale pretraining, and support intelligent reasoning, generation, and decision-making in complex environments. We particularly encourage contributions that explore the synergy between multimodal learning, large language models, graph representations, and real-world applications.
We welcome original research papers and comprehensive review articles. Topics of interest include, but are not limited to, the following:
- Multimodal large language models and foundation models;
- Video understanding and video anomaly detection;
- Multimodal recommendation systems;
- Generative recommender models and user behavior modeling;
- Cross-modal and multimodal retrieval;
- Graph-based multimodal representation learning;
- Multimodal fusion, alignment, and representation techniques;
- Zero-shot and few-shot multimodal learning;
- Multimodal content generation and editing;
- Multimodal learning for big data analytics;
- Trustworthy, explainable, and efficient multimodal models;
- Continual and adaptive multimodal learning;
- Applications of multimodal deep learning in healthcare, economics, and social computing.
We believe this Special Issue will provide a timely and meaningful platform for researchers and practitioners to share novel ideas, methodological advances, and practical insights into multimodal deep learning. We look forward to your valuable contributions and to fostering interdisciplinary collaboration within this rapidly growing research area.
Prof. Dr. Ping Hu
Prof. Dr. Jie Zou
Dr. Lu Zhang
Guest Editors
Manuscript Submission Information
Manuscripts should be submitted online at www.mdpi.com after registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.
Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international, peer-reviewed, open access monthly journal published by MDPI.
Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.
Keywords
- multimodal deep learning
- multimodal large language models
- multimodal recommendation
- cross-modal retrieval
- graph representation learning
- multimodal data fusion
Benefits of Publishing in a Special Issue
- Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
- Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
- Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
- External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
- e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.