Advances and Innovations in Deep Learning: Unveiling Multidisciplinary Applications and Challenges

A special issue of Inventions (ISSN 2411-5134). This special issue belongs to the section "Inventions and Innovation in Design, Modeling and Computing Methods".

Deadline for manuscript submissions: closed (28 February 2026) | Viewed by 10710

Special Issue Editors


Guest Editor
Faculty of Data Science, City University of Macau, Taipa, Macau
Interests: artificial intelligence; multimodal learning; digital humanities

Guest Editor
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China
Interests: multilingual artificial intelligence applications; multimodal learning; preservation of digital culture heritage

Special Issue Information

Dear Colleagues,

The rapid evolution of artificial intelligence (AI) and deep learning has redefined the boundaries of technological innovation. This Special Issue welcomes submissions that traverse a wide spectrum of interdisciplinary research, exploring novel algorithms and architectures that could enhance the performance and efficiency of AI systems. This might involve advancements in neural network design, optimization techniques, or the development of hybrid models. Additionally, the application of AI in various sectors holds great promise. For instance, in healthcare, it can assist in disease diagnosis and treatment planning; in transportation, it enables autonomous driving and traffic optimization; and in finance, it aids in risk assessment and fraud detection. We also encourage investigations into the ethical and social implications of AI, as well as the challenges related to data privacy and security. All high-quality contributions that push the boundaries of this field are welcome, including research on the development of explainable AI models, the utilization of AI in emerging technologies such as the Internet of Things and blockchain, and the exploration of AI’s potential in creative fields such as art and music generation.

This Special Issue focuses on the recent and significant progress made in artificial intelligence and deep learning. It aims to bring together research that showcases the practical applications and theoretical advancements of these technologies. The scope includes, but is not limited to, the following:

  • Artificial intelligence;
  • Deep learning;
  • Large models;
  • Language processing;
  • Image processing;
  • Remote sensing;
  • AI security;
  • Intelligent systems;
  • Data analysis;
  • AI applications and inventions.

Prof. Dr. Yu Weng
Dr. Zheng Liu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Inventions is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • large models
  • language processing
  • image processing
  • remote sensing
  • AI security
  • intelligent systems
  • data analysis
  • AI applications and inventions

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

26 pages, 6390 KB  
Article
Image Captioning Using Enhanced Cross-Modal Attention with Multi-Scale Aggregation for Social Hotspot and Public Opinion Monitoring
by Shan Jiang, Yingzhao Chen, Rilige Chaomu and Zheng Liu
Inventions 2026, 11(1), 13; https://doi.org/10.3390/inventions11010013 - 2 Feb 2026
Viewed by 770
Abstract
Large volumes of images shared on social media have made image captioning an important tool for social hotspot identification and public opinion monitoring, where accurate visual–language alignment is essential for reliable analysis. However, existing image captioning models based on BLIP-2 (Bootstrapped Language–Image Pre-training) often struggle with complex, context-rich, and socially meaningful images in real-world social media scenarios, mainly due to insufficient cross-modal interaction, redundant visual token representations, and an inadequate ability to capture multi-scale semantic cues. As a result, the generated captions tend to be incomplete or less informative. To address these limitations, this paper proposes ECMA (Enhanced Cross-Modal Attention), a lightweight module integrated into the Querying Transformer (Q-Former) of BLIP-2. ECMA enhances cross-modal interaction through bidirectional attention between visual features and query tokens, enabling more effective information exchange, while a multi-scale visual aggregation strategy is introduced to model semantic representations at different levels of abstraction. In addition, a semantic residual gating mechanism is designed to suppress redundant information while preserving task-relevant features. ECMA can be seamlessly incorporated into BLIP-2 without modifying the original architecture or fine-tuning the vision encoder or the large language model, and is fully compatible with OPT (Open Pre-trained Transformer)-based variants. Experimental results on the COCO (Common Objects in Context) benchmark demonstrate consistent performance improvements, where ECMA improves the CIDEr (Consensus-based Image Description Evaluation) score from 144.6 to 146.8 and the BLEU-4 score from 42.5 to 43.9 on the OPT-6.7B model, corresponding to relative gains of 1.52% and 3.29%, respectively, while also achieving competitive METEOR (Metric for Evaluation of Translation with Explicit Ordering) scores. 
Further evaluations on social media datasets show that ECMA generates more coherent, context-aware, and socially informative captions, particularly for images involving complex interactions and socially meaningful scenes.
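The relative gains quoted in this abstract can be checked from the reported scores alone. The short sketch below is only an arithmetic check, not part of the authors' evaluation code; the function name `relative_gain` is a hypothetical helper introduced for illustration.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement over the baseline as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


# Scores as reported in the abstract for the OPT-6.7B model.
cider_gain = relative_gain(144.6, 146.8)
bleu4_gain = relative_gain(42.5, 43.9)

print(f"CIDEr relative gain:  {cider_gain:.2f}%")
print(f"BLEU-4 relative gain: {bleu4_gain:.2f}%")
```

Rounded to two decimals, both values agree with the 1.52% and 3.29% stated in the abstract.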

25 pages, 2201 KB  
Article
Design and Research of a Dual-Target Drug Molecular Generation Model Based on Reinforcement Learning
by Peilin Li, Ziyan Yan, Yuchen Zhou, Hongyun Li, Wei Gao and Dazhou Li
Inventions 2026, 11(1), 12; https://doi.org/10.3390/inventions11010012 - 26 Jan 2026
Viewed by 1229
Abstract
Dual-target drug design addresses complex diseases and drug resistance, yet existing computational approaches struggle with simultaneous multi-protein optimization. This study presents SFG-Drug, a novel dual-target molecular generation model combining Monte Carlo tree search with gated recurrent unit neural networks for simultaneous MEK1 and mTOR targeting. The methodology employed DigFrag digital fragmentation on the ZINC-250k dataset, integrated low-frequency masking techniques for enhanced diversity, and utilized molecular docking scores as reward functions. Comprehensive evaluation on the MOSES benchmark demonstrated superior performance compared to state-of-the-art methods, achieving perfect validity (1.000), uniqueness (1.000), and novelty (1.000) scores with the highest internal diversity indices (0.878 for IntDiv1, 0.860 for IntDiv2). Over 90% of generated molecules exhibited favorable binding affinity with both targets, showing optimal drug-like properties including QED values in the [0.2, 0.7] range and high synthetic accessibility scores. Generated compounds demonstrated structural novelty with Tanimoto coefficients below 0.25 compared to known inhibitors while maintaining dual-target binding capability. The SFG-Drug model successfully bridges the gap between computational prediction and practical drug discovery, offering significant potential for developing new dual-target therapeutic agents and advancing AI-driven pharmaceutical research methodologies.
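For readers unfamiliar with the Tanimoto coefficient mentioned in this abstract, the following minimal sketch shows how it is computed on two sets of "on" fingerprint bits: the size of the intersection divided by the size of the union. The bit sets are invented purely for illustration, the abstract does not say which molecular fingerprint the authors used, and the function name `tanimoto` is a hypothetical helper rather than the authors' code.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both sets are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprint bits, not taken from the paper.
generated_bits = {1, 4, 7, 9, 12, 15, 20, 23}
known_inhibitor_bits = {4, 21, 30, 31, 35, 40}

similarity = tanimoto(generated_bits, known_inhibitor_bits)
print(f"Tanimoto similarity: {similarity:.3f}")
print("Below the 0.25 threshold:", similarity < 0.25)
```

A value near 0 means the two molecules share few fingerprint features, and a value of 1 means the fingerprints are identical, which is why a low coefficient against known inhibitors is read as structural novelty.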

20 pages, 1579 KB  
Article
Audio’s Impact on Deep Learning Models: A Comparative Study of EEG-Based Concentration Detection in VR Games
by Jesus GomezRomero-Borquez, Carolina Del-Valle-Soto, José A. Del-Puerto-Flores, Juan-Carlos López-Pimentel, Francisco R. Castillo-Soria, Roilhi F. Ibarra-Hernández and Leonardo Betancur Agudelo
Inventions 2025, 10(6), 97; https://doi.org/10.3390/inventions10060097 - 29 Oct 2025
Viewed by 1302
Abstract
This study investigates the impact of audio feedback on cognitive performance during VR puzzle games using EEG analysis. Thirty participants played three different VR puzzle games under two conditions (with and without audio) while their brain activity was recorded. To analyze concentration levels and neural engagement patterns, we employed spectral analysis combined with a preprocessing algorithm and an optimized Deep Neural Network (DNN) model. The proposed processing stage integrates feature normalization, automatic labeling based on Principal Component Analysis (PCA), and Gamma band feature extraction, transforming concentration detection into a supervised classification problem. Experimental validation was conducted under the two gaming conditions in order to evaluate the impact of multisensory stimulation on model performance. The results show that the proposed approach significantly outperforms traditional machine learning classifiers (SVM, LR) and baseline deep learning models (DNN, DGCNN), achieving 97% accuracy in the audio scenario and 83% without audio. These findings confirm that auditory stimulation reinforces neural coherence and improves the discriminability of EEG patterns, while the proposed method maintains robust performance under less stimulating conditions.

19 pages, 437 KB  
Article
Research on Generation and Quality Evaluation of Earthquake Emergency Language Service Contingency Plan Based on Chain-of-Thought Prompt Engineering for LLMs
by Wenyan Zhang, Kai Zhang, Ti Li and Wenhua Deng
Inventions 2025, 10(5), 74; https://doi.org/10.3390/inventions10050074 - 26 Aug 2025
Viewed by 1658
Abstract
China frequently experiences natural disasters, making emergency language services a key link in information transmission, cross-lingual communication, and resource coordination during disaster relief. Traditional contingency plans rely on manual experience, which results in low efficiency, limited coverage, and insufficient dynamic adaptability. Large language models (LLMs), with their advantages in semantic understanding, multilingual adaptation, and scalability, provide new technical approaches for emergency language services. Our study establishes the country’s first generative evaluation index system for emergency language service contingency plans, covering eight major dimensions. Through an evaluation of 11 mainstream large language models, including DeepSeek, we find that these models perform excellently in precise service stratification and resource network stereoscopic coordination but show significant shortcomings in legal/regulatory frameworks and mechanisms for dynamic evolution. We recommend constructing a more comprehensive emergency language service system by means of targeted data augmentation, multi-model collaboration, and human–machine integration so as to improve cross-linguistic communication efficiency in emergencies and reduce secondary risks caused by information transmission barriers.

22 pages, 5083 KB  
Article
Intelligent Mobile-Assisted Language Learning: A Deep Learning Approach for Pronunciation Analysis and Personalized Feedback
by Fengqin Liu, Korawit Orkphol, Natthapon Pannurat, Thanat Sooknuan, Thanin Muangpool, Sanya Kuankid and Montri Phothisonothai
Inventions 2025, 10(4), 46; https://doi.org/10.3390/inventions10040046 - 24 Jun 2025
Cited by 1 | Viewed by 3304
Abstract
This paper introduces an innovative mobile-assisted language-learning (MALL) system that harnesses deep learning technology to analyze pronunciation patterns and deliver real-time, personalized feedback. Drawing inspiration from how the human brain processes speech through neural pathways, our system analyzes multiple speech features using spectrograms, mel-frequency cepstral coefficients (MFCCs), and formant frequencies in a manner that mirrors the auditory cortex’s interpretation of sound. The core of our approach utilizes a convolutional neural network (CNN) to classify pronunciation patterns from user-recorded speech. To enhance the assessment accuracy and provide nuanced feedback, we integrated a fuzzy inference system (FIS) that helps learners identify and correct specific pronunciation errors. The experimental results demonstrate that our multi-feature model achieved accuracies of 82.41% to 90.52% in accent classification across diverse linguistic contexts. User testing revealed statistically significant improvements in pronunciation skills, with learners showing a 5–20% improvement in accuracy after using the system. The proposed MALL system offers a portable, accessible solution for language learners while establishing a foundation for future research in multilingual functionality and mobile platform optimization. By combining advanced speech analysis with intuitive feedback mechanisms, this system addresses a critical challenge in language acquisition and promotes more effective self-directed learning.

Review

22 pages, 5642 KB  
Review
Current Trends and Challenges in Applying Metaheuristics to the Innovative Area of Weight and Structure Determination Neuronets
by Spyridon D. Mourtas, Shuai Li, Xinwei Cao, Bolin Liao and Vasilios N. Katsikis
Inventions 2025, 10(4), 62; https://doi.org/10.3390/inventions10040062 - 24 Jul 2025
Cited by 2 | Viewed by 1235
Abstract
The weights and structure determination (WASD) neuronet (or neural network) is a single-hidden-layer feedforward neuronet that exhibits an excellent approximation ability, despite its simple structure. Thanks to its strong generalization, fast speed, and ease of implementation, the WASD neuronet has been the subject of many modifications, including metaheuristics, and applications in a wide range of scientific fields. As it has garnered significant attention in the last decade, the aim of this study is to provide an extensive overview of the WASD framework. Furthermore, the WASD has been effectively used in numerous real-time learning tasks like regression, multiclass classification, and binary classification due to its exceptional performance. In addition, we present WASD’s applications in social science, business, engineering, economics, and medicine. We aim to report these developments and provide some avenues for further research.
