AI and Machine Learning in the Big Data Era: Advanced Algorithms and Real-World Applications

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2026 | Viewed by 7361

Special Issue Editors


Dr. Ebuka Ibeke
Guest Editor
School of Computing, Engineering and Technology, Robert Gordon University, Aberdeen AB10 7QB, Scotland, UK
Interests: artificial intelligence and applications in various industries; machine learning; natural language processing

Dr. Chinedu Pascal Ezenkwu
Guest Editor Assistant
School of Computing, Engineering and Technology, Robert Gordon University, Aberdeen AB10 7QB, Scotland, UK
Interests: responsible AI; generative AI in business; information retrieval using RAG; vector databases; digital transformation; AI in energy, education and health sectors

Special Issue Information

Dear Colleagues,

Information invites submissions to a Special Issue on "AI and Machine Learning in the Big Data Era: Advanced Algorithms and Real-World Applications".

As data continue to grow in volume, variety, and velocity, the fusion of artificial intelligence (AI) and machine learning (ML) with big data analytics is transforming how we extract actionable insights and build intelligent systems. From healthcare and finance to smart cities and industrial automation, advanced ML algorithms—ranging from deep learning to reinforcement learning and generative models—are being leveraged to tackle real-world challenges at scale.

This Special Issue will showcase cutting-edge research and practical innovations that harness AI and ML in the context of big data. We welcome original research and review articles that highlight algorithmic advances, scalable architectures, and applications with societal or industrial impacts.

We particularly encourage submissions that address interdisciplinary problems, real-world deployments, and methodological breakthroughs that push the boundaries of what AI and ML can achieve in the big data ecosystem.

Topics of interest include, but are not limited to, the following:

  • Scalable AI and ML algorithms for big data;
  • Deep learning and neural network architectures for large-scale data;
  • AI-driven data analytics and decision support systems;
  • Reinforcement learning and autonomous systems in big data contexts;
  • Federated learning and privacy-preserving ML for distributed data;
  • Real-time data processing and edge AI applications;
  • AI for IoT and smart environments;
  • Generative AI in high-dimensional data analysis;
  • Explainable AI (XAI) and interpretability in complex models;
  • Benchmarking and evaluation of AI models on big data;
  • Real-world and industrial applications of AI/ML;
  • AI ethics, bias, and fairness in big data environments.

We look forward to receiving your contributions to this timely and impactful Special Issue.

Dr. Ebuka Ibeke
Guest Editor

Dr. Chinedu Pascal Ezenkwu
Guest Editor Assistant

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • big data analytics
  • deep learning
  • scalable algorithms
  • predictive analytics
  • explainable AI
  • real-time processing
  • federated learning
  • generative AI

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (8 papers)

Research

21 pages, 2700 KB  
Article
Bridging Stochasticity and Fuzziness: Automated Construction of Triangular Fuzzy Numbers via LLM Temperature Sampling for Managerial Decision Support
by Meng Zhang, Wenjie Bai, Yuanfei Guo, Wenlong Xu, Ranjun Wang, Yingdong Chen and Yuliang Zhao
Information 2026, 17(4), 349; https://doi.org/10.3390/info17040349 - 6 Apr 2026
Viewed by 355
Abstract
Traditional fuzzy decision-making often relies on manual expert calibration, which is labor-intensive and susceptible to subjective bias. This study addresses these limitations by proposing a novel framework that transforms the intrinsic probabilistic outputs of Large Language Models (LLMs) into Triangular Fuzzy Numbers (TFNs). We introduce a multi-temperature sampling strategy coupled with weighted quantile aggregation and an adaptive interval adjustment mechanism to systematically map model stochasticity to fuzzy possibility distributions. Empirical validation on a structured prototype dataset demonstrates that the proposed method achieves high consistency with expert consensus, with GPT-4.2 exhibiting superior central accuracy and Gemini-2.5 excelling in uncertainty coverage. Furthermore, in complex unstructured scenarios involving business public opinion, the integration of Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG) significantly corrects cognitive biases and converges uncertainty boundaries. This research establishes a rigorous pathway from generative AI probabilities to fuzzy decision theory, offering a robust automated solution for quantitative risk assessment and intelligent decision support. Full article
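
To make the idea of mapping repeated model outputs to a Triangular Fuzzy Number concrete, the following is a minimal sketch only. It is not the authors' method: it uses plain unweighted empirical quantiles and the median as the peak, whereas the abstract describes weighted quantile aggregation and an adaptive interval adjustment whose details are not given here. The function names `triangular_fuzzy_number` and `membership`, and the default quantile levels, are assumptions made for illustration.

```python
import statistics


def triangular_fuzzy_number(samples, lower_q=0.05, upper_q=0.95):
    """Build a (low, mode, high) triple from numeric samples.

    Assumption: low/high are empirical quantiles and the peak is the median.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(lower_q), statistics.median(ordered), quantile(upper_q)


def membership(x, tfn):
    """Triangular membership degree of x in the fuzzy number tfn."""
    low, mode, high = tfn
    if x <= low or x >= high:
        # Only a degenerate peak sitting on the boundary has full membership.
        return 1.0 if x == mode else 0.0
    if x <= mode:
        return (x - low) / (mode - low)
    return (high - x) / (high - mode)
```

In use, `samples` would hold numeric scores parsed from several model responses generated at different temperatures; how those scores are elicited and weighted is left open in this sketch.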

24 pages, 13293 KB  
Article
Ensemble Learning Using YOLO Models for Semiconductor E-Waste Recycling
by Xinglong Zhou and Sos Agaian
Information 2026, 17(4), 322; https://doi.org/10.3390/info17040322 - 26 Mar 2026
Viewed by 427
Abstract
The global rise in electronic waste (e-waste), especially in semiconductor components such as circuit boards and microchips, underscores a critical need for improved recycling technology. Current industrial sorters often miss small, high-value components. This leads to the loss of precious metals and inefficient recycling processes. This paper introduces an automated framework for detecting semiconductor components in e-waste. It assesses ensemble learning methods that leverage the strengths of multiple YOLO (You Only Look Once) object detection models, including YOLOv5, YOLOv8, YOLOv9, YOLOv10, YOLOv11, and YOLOv12. Three ensemble fusion strategies are systematically compared: standard Non-Maximum Suppression (NMS), voting-based strategies (Affirmative, Consensus, Unanimous), and Weighted Box Fusion (WBF) with both static and dynamic weight optimization. Our simulations demonstrate that using multiple models together is far more effective than a single model, for the following reasons. 1. Higher Accuracy: The best configuration, the Top-4 Consensus Voting ensemble strategy, achieved an mAP@0.5 of 59.63%, a 10.3% relative improvement over the best individual model (YOLOv8s, 54.04%); 2. Greater Reliability: It significantly reduced “false negatives” (missed detections), even in cluttered or crowded e-waste scenarios; 3. Enhanced Detection: While the individual YOLOv8 model is fast (taking only 62.6 ms), supporting real-time detection, the best ensemble configuration (Consensus Top-4) takes 384.9 ms, creating a trade-off between detection accuracy and speed; 4. Well-Balanced Performance: Some fusion strategies showed slight trade-offs in mAP for certain parts, but collectively achieved a 7% rise in F1-score, indicating a better balance between precision and recall. This research marks significant progress in smart recycling. Improved component identification allows for more efficient recovery of high-purity materials.
This promotes a circular economy by ensuring that rare and strategic materials in electronics are reused instead of discarded. Full article
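
The three voting strategies named in the abstract can be illustrated with a short sketch. It assumes the commonly used definitions (Affirmative: at least one model detects the object; Consensus: a strict majority of models agree; Unanimous: all models agree); the paper's exact thresholds may differ. It also assumes that overlapping boxes have already been grouped per object, for example by an IoU threshold, which is not shown. The function name `vote` and the data layout are hypothetical.

```python
def vote(groups, n_models, strategy):
    """Filter grouped detections by an ensemble voting rule.

    groups: list of sets; each set holds the ids of the models whose boxes
            were matched to the same object (grouping itself is assumed done).
    """
    if strategy == "affirmative":
        needed = 1                    # any single model is enough
    elif strategy == "consensus":
        needed = n_models // 2 + 1    # strict majority (assumed definition)
    elif strategy == "unanimous":
        needed = n_models             # every model must agree
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [g for g in groups if len(g) >= needed]
```

Stricter rules trade recall for precision: Affirmative keeps every candidate, while Unanimous keeps only objects that all models found.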

14 pages, 1269 KB  
Article
Breaking the Spatio-Temporal Mismatch: A Preemptive Deep Reinforcement Learning Framework for Misinformation Defense
by Fulian Yin, Zhiqiang Zhang, Zhenyu Yu, Chang Wu, Junyi Chen and Yuewei Wu
Information 2026, 17(1), 67; https://doi.org/10.3390/info17010067 - 11 Jan 2026
Viewed by 541
Abstract
The containment of misinformation diffusion on social media is a critical challenge in computational social science. However, prevailing intervention strategies predominantly rely on static topological metrics or time-agnostic learning models, thereby overlooking the profound impact of temporal–demographic heterogeneity. This oversight frequently results in a “spatio-temporal mismatch”, where limited intervention resources are misallocated to structurally central but temporarily inactive nodes, particularly during non-stationary propagation bursts driven by exogenous triggers. To bridge this gap, we propose a Spatio-Temporal Deep Reinforcement Learning (ST-DRL) framework for proactive misinformation defense. By seamlessly integrating continuous trigonometric time encoding with demographic-aware Graph Attention Networks, our model explicitly captures the coupling dynamics between group-specific circadian rhythms and event-driven transmission surges. Extensive simulations on heterogeneous networks demonstrate that ST-DRL achieves a Peak Prevalence Reduction of 93.2%, significantly outperforming static heuristics and approaching the theoretical upper bound of oracle-assisted baselines. Crucially, interpretability analysis reveals that the agent autonomously evolves a “Preemptive Strike” strategy—prioritizing the sanitization of high-risk bridge nodes, such as bots, prior to event onsets—thus establishing a new paradigm for predictive rather than reactive network governance. Full article
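
As a rough illustration of what a continuous trigonometric time encoding can look like, the sketch below maps an hour of the day onto the unit circle so that times just before and just after midnight stay close together. This is an assumed, generic form; the abstract does not specify the encoding used in ST-DRL, and the function name `encode_hour` and the 24-hour period are illustrative choices.

```python
import math


def encode_hour(hour, period=24.0):
    """Return (sin, cos) features for a time value with the given period."""
    angle = 2.0 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)
```

Unlike a raw hour value, this representation has no artificial jump between 23:00 and 00:00, which is the usual motivation for encoding circadian patterns this way.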

42 pages, 22373 KB  
Article
Transforming Credit Risk Analysis: A Time-Series-Driven ResE-BiLSTM Framework for Post-Loan Default Detection
by Yue Yang, Yuxiang Lin, Ying Zhang, Zihan Su, Chang Chuan Goh, Tangtangfang Fang, Anthony Bellotti and Boon Giin Lee
Information 2026, 17(1), 5; https://doi.org/10.3390/info17010005 - 21 Dec 2025
Viewed by 1037
Abstract
Credit risk refers to the possibility that a borrower fails to meet contractual repayment obligations, posing potential losses to lenders. This study aims to enhance post-loan default prediction in credit risk management by constructing a time-series modeling framework based on repayment behavior data, enabling the capture of repayment risks that emerge after loan issuance. To achieve this objective, a Residual Enhanced Encoder Bidirectional Long Short-Term Memory (ResE-BiLSTM) model is proposed, in which the attention mechanism is responsible for discovering long-range correlations, while the residual connections ensure the preservation of distant information. This design mitigates the tendency of conventional recurrent architectures to overemphasize recent inputs while underrepresenting distant temporal information in long-term dependency modeling. Using the real-world large-scale Freddie Mac Single-Family Loan-Level Dataset, the model is evaluated on 44 independent cohorts and compared with five baseline models, including Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) across multiple evaluation metrics. The experimental results demonstrate that ResE-BiLSTM achieves superior performance on key indicators such as F1 and AUC, with average values of 0.92 and 0.97, respectively, and demonstrates robust performance across different feature window lengths and resampling settings. Ablation experiments and SHapley Additive exPlanations (SHAP)-based interpretability analyses further reveal that the model captures non-monotonic temporal importance patterns across key financial features. This study advances time-series–based anomaly detection for credit risk prediction by integrating global and local temporal learning. 
The findings offer practical value for financial institutions and risk management practitioners, while also providing methodological insights and a transferable modeling paradigm for future research on credit risk assessment. Full article

22 pages, 1236 KB  
Article
An Industrial Framework for Cold-Start Recommendation in Few-Shot and Zero-Shot Scenarios
by Xulei Cao, Wenyu Zhang, Feiyang Jiang and Xinming Zhang
Information 2025, 16(12), 1105; https://doi.org/10.3390/info16121105 - 15 Dec 2025
Viewed by 1292
Abstract
With the rise of online advertising, e-commerce industries, and new media platforms, recommendation systems have become an essential product form that connects users with a vast number of candidates. A major challenge in recommendation systems is the cold-start problem, where the absence of historical interaction data for new users and items leads to poor recommendation performance. We first analyze the causes of the cold-start problem, highlighting the limitations of existing embedding models when faced with a lack of interaction data. To address this, we classify the features of models into three categories, leveraging the Trans Block mapping to transfer features into the semantic space of missing features. Then, we propose a model-agnostic industrial framework (MAIF) with the Auto-Selection serving mechanism to address the cold-start recommendation problem in few-shot and zero-shot scenarios without requiring training from scratch. This framework can be applied to various online models without altering the prediction for warm entities, effectively avoiding the “seesaw phenomenon” between cold and warm entities. It improves prediction accuracy and calibration performance in three cold-start scenarios of recommendation systems. Finally, both the offline experiments on real-world industrial datasets and the online advertising system on the Dazhong Dianping app validate the effectiveness of our approach, showing significant improvements in recommendation performance for cold-start scenarios. Full article

14 pages, 838 KB  
Article
Leveraging LLMs for User Rating Prediction from Textual Reviews: A Hospitality Data Annotation Case Study
by Patricia Nnanna, Olasoji Amujo, Chinedu Pascal Ezenkwu and Ebuka Ibeke
Information 2025, 16(12), 1059; https://doi.org/10.3390/info16121059 - 2 Dec 2025
Viewed by 916
Abstract
The proliferation of user-generated content in today’s digital landscape has further increased dependence on online reviews as a source for decision-making in the hospitality industry. There has been an increasing interest in automating this decision-support mechanism through recommender systems. However, this process often requires a large labelled corpus to train an effective algorithm, necessitating the use of human annotators to develop training data where it is lacking. Although manual annotation can help enrich the training corpus, it can also introduce errors and annotator bias, including subjectivity and cultural bias, which can affect the quality of the data and the fairness of the model. This paper examines the alignment between ratings derived from different annotation sources and the original ratings provided by customers, which are treated as the ground truth. The paper compares the predictions from Generative Pre-trained Transformer (GPT) models against ratings assigned by Amazon Mechanical Turk (MTurk) workers. The GPT-4o annotation outputs closely mirror the original ratings, with a strong positive correlation (0.703). GPT-3.5 Turbo and MTurk showed weaker correlations (0.663 and 0.15, respectively) than GPT-4o. The potential cause of the large difference between the original ratings and MTurk (largely driven by human perception) lies in the inherent challenges of subjectivity, quantitative bias, and variability in context comprehension. These findings suggest that the use of advanced models such as GPT-4o can significantly reduce the potential bias and variability introduced by Amazon MTurk annotators, thus improving the alignment of predicted ratings with actual user sentiment as expressed in textual reviews.
Moreover, with the per-annotation cost of an LLM shown to be thirty times cheaper than MTurk, our proposed LLM-based textual review annotation approach will be cost-effective for the hospitality industry. Full article

32 pages, 18611 KB  
Article
Optimization of Multi-Intelligent Body Strategies for UAV Adversarial Tasks Based on MADDPG-SASP
by Zhenfei Xiao, Fuyong Liu and Qian Wang
Information 2025, 16(12), 1050; https://doi.org/10.3390/info16121050 - 1 Dec 2025
Viewed by 574
Abstract
In intelligent multi-agent systems, particularly in drone combat scenarios, the challenges posed by rapidly changing environments and incomplete information significantly hinder effective strategy optimization. Traditional multi-agent reinforcement learning (MARL) approaches often encounter difficulties in adapting to the dynamic nature of adversarial environments, especially when enemy strategies are subject to continuous evolution, complicating agents’ ability to respond effectively. To address these challenges, this paper introduces a novel enhanced MARL framework, MADDPG-SASP, which integrates an improved self-attention mechanism with self-play within the MADDPG algorithm, thereby facilitating superior strategy optimization. The self-attention mechanism empowers agents to adaptively extract critical environmental features, enhancing both the speed and accuracy of perception and decision-making processes. Concurrently, the adaptive self-play mechanism iteratively refines agent strategies through continuous adversarial interactions, bolstering the stability and flexibility of their responses. Empirical results indicate that after 600 rounds, the win rate of agents employing this framework saw a substantial increase, rising from 26.17% with the original MADDPG to a perfect 100%. Further validation through comparative experiments underscores the method’s efficacy, demonstrating considerable advantages in strategy optimization and agent performance in complex, dynamic environments. Moreover, in the Predator–Prey Scenario combat environment, when the enemy side employs a multi-agent strategy, the win rate for the drone agent side can reach 98.5% and 100%. Full article

23 pages, 619 KB  
Article
TisLLM: Temporal Integration-Enhanced Fine-Tuning of Large Language Models for Sequential Recommendation
by Xiaosong Zhu, Wenzheng Li, Bingqiang Zhang and Liqing Geng
Information 2025, 16(9), 818; https://doi.org/10.3390/info16090818 - 21 Sep 2025
Viewed by 1461
Abstract
In recent years, the remarkable versatility of large language models (LLMs) has spurred considerable interest in leveraging their capabilities for recommendation systems. Critically, we argue that the intrinsic aptitude of LLMs for modeling sequential patterns and temporal dynamics renders them uniquely suited for sequential recommendation tasks, a foundational premise explored in depth later in this work. This potential, however, is tempered by significant hurdles: a discernible gap exists between the general competencies of conventional LLMs and the specialized needs of recommendation tasks, and their capacity to uncover complex, latent data interrelationships often proves inadequate, potentially undermining recommendation efficacy. To bridge this gap, our approach centers on adapting LLMs through fine-tuning on dedicated recommendation datasets, enhancing task-specific alignment. Further, we present the Temporal Integration-Enhanced Fine-Tuning of Large Language Models for Sequential Recommendation (TisLLM) framework. TisLLM specifically targets the deeper excavation of implicit associations within recommendation data streams. Its core mechanism involves partitioning sequential user interaction data using temporally defined sliding windows. These chronologically segmented slices are then aggregated to form enriched contextual representations, which subsequently drive the LLM fine-tuning process. This methodology explicitly strengthens the model’s compatibility with the inherently sequential nature of recommendation scenarios. Rigorous evaluation on benchmark datasets provides robust empirical validation, confirming the effectiveness of the TisLLM framework. Full article
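
The partitioning step described above, slicing a user's interaction history with temporally defined sliding windows, might be sketched as follows. This is an assumption-laden illustration, not the TisLLM implementation: the window is defined by a time span rather than an item count, empty windows are dropped, and the function name `sliding_windows` and the `(timestamp, item)` layout are hypothetical. How the slices are then aggregated into prompts for fine-tuning is not covered.

```python
def sliding_windows(events, window, step):
    """Split (timestamp, item) pairs into overlapping time-based slices.

    Each slice covers the half-open interval [start, start + window);
    consecutive slices begin `step` time units apart.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not events:
        return []
    events = sorted(events, key=lambda e: e[0])
    start, last = events[0][0], events[-1][0]
    slices = []
    while start <= last:
        end = start + window
        items = [item for t, item in events if start <= t < end]
        if items:  # skip windows in which the user did nothing
            slices.append(items)
        start += step
    return slices
```

Choosing `step` smaller than `window` makes neighbouring slices overlap, so an interaction can contribute context to more than one slice.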