Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

24 pages, 3816 KB  
Article
Machine Learning Based Impact Sensing Using Piezoelectric Sensors: From Simulated Training Data to Zero-Shot Experimental Application
by Petros Gkertzos, Johannes Gerritzen, Constantinos Tsakonas, Stefanos H. Panagiotou, Athanasios Kotzakolios, Ioannis Katsidimas, Andreas Hornig, Siavash Ghiasvand, Maik Gude, Vassilis Kostopoulos and Sotiris Nikoletseas
Big Data Cogn. Comput. 2026, 10(1), 5; https://doi.org/10.3390/bdcc10010005 - 23 Dec 2025
Viewed by 1560
Abstract
Modern impact monitoring systems combine multiple inputs with machine learning (ML) models for impact detection, localization, and event assessment. Their accuracy relies on large, event-representative datasets used for algorithmic development and ML model training. High-fidelity numerical models can provide augmented datasets by overcoming the cost and time limitations of experimental methods. This research presents an end-to-end numerical methodology for impact detection based on simulation (training) and experimental (testing) data. Initially, a finite element model (FEM) of our experimental setup, utilizing piezoelectric transducer (PZT) sensors mounted on a thermoplastic plate, is created. From the experimental impact signals, a few consistent cases are identified for feature extraction. A design of experiments explores the range of each parameter, and through surrogate optimization, the material and piezoelectric properties of the setup are determined. Subsequently, a virtual dataset involving multiple impact cases is created to train the ML models performing impact detection. Testing with experimental data shows results consistent with studies in the literature that used only experimental data for both training and testing. This work provides a systematic methodology for representative dataset generation and impact monitoring through ML, while addressing accurate FEM parameter identification from a few experimental trials.
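To make the workflow concrete, here is a minimal sketch of the train-on-simulation, test-on-experiment setup, assuming placeholder features (e.g., per-channel peak amplitudes from the PZT signals) and synthetic arrays in place of the paper's FEM and experimental data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Placeholder features per impact event (e.g., peak amplitude and
# time-of-arrival per PZT channel); labels encode impact location bins.
rng = np.random.default_rng(0)
X_sim, y_sim = rng.normal(size=(5000, 8)), rng.integers(0, 4, 5000)  # FEM-generated
X_exp, y_exp = rng.normal(size=(200, 8)), rng.integers(0, 4, 200)    # experimental

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_sim, y_sim)                   # train purely on simulated data
print("zero-shot accuracy:", accuracy_score(y_exp, clf.predict(X_exp)))
```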

29 pages, 3175 KB  
Article
KANs Layer Integration: Benchmarking Deep Learning Architectures for Tornado Prediction
by Shuo (Luna) Yang, Ehsaneh Vilataj, Muhammad Faizan Raza and Satish Mahadevan Srinivasan
Big Data Cogn. Comput. 2025, 9(12), 324; https://doi.org/10.3390/bdcc9120324 - 16 Dec 2025
Viewed by 1090
Abstract
Tornado occurrence and detection are well established in mesoscale meteorology, yet the application of deep learning (DL) to radar-based tornado detection remains nascent and under-validated. This study benchmarks DL approaches on TorNet, a curated dataset of full-resolution, polarimetric Weather Surveillance Radar-1988 Doppler (WSR-88D) radar volumes. We evaluate three canonical architectures (CNN, VGG19, and Xception) under five optimizers and assess the effect of replacing conventional MLP heads with Kolmogorov–Arnold Network (KAN) layers. To address severe class imbalance and label noise, we implement radar-aware preprocessing and augmentation, temporal splits, and recall-sensitive training. Models are compared using accuracy, precision, recall, and ROC-AUC. Results show that KAN-augmented variants generally converge faster and deliver higher rare-event sensitivity and discriminative power than their baselines, with Adam and RMSprop providing the most stable training and Lion showing architecture-dependent gains. We contribute (i) a reproducible baseline suite for TorNet, (ii) evidence on the conditions under which KAN integration improves tornado detection, and (iii) practical guidance on optimizer–architecture choices for rare-event forecasting with weather radar.
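For intuition about the KAN substitution, the sketch below replaces a dense head with a layer whose input-output edges are learnable univariate functions over a Gaussian basis; published KAN implementations typically use B-spline bases plus a residual activation, so this is illustrative rather than the authors' architecture:

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Minimal KAN-style layer: every input-output edge applies a learnable
    univariate function, here a linear combination of fixed Gaussian bases."""
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))

    def forward(self, x):                                       # x: (batch, in_dim)
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)  # Gaussian responses
        return torch.einsum("bif,oif->bo", phi, self.coef)       # sum over edges

head = KANLayer(128, 1)          # drop-in replacement for an MLP head
out = head(torch.randn(4, 128))
```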

90 pages, 1718 KB  
Systematic Review
A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges
by Andrew Brown, Muhammad Roman and Barry Devereux
Big Data Cogn. Comput. 2025, 9(12), 320; https://doi.org/10.3390/bdcc9120320 - 12 Dec 2025
Cited by 9 | Viewed by 10418
Abstract
Background: Retrieval-augmented generation (RAG) aims to reduce hallucinations and outdated knowledge by grounding LLM outputs in retrieved evidence, but empirical results are scattered across tasks, systems, and metrics, limiting cumulative insight. Objective: We aimed to synthesise empirical evidence on RAG effectiveness versus parametric-only baselines, map datasets/architectures/evaluation practices, and surface limitations and research gaps. Methods: This systematic review was conducted and reported in accordance with PRISMA 2020. We searched the ACM Digital Library, IEEE Xplore, Scopus, ScienceDirect, and DBLP; all sources were last searched on 13 May 2025. We included studies from January 2020 to May 2025 that addressed RAG or similar retrieval-supported systems producing text output, met citation thresholds (≥15 for 2025; ≥30 for 2024 or earlier), and offered original contributions; we excluded non-English items, irrelevant works, duplicates, and records without accessible full text. Bias was appraised with a brief checklist; screening used one reviewer with an independent check and discussion. LLM suggestions were advisory only; 2025 citation thresholds were adjusted to limit citation lag. We used a descriptive approach to synthesise the results, organising studies by themes aligned to RQ1–RQ4 and reporting summary counts/frequencies; no meta-analysis was undertaken due to heterogeneity of designs and metrics. Results: We included 128 studies spanning knowledge-intensive tasks (35/128; 27.3%), open-domain QA (20/128; 15.6%), software engineering (13/128; 10.2%), and medical domains (11/128; 8.6%). Methods have shifted from DPR + seq2seq baselines to modular, policy-driven RAG with hybrid/structure-aware retrieval, uncertainty-triggered loops, memory, and emerging multimodality. Evaluation remains overlap-heavy (EM/F1), with increasing use of retrieval diagnostics (e.g., Recall@k, MRR@k), human judgements, and LLM-as-judge protocols. Efficiency and security (poisoning, leakage, jailbreaks) are growing concerns. Discussion: Evidence supports a shift to modular, policy-driven RAG, combining hybrid/structure-aware retrieval, uncertainty-aware control, memory, and multimodality, to improve grounding and efficiency. To advance from prototypes to dependable systems, we recommend (i) holistic benchmarks pairing quality with cost/latency and safety, (ii) budget-aware retrieval/tool-use policies, and (iii) provenance-aware pipelines that expose uncertainty and deliver traceable evidence. We note the evidence base may be affected by citation lag from the inclusion thresholds and by English-only, five-library coverage. Funding: Advanced Research and Engineering Centre. Registration: Not registered.
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))
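The retrieval diagnostics mentioned in the abstract have compact definitions; a self-contained sketch with toy document IDs (not data from the review):

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of relevant documents that appear in the top-k ranking.
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / max(len(relevant_ids), 1)

def mrr_at_k(ranked_ids, relevant_ids, k):
    # Reciprocal rank of the first relevant document within the top k.
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

print(recall_at_k(["d3", "d1", "d9"], {"d1", "d7"}, k=3))  # 0.5
print(mrr_at_k(["d3", "d1", "d9"], {"d1", "d7"}, k=3))     # 0.5
```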

24 pages, 3009 KB  
Article
SpaceTime: A Deep Similarity Defense Against Poisoning Attacks in Federated Learning
by Geethapriya Thamilarasu and Christian Dunham
Big Data Cogn. Comput. 2025, 9(12), 313; https://doi.org/10.3390/bdcc9120313 - 5 Dec 2025
Viewed by 920
Abstract
Federated learning has gained popularity in recent years for enhancing IoT security because the model allows decentralized devices to collaboratively learn a shared model without exchanging raw data. Despite its privacy advantages, federated learning is vulnerable to poisoning attacks, where malicious devices introduce manipulated data or model updates to corrupt the global model. These attacks can degrade the model’s performance or bias its outcomes, making it difficult to ensure the integrity of the learning process across decentralized devices. In this research, our goal is to develop a defense mechanism against poisoning attacks in federated learning models. Specifically, we develop a spacetime model that combines the three dimensions of space and the one dimension of time into a four-dimensional manifold. Poisoning attacks have complex spatial and temporal relationships that present identifiable patterns in that manifold. We propose SpaceTime-Deep Similarity Defense (ST-DSD), a deep learning recurrent neural network that incorporates spatial and temporal perception to provide a defense against poisoning attacks for federated learning models. The proposed mechanism is built upon a many-to-one time-series regression architecture that uses spacetime relationships to provide an adversarially trained deep learning poisoning defense. Simulation results show that the SpaceTime defense outperforms existing solutions for poisoning defenses in IoT environments.
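A minimal sketch of a many-to-one recurrent scorer in the spirit of ST-DSD follows; the feature dimension, hidden size, and sigmoid scoring head are illustrative assumptions, not the published design:

```python
import torch
import torch.nn as nn

class ManyToOneScorer(nn.Module):
    """Hypothetical many-to-one scorer: a sequence of per-round client-update
    feature vectors (the 'time' axis) is mapped to one maliciousness score."""
    def __init__(self, n_features=16, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, rounds, n_features)
        _, (h, _) = self.rnn(x)          # h[-1]: last hidden state per client
        return torch.sigmoid(self.head(h[-1]))  # score in (0, 1)

scores = ManyToOneScorer()(torch.randn(8, 20, 16))  # 8 clients, 20 rounds
```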

17 pages, 1183 KB  
Article
High-Speed Scientific Computing Using Adaptive Spline Interpolation
by Daniel S. Soper
Big Data Cogn. Comput. 2025, 9(12), 308; https://doi.org/10.3390/bdcc9120308 - 2 Dec 2025
Viewed by 768
Abstract
The increasing scale of modern datasets has created a significant computational bottleneck for traditional scientific and statistical algorithms. To address this problem, the current paper describes and validates a high-performance method based on adaptive spline interpolation that can dramatically accelerate the calculation of foundational scientific and statistical functions. This is accomplished by constructing parsimonious spline models that approximate their target functions within a predefined, highly precise maximum error tolerance. The efficacy of the adaptive spline-based solutions was evaluated through benchmarking experiments that compared spline models against the widely used algorithms in the Python SciPy library for the normal, Student’s t, and chi-squared cumulative distribution functions. Across 30 trials of 10 million computations each, the adaptive spline models consistently achieved a maximum absolute error of no more than 1 × 10⁻⁸ while running between 7.5 and 87.4 times faster than their corresponding SciPy algorithms. All of these improvements in speed were observed to be statistically significant at p < 0.001. The findings establish that adaptive spline interpolation can be both highly accurate and much faster than traditional scientific and statistical algorithms, thereby offering a practical pathway to accelerate both the analysis of large datasets and the progress of scientific inquiry.
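The core loop, fit a spline, test the error, and add knots only where the tolerance is violated, can be sketched briefly; the midpoint-refinement rule below is an assumption, not necessarily the paper's exact knot-placement strategy:

```python
import numpy as np
from scipy.stats import norm
from scipy.interpolate import CubicSpline

def adaptive_spline(f, lo, hi, tol=1e-8, n0=17, max_iter=40):
    x = np.linspace(lo, hi, n0)
    for _ in range(max_iter):
        s = CubicSpline(x, f(x))
        mid = 0.5 * (x[:-1] + x[1:])             # probe interval midpoints
        err = np.abs(s(mid) - f(mid))
        if err.max() <= tol:
            return s                              # parsimonious approximant
        x = np.sort(np.concatenate([x, mid[err > tol]]))  # refine bad intervals
    raise RuntimeError("tolerance not reached")

fast_cdf = adaptive_spline(norm.cdf, -8.0, 8.0)   # evaluate via fast_cdf(z)
```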

26 pages, 4013 KB  
Article
Music Genre Classification Using Prosodic, Stylistic, Syntactic and Sentiment-Based Features
by Erik-Robert Kovacs and Stefan Baghiu
Big Data Cogn. Comput. 2025, 9(11), 296; https://doi.org/10.3390/bdcc9110296 - 19 Nov 2025
Viewed by 3459
Abstract
Romanian popular music has had a storied history across the last century and a half. Incorporating different influences at different times, today it boasts a wide range of both autochthonous and imported genres, such as traditional folk music, rock, rap, pop, and manele, to name a few. We aim to trace the linguistic differences between the lyrics of these genres using natural language processing and a computational linguistics approach by studying the prosodic, stylistic, syntactic, and sentiment-based features of each genre. For this purpose, we have crawled a dataset of ~14,000 Romanian songs from publicly available websites along with the user-provided genre labels, and characterized each song and each genre, respectively, with regard to these features, discussing similarities and differences. We improve on existing tools for Romanian natural language processing by building a lexical analysis library well suited to song lyrics or poetry, which encodes a set of 17 linguistic features. In addition, we build lexical analysis tools for profanity-based features and improve the SentiLex sentiment analysis library by manually rebalancing its lexemes to overcome the limitations introduced by its machine translation into Romanian. We estimate the accuracy gain using a benchmark Romanian sentiment analysis dataset and register a 25% increase in accuracy over the SentiLex baseline. This contribution describes the characteristics of the Romanian expression of autochthonous as well as international genres and provides technical support to researchers in natural language processing, musicology, or the digital humanities in studying the lyrical content of Romanian music. We have released our data and code for research use.
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))
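As a rough baseline for the genre-classification setting, a bag-of-words pipeline over lyrics might look like this; the toy lyrics and TF-IDF features stand in for the study's prosodic, stylistic, syntactic, and sentiment features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus; the study instead characterizes ~14,000 songs with
# 17 linguistic features plus profanity and sentiment lexicons.
lyrics = ["foaie verde de trifoi", "yo, microfonul arde in seara asta"]
genres = ["folk", "rap"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(lyrics, genres)
print(model.predict(["foaie verde de bujor"]))
```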

17 pages, 12830 KB  
Article
Your Eyes Under Pressure: Real-Time Estimation of Cognitive Load with Smooth Pursuit Tracking
by Pierluigi Dell’Acqua, Marco Garofalo, Francesco La Rosa and Massimo Villari
Big Data Cogn. Comput. 2025, 9(11), 288; https://doi.org/10.3390/bdcc9110288 - 13 Nov 2025
Cited by 3 | Viewed by 2265
Abstract
Understanding and accurately estimating cognitive workload is crucial for the development of adaptive, user-centered interactive systems across a variety of domains, including augmented reality, automotive driving assistance, and intelligent tutoring systems. Cognitive workload assessment enables dynamic system adaptation to improve user experience and safety. In this work, we introduce a novel framework that leverages smooth pursuit eye movements as a non-invasive and temporally precise indicator of mental effort. A key innovation of our approach is the development of trajectory-independent algorithms that address a significant limitation of existing methods, which generally rely on a predefined or known stimulus trajectory. Our framework leverages two solutions, based on Kalman-filter and B-spline heuristic classifiers, to provide accurate cognitive load estimation without requiring knowledge of the exact target path. This enables the application of our methods in more naturalistic and unconstrained environments where stimulus trajectories may be unknown. We evaluated these algorithms against classical supervised machine learning models on a publicly available benchmark dataset featuring diverse pursuit trajectories and varying cognitive workload conditions. The results demonstrate competitive performance along with robustness across different task complexities and trajectory types. Moreover, our framework supports real-time inference, making it viable for continuous cognitive workload monitoring. To further enhance deployment feasibility, we propose a federated learning architecture, allowing privacy-preserving adaptation of models across heterogeneous devices without the need to share raw gaze data. This scalable approach mitigates privacy concerns and facilitates collaborative model improvement in distributed real-world scenarios. Experimental findings confirm that metrics derived from smooth pursuit eye movements reliably reflect fluctuations in cognitive states induced by working memory load tasks, substantiating their use for real-time, continuous workload estimation. By integrating trajectory independence, robust classification techniques, and federated privacy-aware learning, our work advances the state of the art in adaptive human–computer interaction. This framework offers a scientifically grounded, privacy-conscious, and practically deployable solution for cognitive workload estimation that can be adapted to diverse application contexts.
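A constant-velocity Kalman filter over 2-D gaze samples illustrates the trajectory-independent idea; treating the innovation magnitude as a workload-related feature is an assumption for illustration, not the paper's exact classifier:

```python
import numpy as np

def kalman_innovations(gaze, dt=1 / 60, q=50.0, r=1.0):
    """Constant-velocity Kalman filter over 2-D gaze samples; returns the
    innovation magnitude per step, usable as a pursuit-quality feature."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt          # state: x, y, vx, vy
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = np.zeros(4), 10.0 * np.eye(4)
    out = []
    for z in gaze:
        x, P = F @ x, F @ P @ F.T + Q              # predict
        nu = z - H @ x                             # innovation (residual)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x, P = x + K @ nu, (np.eye(4) - K @ H) @ P # update
        out.append(np.linalg.norm(nu))
    return np.array(out)

mags = kalman_innovations(np.random.randn(120, 2))  # toy 2-second gaze trace
```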

27 pages, 1176 KB  
Article
Reconciling Tensions in Security Operations Centers: A Paradox Theory Approach
by Mehdi Saadallah, Abbas Shahim and Svetlana Khapova
Big Data Cogn. Comput. 2025, 9(11), 278; https://doi.org/10.3390/bdcc9110278 - 4 Nov 2025
Cited by 1 | Viewed by 1458
Abstract
Security operations centers (SOCs) in both public and private sectors are under pressure as they cope with the surge of cyberattacks, making the reconciliation of inherent organizational tensions a priority. This study surfaces two persistent tensions, (1) expediency versus authority and (2) adaptability versus consistency, that have remained underexplored in the cybersecurity literature. We based the research on empirical data collected across three organizational settings: an international consumer packaged goods company, a non-departmental public body based in the Netherlands, and a global managed security service provider. We reveal these not as isolated trade-offs but as paradoxes that must be continuously navigated within SOC operations. Building upon both empirical analysis and Paradox Theory, we develop a conceptual model that explains how SOCs reconcile these tensions through the strategic integration of artificial intelligence (AI), automation, and human expertise. Our model emphasizes that AI and automation do not replace human analysts; rather, they enable a new form of organizational balance through mechanisms such as Dynamic Equilibrium and iterative integration. The model demonstrates how SOCs embed technological and human capabilities to simultaneously sustain agility, consistency, authority, and speed. By reframing AI integration as a process of paradox reconciliation, rather than as resistance or automation alone, this study contributes new theoretical insight into the sociotechnical dynamics shaping the future of cybersecurity operations.

18 pages, 2417 KB  
Article
LizAI XT—AI-Accelerated Management Platform for Complex Healthcare Data at Scale, Beyond EMR/EHR and Dashboards
by Trung Tin Nguyen and David Raphael Elmaleh
Big Data Cogn. Comput. 2025, 9(11), 275; https://doi.org/10.3390/bdcc9110275 - 1 Nov 2025
Viewed by 1655
Abstract
In this study, we present LizAI XT, an AI-powered platform designed to automate the structuring, anonymization, and semantic integration of large-scale healthcare data from diverse sources into one comprehensive table or any designated form, based on diseases, clinical variables, and/or other defined parameters, going beyond the creation of a dashboard or visualization. We evaluate the platform’s performance on a cluster of four NVIDIA A30 (24 GB) GPUs across 16 diseases, ranging from deadly cancers and COPD to common conditions such as ear infections, covering a total of 16,000 patients, ∼115,000 medical files, and ∼800 clinical variables. LizAI XT structures data from thousands of files into sets of variables for each disease in one file, achieving >95.0% overall accuracy, while providing exceptional outputs in complicated cases of cancers (99.1%), COPD (98.89%), and asthma (98.12%), without model overfitting. Data retrieval is sub-second for a variable per patient with minimal GPU power, and can be significantly improved on more powerful GPUs. LizAI XT uniquely enables fully client-controlled data, complying with strict regional and national data security and privacy regulations. Our advances complement the existing EMR/EHR, AWS HealthLake, and Google Vertex AI platforms for healthcare data management and AI development, with large scalability and expansion at all levels: HMOs, clinics, pharma, and government.

30 pages, 2440 KB  
Article
Adaptive Segmentation and Statistical Analysis for Multivariate Big Data Forecasting
by Desmond Fomo and Aki-Hiro Sato
Big Data Cogn. Comput. 2025, 9(11), 268; https://doi.org/10.3390/bdcc9110268 - 24 Oct 2025
Cited by 1 | Viewed by 1462
Abstract
Forecasting high-volume, univariate, and multivariate longitudinal data streams is a critical challenge in Big Data systems, especially with constrained computational resources and pronounced data variability. However, existing approaches often neglect the multivariate statistical complexity (e.g., covariance, skewness, kurtosis) of multivariate time series or rely on recency-only windowing that discards informative historical fluctuation patterns, limiting robustness under strict resource budgets. This work makes two core contributions to big data forecasting. First, we establish a formal, multi-dimensional framework for quantifying “data bigness” across statistical, computational, and algorithmic complexities, providing a rigorous foundation for analyzing resource-constrained problems. Second, guided by this framework, we extend and validate the Adaptive High-Fluctuation Recursive Segmentation (AHFRS) algorithm for multivariate time series. By incorporating higher-order statistics such as covariance, skewness, and kurtosis, AHFRS improves predictive accuracy under strict computational budgets. We validate the approach in two stages. First, a real-world case study on a univariate Bitcoin time series provides a practical stress test using a Long Short-Term Memory (LSTM) network as a robust baseline. This validation reveals a significant increase in forecasting robustness, with our method reducing the Root Mean Squared Error (RMSE) by more than 76% in a challenging scenario. Second, its generalizability is established on synthetic multivariate datasets in Finance, Retail, and Healthcare using standard statistical models. Across domains, AHFRS consistently outperforms baselines: in our multivariate Finance simulation, RMSE decreases by up to 62.5%, and Mean Absolute Percentage Error (MAPE) drops by more than 10 percentage points in Healthcare. These results demonstrate that the proposed framework and AHFRS advance the theoretical modeling of data complexity and the design of adaptive, resource-efficient forecasting pipelines for real-world, high-volume data ecosystems.
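A hypothetical recursive segmentation in the spirit of AHFRS, splitting a window while its fluctuation statistics exceed a threshold; the scoring rule and threshold are illustrative assumptions, not the published algorithm:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def segment(series, max_depth=6, min_len=32, thresh=1.5):
    """Recursively halve a window while its fluctuation score (std plus
    higher-order statistics) exceeds a threshold, keeping high-fluctuation
    history at finer resolution than calm stretches."""
    score = series.std() + abs(skew(series)) + abs(kurtosis(series))
    if max_depth == 0 or len(series) < 2 * min_len or score < thresh:
        return [series]
    mid = len(series) // 2
    return (segment(series[:mid], max_depth - 1, min_len, thresh)
            + segment(series[mid:], max_depth - 1, min_len, thresh))

parts = segment(np.random.default_rng(0).normal(size=1024))
print(len(parts), "segments")
```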

27 pages, 3065 KB  
Article
Chinese Financial News Analysis for Sentiment and Stock Prediction: A Comparative Framework with Language Models
by Hsiu-Min Chuang, Hsiang-Chih He and Ming-Che Hu
Big Data Cogn. Comput. 2025, 9(10), 263; https://doi.org/10.3390/bdcc9100263 - 16 Oct 2025
Cited by 3 | Viewed by 5787
Abstract
Financial news has a significant impact on investor sentiment and short-term stock price trends. While many studies have applied natural language processing (NLP) techniques to financial forecasting, most have focused on single tasks or English corpora, with limited research in non-English contexts such as Taiwan. This study develops a joint framework to perform sentiment classification and short-term stock price prediction using Chinese financial news from Taiwan’s top 50 listed companies. Five types of word embeddings—one-hot, TF-IDF, CBOW, skip-gram, and BERT—are systematically compared across 17 traditional, deep, and Transformer models, as well as a large language model (LLaMA3) fully fine-tuned on the Chinese financial texts. To ensure annotation quality, sentiment labels were manually assigned by annotators with finance backgrounds and validated through a double-checking process. Experimental results show that a CNN using skip-gram embeddings achieves the strongest performance among deep learning models, while LLaMA3 yields the highest overall F1-score for sentiment classification. For regression, LSTM consistently provides the most reliable predictive power across different volatility groups, with Bayesian Linear Regression remaining competitive for low-volatility firms. LLaMA3 is the only Transformer-based model to achieve a positive R² under high-volatility conditions. Furthermore, forecasting accuracy is higher for the five-day horizon than for the fifteen-day horizon, underscoring the increasing difficulty of medium-term forecasting. These findings confirm that financial news provides valuable predictive signals for emerging markets and that short-term sentiment-informed forecasts enhance real-time investment decisions.
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
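A minimal sketch of the skip-gram embedding stage using gensim (sg=1 selects skip-gram); the toy tokenised headlines and mean pooling are placeholders for the study's corpus and CNN classifier:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy tokenised headlines; the study trains on Chinese financial news.
sents = [["股價", "上漲"], ["營收", "下滑"], ["股價", "下滑"]]
w2v = Word2Vec(sentences=sents, vector_size=50, sg=1, window=3, min_count=1)

def embed(tokens):
    # Mean-pool word vectors into a fixed-size document feature.
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.stack([embed(s) for s in sents])   # features for a downstream model
```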

24 pages, 13667 KB  
Article
Integrating Graph Retrieval-Augmented Generation into Prescriptive Recommender Systems
by Marvin Niederhaus, Nico Migenda, Julian Weller, Martin Kohlhase and Wolfram Schenck
Big Data Cogn. Comput. 2025, 9(10), 261; https://doi.org/10.3390/bdcc9100261 - 15 Oct 2025
Viewed by 3476
Abstract
Making time-critical decisions with serious consequences is a daily aspect of work environments. To support the process of finding optimal actions, data-driven approaches are increasingly being used. The most advanced form of data-driven analytics is prescriptive analytics, which prescribes actionable recommendations for users. However, the produced recommendations rely on complex models and optimization techniques that are difficult to understand or justify to non-expert users. Currently, there is a lack of platforms that offer easy integration of domain-specific prescriptive analytics workflows into production environments. In particular, there is no centralized environment and standardized approach for implementing such prescriptive workflows. To address these challenges, large language models (LLMs) can be leveraged to improve interpretability by translating complex recommendations into clear, context-specific explanations, enabling non-experts to grasp the rationale behind the suggested actions. Nevertheless, we acknowledge the inherent black-box nature of LLMs, which may introduce limitations in transparency. To mitigate these limitations and to provide interpretable recommendations based on real user knowledge, a knowledge graph is integrated. In this paper, we present and validate a prescriptive analytics platform that integrates ontology-based graph retrieval-augmented generation (GraphRAG) to enhance decision making by delivering actionable and context-aware recommendations. For this purpose, a knowledge graph is created through a fully automated workflow based on an ontology, which serves as the backbone of the prescriptive platform. Data sources for the knowledge graph are standardized and classified according to the ontology by employing a zero-shot classifier. For user-friendly presentation, we critically examine the usability of GraphRAG in prescriptive analytics platforms. We validate our prescriptive platform in a customer clinic with industry experts in our IoT-Factory, a dedicated research environment.
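The zero-shot classification step can be sketched with a standard NLI-based classifier; the model choice and ontology labels below are assumptions, not the platform's actual configuration:

```python
from transformers import pipeline

# Classify a data source description against ontology classes without
# task-specific training; the labels here are hypothetical.
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
ontology_classes = ["machine", "sensor", "process step", "maintenance log"]
result = clf("Spindle vibration readings sampled at 1 kHz", ontology_classes)
print(result["labels"][0])               # best-matching ontology class
```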

38 pages, 913 KB  
Article
Towards the Adoption of Recommender Systems in Online Education: A Framework and Implementation
by Alex Martínez-Martínez, Águeda Gómez-Cambronero, Raul Montoliu and Inmaculada Remolar
Big Data Cogn. Comput. 2025, 9(10), 259; https://doi.org/10.3390/bdcc9100259 - 14 Oct 2025
Cited by 3 | Viewed by 3920
Abstract
The rapid expansion of online education has generated large volumes of learner interaction data, highlighting the need for intelligent systems capable of transforming this information into personalized guidance. Educational Recommender Systems (ERS) represent a key application of big data analytics and machine learning, offering adaptive learning pathways that respond to diverse student needs. For widespread adoption, these systems must align with pedagogical principles while ensuring transparency, interpretability, and seamless integration into Learning Management Systems (LMS). This paper introduces a comprehensive framework and implementation of an ERS designed for platforms such as Moodle. The system integrates big data processing pipelines to support scalability, real-time interaction, and multi-layered personalization, including data collection, preprocessing, recommendation generation, and retrieval. A detailed use case demonstrates its deployment in a real educational environment, underlining both technical feasibility and pedagogical value. Finally, the paper discusses challenges such as data sparsity, learner model complexity, and evaluation of effectiveness, offering directions for future research at the intersection of big data technologies and digital education. By bridging theoretical models with operational platforms, this work contributes to sustainable and data-driven personalization in online learning ecosystems.
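One simple choice for the recommendation-generation layer is item-based collaborative filtering over learner-resource interactions; this toy sketch is an assumption, since the framework supports pluggable strategies:

```python
import numpy as np

# Toy learner x resource interaction matrix (1 = completed).
R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
norm = np.linalg.norm(R, axis=0, keepdims=True) + 1e-9
S = (R / norm).T @ (R / norm)            # item-item cosine similarity
scores = R @ S                            # predicted affinity per learner
scores[R > 0] = -np.inf                   # mask already-seen resources
print(scores.argmax(axis=1))              # next resource per learner
```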

36 pages, 2906 KB  
Review
Data Organisation for Efficient Pattern Retrieval: Indexing, Storage, and Access Structures
by Paraskevas Koukaras and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(10), 258; https://doi.org/10.3390/bdcc9100258 - 13 Oct 2025
Cited by 2 | Viewed by 3015
Abstract
The increasing scale and complexity of data mining outputs, such as frequent itemsets, association rules, sequences, and subgraphs, have made efficient pattern retrieval a critical yet underexplored challenge. This review addresses the organisation, indexing, and access strategies that enable scalable and responsive retrieval of structured patterns. We examine the underlying types of data and pattern outputs, common retrieval operations, and the variety of query types encountered in practice. Key indexing structures are surveyed, including prefix trees, inverted indices, hash-based approaches, and bitmap-based methods, each suited to different pattern representations and workloads. Storage designs are discussed with attention to metadata annotation, format choices, and redundancy mitigation. Query optimisation strategies are reviewed, emphasising index-aware traversal, caching, and ranking mechanisms. This paper also explores scalability through parallel, distributed, and streaming architectures, and surveys current systems and tools that integrate mining and retrieval capabilities. Finally, we outline pressing challenges and emerging directions, such as supporting real-time and uncertainty-aware retrieval and enabling semantic, cross-domain pattern access. Additional frontiers include privacy-preserving indexing and secure query execution, along with the integration of repositories into machine learning pipelines for hybrid symbolic–statistical workflows. We further highlight the need for dynamic repositories, probabilistic semantics, and community benchmarks to ensure that progress is measurable and reproducible across domains. This review provides a comprehensive foundation for designing next-generation pattern retrieval systems that are scalable, flexible, and tightly integrated into analytic workflows. The analysis and roadmap offered are relevant across application areas, including finance, healthcare, cybersecurity, and retail, where robust and interpretable retrieval is essential.
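As a concrete example of one surveyed structure, an inverted index over frequent itemsets turns "which patterns contain all of these items?" queries into posting-list intersections:

```python
from collections import defaultdict

# Inverted index over frequent itemsets: item -> ids of patterns containing it.
patterns = {0: {"milk", "bread"}, 1: {"milk", "beer"}, 2: {"bread", "beer"}}
index = defaultdict(set)
for pid, items in patterns.items():
    for item in items:
        index[item].add(pid)

def containing(query):
    # Patterns that contain every item in the query (posting-list intersection).
    ids = [index[i] for i in query]
    return set.intersection(*ids) if ids else set()

print(containing({"milk", "bread"}))      # {0}
```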

33 pages, 845 KB  
Review
An Overview of AI-Guided Thyroid Ultrasound Image Segmentation and Classification for Nodule Assessment
by Michalis Savelonas
Big Data Cogn. Comput. 2025, 9(10), 255; https://doi.org/10.3390/bdcc9100255 - 10 Oct 2025
Cited by 4 | Viewed by 5399
Abstract
Accurate segmentation and analysis of thyroid nodules in ultrasound (US) images are essential for the diagnosis and management of thyroid conditions, including cancer. Despite advancements in medical imaging, achieving accurate and efficient segmentation remains a significant challenge due to the complexity and variability of US images. Recently, deep learning (DL) techniques, such as convolutional neural networks (CNNs) and vision transformers (ViTs), have emerged as powerful tools for computer-aided diagnosis (CAD). This review highlights recent advancements in thyroid US image segmentation, focusing on state-of-the-art DL techniques such as contrastive learning, consistency learning, and knowledge-driven DL. We explore various approaches to improve segmentation accuracy, including multi-task learning, self-supervised learning, and methods that minimize reliance on the availability of large, annotated datasets. Additionally, we examine the clinical significance of these methods in differentiating between benign and malignant nodules, as well as their potential for integration into clinically adopted, fully automated CAD systems. By addressing the latest developments and ongoing challenges, this review serves as a comprehensive reference for future research and clinical implementation of thyroid US diagnostics.

18 pages, 478 KB  
Review
A Digital Twin Threat Survey
by Manuel Suárez-Román, Mario Sanz-Rodrigo, Andrés Marín-López and David Arroyo
Big Data Cogn. Comput. 2025, 9(10), 252; https://doi.org/10.3390/bdcc9100252 - 2 Oct 2025
Viewed by 3070
Abstract
Virtual and digital twins are highly valuable means to characterize, model, and control physical systems, providing the basis for a simulation environment and laboratory. In the case of a digital twin, it is possible to have a replica of a physical environment by means of reliable sensor networks and accurate data. In this paper we analyse in detail the threats to the reliability of the information extracted from these sensor networks, along with a set of challenges to guarantee data liveness and trustworthiness.

25 pages, 441 KB  
Review
A Meta-Survey of Generative AI in Education: Trends, Challenges, and Research Directions
by Sirine Bouguettaya, Francesco Pupo, Min Chen and Giancarlo Fortino
Big Data Cogn. Comput. 2025, 9(9), 237; https://doi.org/10.3390/bdcc9090237 - 16 Sep 2025
Cited by 4 | Viewed by 10185
Abstract
Education is experiencing a paradigm shift, evolving from traditional learning methods to computer-tool-based education, and now toward the integration of Generative Artificial Intelligence. While classical methods offer structured and standardized learning, they often do not fully address individual learner needs and accessibility. The rise of digital technologies introduced adaptive learning platforms, online classrooms, and interactive educational tools, expanding the reach and flexibility of educational systems. Today, Generative Artificial Intelligence tools are redefining the education landscape by personalizing learning experiences, automating content generation, and providing real-time feedback. Intelligent tutoring systems and personalized assessments empower students with customized learning pathways that enhance engagement and academic performance. This paper presents a meta-survey that systematically examines the role of Generative Artificial Intelligence in education, following PRISMA guidelines to analyze trends, frameworks, and research outcomes across a curated body of academic literature. Special attention is given to the emergence of commercial Generative Artificial Intelligence tools, which are increasingly embedded in learning environments. A structured comparison framework and research questions guide the review, offering insights into how Generative Artificial Intelligence technologies are shaping pedagogical practices, influencing assessment, and raising new ethical and technical challenges. The paper also explores future directions, highlighting how Generative Artificial Intelligence is driving the emergence of new learning models.

25 pages, 539 KB  
Article
Leadership Uniformity in Timeout-Based Quorum Byzantine Fault Tolerance (QBFT) Consensus
by Andreas Polyvios Delladetsimas, Stamatis Papangelou, Elias Iosif and George Giaglis
Big Data Cogn. Comput. 2025, 9(8), 196; https://doi.org/10.3390/bdcc9080196 - 24 Jul 2025
Cited by 3 | Viewed by 3372
Abstract
This study evaluates leadership uniformity—the degree to which the proposer role is evenly distributed among validator nodes over time—in Quorum-based Byzantine Fault Tolerance (QBFT), a Byzantine Fault-Tolerant (BFT) consensus algorithm used in permissioned blockchain networks. By introducing simulated follower timeouts derived from uniform, normal, lognormal, and Weibull distributions, it models a range of network conditions and latency patterns across nodes. This approach integrates Raft-inspired timeout mechanisms into the QBFT framework, enabling a more detailed analysis of leader selection under different network conditions. Three leader selection strategies are tested: direct selection of the node with the shortest timeout, and two quorum-based approaches selecting from the top 20% and 30% of nodes with the shortest timeouts. Simulations were conducted over 200 rounds in a 10-node network. Results show that leader selection was most equitable under the Weibull distribution with shape k=0.5, which captures delay behavior observed in real-world networks. In contrast, the uniform distribution did not consistently yield the most balanced outcomes. The findings also highlight the effectiveness of quorum-based selection: while choosing the node with the lowest timeout ensures responsiveness in each round, it does not guarantee uniform leadership over time. In low-variability distributions, certain nodes may be repeatedly selected by chance, as similar timeout values increase the likelihood of the same nodes appearing among the fastest. Incorporating controlled randomness through quorum-based voting improves rotation consistency and promotes fairer leader distribution, especially under heavy-tailed latency conditions. However, expanding the candidate pool beyond 30% (e.g., to 40% or 50%) introduced vote fragmentation, which complicated quorum formation in small networks and led to consensus failure. Overall, the study demonstrates the potential of timeout-aware, quorum-based leader selection as a more adaptive and equitable alternative to round-robin approaches, and provides a foundation for developing more sophisticated QBFT variants tailored to latency-sensitive networks.
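The timeout-based, quorum-style selection can be reproduced in a few lines; modeling the quorum vote as a uniform pick from the candidate pool is a simplifying assumption for the paper's voting step:

```python
import numpy as np

rng = np.random.default_rng(42)
N_NODES, ROUNDS, QUORUM = 10, 200, 0.3    # top-30% candidate pool

wins = np.zeros(N_NODES, dtype=int)
for _ in range(ROUNDS):
    # Weibull(k=0.5) timeouts: heavy-tailed delays, the paper's best case.
    timeouts = rng.weibull(0.5, N_NODES)
    pool = np.argsort(timeouts)[: max(1, int(QUORUM * N_NODES))]
    wins[rng.choice(pool)] += 1           # quorum vote as a uniform pick

print(wins)                               # flatter counts => more uniform leadership
```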

46 pages, 573 KB  
Systematic Review
State of the Art and Future Directions of Small Language Models: A Systematic Review
by Flavio Corradini, Matteo Leonesi and Marco Piangerelli
Big Data Cogn. Comput. 2025, 9(7), 189; https://doi.org/10.3390/bdcc9070189 - 21 Jul 2025
Cited by 8 | Viewed by 16852
Abstract
Small Language Models (SLMs) have emerged as a critical area of study within natural language processing, attracting growing attention from both academia and industry. This systematic literature review provides a comprehensive and reproducible analysis of recent developments and advancements in SLMs post-2023. Drawing on 70 English-language studies published between January 2023 and January 2025, identified through Scopus, IEEE Xplore, Web of Science, and ACM Digital Library, and focusing primarily on SLMs (including those with up to 7 billion parameters), this review offers a structured overview of the current state of the art and potential future directions. Designed as a resource for researchers seeking an in-depth global synthesis, the review examines key dimensions such as publication trends, visual data representations, contributing institutions, and the availability of public datasets. It highlights prevailing research challenges and outlines proposed solutions, with a particular focus on widely adopted model architectures, as well as common compression and optimization techniques. This study also evaluates the criteria used to assess the effectiveness of SLMs and discusses emerging de facto standards for industry. The curated data and insights aim to support and inform ongoing and future research in this rapidly evolving field.

18 pages, 1663 KB  
Article
CNN-Based Framework for Classifying COVID-19, Pneumonia, and Normal Chest X-Rays
by Cristian Randieri, Andrea Perrotta, Adriano Puglisi, Maria Grazia Bocci and Christian Napoli
Big Data Cogn. Comput. 2025, 9(7), 186; https://doi.org/10.3390/bdcc9070186 - 11 Jul 2025
Cited by 18 | Viewed by 4053
Abstract
This paper describes the development of a CNN model for the analysis of chest X-rays and the automated diagnosis of pneumonia, bacterial or viral, and lung pathologies resulting from COVID-19, offering new insights for further research through an AI-based diagnostic tool that can be implemented automatically and made available for rapid differentiation between COVID-19, other pneumonias, and normal cases starting from X-ray images. The model developed in this work is capable of performing three-class classification, achieving 97.48% accuracy in distinguishing chest X-rays affected by COVID-19 from other pneumonias (bacterial or viral) and from cases defined as normal, i.e., without any obvious pathology. The novelty of our study lies not only in the quality of the results obtained in terms of accuracy but, above all, in the reduced complexity of the model in terms of parameters and a shorter inference time compared to other models currently found in the literature. The excellent trade-off between the accuracy and computational complexity of our model allows for easy implementation on numerous embedded hardware platforms, such as FPGAs, for the creation of new diagnostic tools to support medical practice.
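A minimal three-class CNN of the kind described can be sketched as follows; the layer sizes are illustrative and deliberately small, not the published low-parameter architecture:

```python
import torch
import torch.nn as nn

# Three-class chest X-ray classifier (COVID-19 / other pneumonia / normal).
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 3),                     # logits for the three classes
)
logits = model(torch.randn(4, 1, 224, 224))  # batch of 4 grayscale X-rays
```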

18 pages, 380 KB  
Article
Gait-Based Parkinson’s Disease Detection Using Recurrent Neural Networks for Wearable Systems
by Carlos Rangel-Cascajosa, Francisco Luna-Perejón, Saturnino Vicente-Diaz and Manuel Domínguez-Morales
Big Data Cogn. Comput. 2025, 9(7), 183; https://doi.org/10.3390/bdcc9070183 - 7 Jul 2025
Cited by 4 | Viewed by 2025
Abstract
Parkinson’s disease is one of the neurodegenerative conditions that has seen a significant increase in prevalence in recent decades. The lack of specific screening tests and notable disease biomarkers, combined with the strain on healthcare systems, leads to delayed detection of the disease, which worsens its progression. The development of diagnostic support tools can support early detection and facilitate timely intervention. The ability of Deep Learning algorithms to identify complex features from clinical data has proven to be a promising approach for support tools in various medical domains. In this study, we present an investigation of different architectures based on gated recurrent neural networks to assess their effectiveness in identifying subjects with Parkinson’s disease from gait records. Models with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers were evaluated. Performance is competitive with the current state of the art, with accuracy up to 93.75% (average ± SD: 86 ± 5%), while simplifying computational complexity, which represents an advance in the implementation of executable screening and diagnostic support tools on wearable devices with few computational resources.
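An illustrative GRU-based gait classifier; the input features (e.g., per-sensor ground-reaction forces) and dimensions are assumptions, not the evaluated models:

```python
import torch
import torch.nn as nn

class GaitGRU(nn.Module):
    """GRU over fixed-length gait windows, emitting P(Parkinson's)."""
    def __init__(self, n_features=16, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, n_features)
        _, h = self.rnn(x)                # h[-1]: final hidden state
        return torch.sigmoid(self.head(h[-1]))

prob = GaitGRU()(torch.randn(8, 100, 16))  # 8 windows of 100 timesteps
```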

47 pages, 6244 KB  
Review
Toward the Mass Adoption of Blockchain: Cross-Industry Insights from DeFi, Gaming, and Data Analytics
by Shezon Saleem Mohammed Abdul, Anup Shrestha and Jianming Yong
Big Data Cogn. Comput. 2025, 9(7), 178; https://doi.org/10.3390/bdcc9070178 - 3 Jul 2025
Cited by 3 | Viewed by 18562
Abstract
Blockchain’s promise of decentralised, tamper-resistant services is gaining real traction in three arenas: decentralized finance (DeFi), blockchain gaming, and data-driven analytics. These sectors span finance, entertainment, and information services, offering a representative setting in which to study real-world adoption. This survey analyzes how each domain implements blockchain, identifies the incentives that accelerate uptake, and maps the technical and organizational barriers that still limit scale. By examining peer-reviewed literature and recent industry developments, this review distils common design features such as token incentives, verifiable digital ownership, and immutable data governance. It also pinpoints the following domain-specific challenges: capital efficiency in DeFi, asset portability and community engagement in gaming, and high-volume, low-latency querying in analytics. Moreover, cross-sector links are already forming, with DeFi liquidity tools supporting in-game economies and analytics dashboards improving decision-making across platforms. Building on these findings, this paper offers guidance on stronger interoperability and user-centered design and sets research priorities in consensus optimization, privacy-preserving analytics, and inclusive governance. Together, the insights equip developers, policymakers, and researchers to build scalable, interoperable platforms and reuse proven designs while avoiding common pitfalls.
(This article belongs to the Special Issue Application of Cloud Computing in Industrial Internet of Things)

17 pages, 711 KB  
Article
Boost-Classifier-Driven Fault Prediction Across Heterogeneous Open-Source Repositories
by Philip König, Sebastian Raubitzek, Alexander Schatten, Dennis Toth, Fabian Obermann, Caroline König and Kevin Mallinger
Big Data Cogn. Comput. 2025, 9(7), 174; https://doi.org/10.3390/bdcc9070174 - 2 Jul 2025
Cited by 4 | Viewed by 1498
Abstract
Ensuring reliability, availability, and security in modern software systems hinges on early fault detection, yet predicting which parts of a codebase are most at risk remains a significant challenge. In this paper, we analyze 2.4 million commits drawn from 33 heterogeneous open-source projects, spanning healthcare, security tools, data processing, and more. By examining each repository per file and per commit, we derive process metrics (e.g., churn, file age, revision frequency) alongside size metrics and entropy-based indicators of how scattered changes are over time. We train and tune a gradient boosting model to classify bug-prone commits under realistic class-imbalance conditions, achieving robust predictive performance across diverse repositories. Moreover, a comprehensive feature-importance analysis shows that files with long lifespans (high age), frequent edits (revision count), and widely scattered changes (entropy metrics) are especially vulnerable to defects. These insights can help practitioners and researchers prioritize testing and tailor maintenance strategies, ultimately strengthening software dependability.
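A sketch of the per-commit setup: an entropy feature for scattered changes plus a gradient boosting classifier under class imbalance; the synthetic features below are placeholders for the mined process metrics:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def change_entropy(edit_counts):
    """Shannon entropy of how a file's edits scatter over time periods."""
    p = np.asarray(edit_counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

print(change_entropy([5, 0, 1, 2]))       # concentrated edits => lower entropy

# Placeholder per-commit features: churn, file age, revision count, entropy.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = rng.random(1000) < 0.1                # ~10% buggy commits (imbalanced)
clf = GradientBoostingClassifier().fit(X, y)
print(clf.feature_importances_)
```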

20 pages, 3062 KB  
Article
Cognitive Networks and Text Analysis Identify Anxiety as a Key Dimension of Distress in Genuine Suicide Notes
by Massimo Stella, Trevor James Swanson, Andreia Sofia Teixeira, Brianne N. Richson, Ying Li, Thomas T. Hills, Kelsie T. Forbush and David Watson
Big Data Cogn. Comput. 2025, 9(7), 171; https://doi.org/10.3390/bdcc9070171 - 27 Jun 2025
Cited by 1 | Viewed by 2109
Abstract
Understanding the mindset of people who die by suicide remains a key research challenge. We map conceptual and emotional word–word co-occurrences in 139 genuine suicide notes and in reference word lists (an Emotional Recall Task) from 200 individuals grouped by high/low depression, anxiety, and stress levels on the DASS-21. Positive words cover most of the suicide notes’ vocabulary; however, co-occurrences in suicide notes overlap mostly with those produced by individuals with low anxiety (Jaccard index of 0.42 for valence and 0.38 for arousal). We introduce a “words not said” method: it removes every word that corpus A shares with a comparison corpus B and then checks the emotions of the “residual” words in A \ B. If no emotions are left over, A and B are similar in expressing the same emotions. Simulations indicate this method can classify high/low levels of depression, anxiety, and stress with 80% accuracy in a balanced task. After subtracting suicide note words, only the high-anxiety corpus displays no significant residual emotions. Our findings thus pin anxiety as a key latent feature of suicidal psychology and offer an interpretable language-based marker for suicide risk detection.
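The "words not said" method reduces to a set difference followed by an emotion lookup over the residue; the valence scores below are placeholders, not the study's lexicon:

```python
# Drop every word corpus A shares with corpus B, then inspect the
# emotions of the residue A \ B; an empty or emotion-free residue means
# A and B express similar emotions.
valence = {"alone": -0.8, "love": 0.9, "tired": -0.5, "sorry": -0.4}

def residual_valence(corpus_a, corpus_b):
    residue = set(corpus_a) - set(corpus_b)
    scores = [valence[w] for w in residue if w in valence]
    return residue, (sum(scores) / len(scores) if scores else None)

notes = {"love", "sorry", "tired", "alone"}
high_anxiety = {"love", "sorry", "tired", "alone", "afraid"}
print(residual_valence(notes, high_anxiety))  # empty residue => similar
```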

19 pages, 2755 KB  
Article
Real-Time Algal Monitoring Using Novel Machine Learning Approaches
by Seyit Uguz, Yavuz Selim Sahin, Pradeep Kumar, Xufei Yang and Gary Anderson
Big Data Cogn. Comput. 2025, 9(6), 153; https://doi.org/10.3390/bdcc9060153 - 9 Jun 2025
Cited by 10 | Viewed by 3812
Abstract
Monitoring algal growth rates and estimating microalgae concentration in photobioreactor systems are critical for optimizing production efficiency. Traditional methods—such as microscopy, fluorescence, flow cytometry, spectroscopy, and macroscopic approaches—while accurate, are often costly, time-consuming, labor-intensive, and susceptible to contamination or production interference. To overcome these limitations, this study proposes an automated, real-time, and cost-effective solution by integrating machine learning with image-based analysis. We evaluated the performance of Decision Trees (DTS), Random Forests (RF), Gradient Boosting Machines (GBM), and K-Nearest Neighbors (k-NN) algorithms using RGB color histograms extracted from images of Scenedesmus dimorphus cultures. Ground truth data were obtained via manual cell enumeration under a microscope and dry biomass measurements. Among the models tested, DTS achieved the highest accuracy for cell count prediction (R2 = 0.77), while RF demonstrated superior performance for dry biomass estimation (R2 = 0.66). Compared to conventional methods, the proposed ML-based approach offers a low-cost, non-invasive, and scalable alternative that significantly reduces manual effort and response time. These findings highlight the potential of machine learning–driven imaging systems for continuous, real-time monitoring in industrial-scale microalgae cultivation.
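The image-based pipeline, RGB histograms as features and a tree-ensemble regressor for biomass, can be sketched as follows; the images and labels here are synthetic placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rgb_histogram(img, bins=32):
    """Concatenated per-channel histograms of an RGB image array."""
    return np.concatenate([
        np.histogram(img[..., c], bins=bins, range=(0, 255))[0]
        for c in range(3)
    ]).astype(float)

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(60, 64, 64, 3))   # placeholder culture photos
biomass = rng.random(60)                            # placeholder dry-biomass labels
X = np.stack([rgb_histogram(im) for im in imgs])
model = RandomForestRegressor(n_estimators=200).fit(X, biomass)
```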
32 pages, 2079 KB  
Review
The Use of Large Language Models in Ophthalmology: A Scoping Review on Current Use-Cases and Considerations for Future Works in This Field
by Ye King Clarence See, Khai Shin Alva Lim, Wei Yung Au, Si Yin Charlene Chia, Xiuyi Fan and Zhenghao Kelvin Li
Big Data Cogn. Comput. 2025, 9(6), 151; https://doi.org/10.3390/bdcc9060151 - 6 Jun 2025
Cited by 3 | Viewed by 4146
Abstract
The advancement of generative artificial intelligence (AI) has resulted in its use permeating many areas of life. Amidst this eruption of scientific output, a wide range of research regarding the usage of Large Language Models (LLMs) in ophthalmology has emerged. In this study, we aim to map out the landscape of LLM applications in ophthalmology and, by consolidating the work carried out, to produce a point of reference to guide future works. Eight databases were searched for articles from 2019 to 2024. In total, 976 studies were screened, and a final 49 were included. The study designs and outcomes of these studies were analysed. The performance of LLMs was further analysed in the areas of exam taking and patient education, diagnostic capability, management capability, administration, inaccuracies, and harm. LLMs performed acceptably in most studies, even surpassing humans in some. Despite their relatively good performance, issues pertaining to study design, grading protocols, hallucinations, inaccuracies, and harm were found to be pervasive. LLMs have received considerable attention since their introduction to the public and have found potential applications in the field of medicine, and in particular, ophthalmology. However, this review recommends using standardised evaluation frameworks and addressing the gaps in the current literature when applying LLMs in ophthalmology. Full article
28 pages, 2486 KB  
Article
A Framework for Rapidly Prototyping Data Mining Pipelines
by Flavio Corradini, Luca Mozzoni, Marco Piangerelli, Barbara Re and Lorenzo Rossi
Big Data Cogn. Comput. 2025, 9(6), 150; https://doi.org/10.3390/bdcc9060150 - 5 Jun 2025
Viewed by 2948
Abstract
With the advent of Big Data, data mining techniques have become crucial for improving decision-making across diverse sectors, yet their employment demands significant resources and time. Time is critical in industrial contexts, as delays can lead to increased costs, missed opportunities, and reduced competitive advantage. To address this, systems for analyzing data can help prototype data mining pipelines, mitigating the risks of failure and resource wastage, especially when experimenting with novel techniques. Moreover, business experts often lack deep technical expertise and need robust support to validate their pipeline designs quickly. This paper presents Rainfall, a novel framework for rapidly prototyping data mining pipelines, developed through collaborative projects with industry. The framework’s requirements stem from a combination of literature review findings, iterative industry engagement, and analysis of existing tools. Rainfall enables the visual programming, execution, monitoring, and management of data mining pipelines, lowering the barrier for non-technical users. Pipelines are composed of configurable nodes that encapsulate functionalities from popular libraries or custom user-defined code, fostering experimentation. The framework is evaluated through a case study and SWOT analysis with INGKA, a large-scale industry partner, alongside usability testing with real users and validation against scenarios from the literature. The paper then underscores the value of industry–academia collaboration in bridging theoretical innovation with practical application. Full article
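The node-and-pipeline pattern described above can be illustrated with a toy sketch; Rainfall’s actual API is not shown in this listing, so the classes below are hypothetical.

```python
# Hypothetical node/pipeline pattern: configurable nodes run sequentially,
# each consuming the previous node's output.
from typing import Any, Callable

class Node:
    def __init__(self, name: str, func: Callable[[Any], Any]):
        self.name, self.func = name, func

    def run(self, data: Any) -> Any:
        print(f"running node: {self.name}")   # simple execution monitoring
        return self.func(data)

class Pipeline:
    def __init__(self, nodes: list[Node]):
        self.nodes = nodes

    def execute(self, data: Any) -> Any:
        for node in self.nodes:
            data = node.run(data)
        return data

pipe = Pipeline([
    Node("load", lambda _: [3, 1, 2]),   # stand-in for a data source node
    Node("sort", sorted),                # library-backed transformation
    Node("head", lambda xs: xs[0]),      # user-defined custom code
])
print(pipe.execute(None))  # -> 1
```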
34 pages, 20058 KB  
Article
Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks
by Grant Wardle and Teo Sušnjak
Big Data Cogn. Comput. 2025, 9(6), 149; https://doi.org/10.3390/bdcc9060149 - 3 Jun 2025
Cited by 1 | Viewed by 3711
Abstract
Our study investigates how the sequencing of text and image inputs within multi-modal prompts affects the reasoning performance of Large Language Models (LLMs). Through empirical evaluations of three major commercial LLM vendors—OpenAI, Google, and Anthropic—alongside a user study on interaction strategies, we develop and validate practical heuristics for optimising multi-modal prompt design. Our findings reveal that modality sequencing is a critical factor influencing reasoning performance, particularly in tasks with varying cognitive load and structural complexity. For simpler tasks involving a single image, positioning the modalities directly impacts model accuracy, whereas in complex, multi-step reasoning scenarios, the sequence must align with the logical structure of inference, often outweighing the specific placement of individual modalities. Furthermore, we identify systematic challenges in multi-hop reasoning within transformer-based architectures, where models demonstrate strong early-stage inference but struggle with integrating prior contextual information in later reasoning steps. Building on these insights, we propose a set of validated, user-centred heuristics for designing effective multi-modal prompts, enhancing both reasoning accuracy and user interaction with AI systems. Our contributions inform the design and usability of interactive intelligent systems, with implications for applications in education, medical imaging, legal document analysis, and customer support. By bridging the gap between intelligent system behaviour and user interaction strategies, this study provides actionable guidance on how users can effectively structure prompts to optimise multi-modal LLM reasoning within real-world, high-stakes decision-making contexts. Full article
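For concreteness, a sequencing experiment boils down to permuting the content parts of one prompt. The dict schema below loosely mirrors common chat-API message formats but is illustrative, not any specific vendor’s API.

```python
# Two orderings of the same multimodal prompt; schema is illustrative only.
text_part = {"type": "text", "text": "How many valves are open in this diagram?"}
image_part = {"type": "image_url",
              "image_url": {"url": "https://example.com/diagram.png"}}

image_first = [{"role": "user", "content": [image_part, text_part]}]
text_first = [{"role": "user", "content": [text_part, image_part]}]

# A sequencing study would send both variants to the same model and compare
# answer accuracy across tasks of varying complexity.
for variant, messages in [("image-first", image_first), ("text-first", text_first)]:
    order = [part["type"] for part in messages[0]["content"]]
    print(variant, order)
```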
44 pages, 1434 KB  
Review
The Importance of AI Data Governance in Large Language Models
by Saurabh Pahune, Zahid Akhtar, Venkatesh Mandapati and Kamran Siddique
Big Data Cogn. Comput. 2025, 9(6), 147; https://doi.org/10.3390/bdcc9060147 - 28 May 2025
Cited by 18 | Viewed by 14736
Abstract
AI data governance is a crucial framework for ensuring that data are utilized in the lifecycle of large language model (LLM) activity, from the development process to the end-to-end testing process, model validation, secure deployment, and operations. This requires the data to be managed responsibly, confidentially, securely, and ethically. The main objective of data governance is to implement a robust and intelligent data governance framework for LLMs, which tends to impact data quality management, the fine-tuning of model performance, biases, data privacy laws, security protocols, ethical AI practices, and regulatory compliance processes in LLMs. Effective data governance steps are important for minimizing data breach activity, enhancing data security, ensuring compliance and regulations, mitigating bias, and establishing clear policies and guidelines. This paper covers the foundation of AI data governance, key components, types of data governance, best practices, case studies, challenges, and future directions of data governance in LLMs. Additionally, we conduct a comprehensive detailed analysis of data governance and how efficient the integration of AI data governance must be for LLMs to gain a trustable approach for the end user. Finally, we provide deeper insights into the comprehensive exploration of the relevance of the data governance framework to the current landscape of LLMs in the healthcare, pharmaceutical, finance, supply chain management, and cybersecurity sectors and address the essential roles to take advantage of the approach of data governance frameworks and their effectiveness and limitations. Full article
15 pages, 1196 KB  
Article
Bone Segmentation in Low-Field Knee MRI Using a Three-Dimensional Convolutional Neural Network
by Ciro Listone, Diego Romano and Marco Lapegna
Big Data Cogn. Comput. 2025, 9(6), 146; https://doi.org/10.3390/bdcc9060146 - 28 May 2025
Cited by 1 | Viewed by 2466
Abstract
Bone segmentation in magnetic resonance imaging (MRI) is crucial for clinical and research applications, including diagnosis, surgical planning, and treatment monitoring. However, it remains challenging due to anatomical variability and complex bone morphology. Manual segmentation is time-consuming and operator-dependent, fostering interest in automated methods. This study proposes an automated segmentation method based on a 3D U-Net convolutional neural network to segment the femur, tibia, and patella from low-field MRI scans. Low-field MRI offers advantages in cost, patient comfort, and accessibility but presents challenges related to lower signal quality. Our method achieved a Dice Similarity Coefficient (DSC) of 0.9838, Intersection over Union (IoU) of 0.9682, and Average Hausdorff Distance (AHD) of 0.0223, with an inference time of approximately 3.96 s per volume on a GPU. Although post-processing had minimal impact on metrics, it significantly enhanced the visual smoothness of bone surfaces, which is crucial for clinical use. The final segmentations enabled the creation of clean, 3D-printable bone models, beneficial for preoperative planning. These results demonstrate that the model achieves accurate segmentation with a high degree of overlap compared to manually segmented reference data. This accuracy results from meticulous fine-tuning of the network, along with the application of advanced data augmentation and post-processing techniques. Full article
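The Dice Similarity Coefficient reported above is straightforward to compute for binary masks; the sketch below uses random volumes as stand-ins for predicted and reference segmentations.

```python
# DSC = 2|P ∩ T| / (|P| + |T|) for boolean segmentation volumes.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

rng = np.random.default_rng(0)
pred = rng.random((32, 32, 32)) > 0.5    # stand-in predicted mask
truth = rng.random((32, 32, 32)) > 0.5   # stand-in reference mask
print(round(dice(pred, truth), 4))
```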
18 pages, 597 KB  
Article
No-Code Edge Artificial Intelligence Frameworks Comparison Using a Multi-Sensor Predictive Maintenance Dataset
by Juan M. Montes-Sánchez, Plácido Fernández-Cuevas, Francisco Luna-Perejón, Saturnino Vicente-Diaz and Ángel Jiménez-Fernández
Big Data Cogn. Comput. 2025, 9(6), 145; https://doi.org/10.3390/bdcc9060145 - 26 May 2025
Cited by 1 | Viewed by 2704
Abstract
Edge Computing (EC) is one of the proposed solutions to the problems the industry faces when implementing Predictive Maintenance (PdM) solutions that can benefit from Edge Artificial Intelligence (Edge AI) systems. In this work, we have compared six of the most popular no-code Edge AI frameworks on the market. The comparison considers economic cost, the number of features, usability, and performance. We used a combination of the analytic hierarchy process (AHP) and the technique for order preference by similarity to ideal solution (TOPSIS) to compare the frameworks. We consulted ten independent experts on Edge AI, four employed in industry and the other six in academia. These experts defined the importance of each criterion by deciding the weights of TOPSIS using AHP. We performed two different classification tests on each framework platform using data from a public dataset for PdM on biomedical equipment. Magnetometer data were used for test 1, and accelerometer data were used for test 2. We obtained the F1 score, flash memory, and latency metrics. There was a high level of consensus between the worlds of academia and industry when assigning the weights. Therefore, the overall comparison ranked the analyzed frameworks similarly. NanoEdgeAIStudio ranked first when considering all weights and industry-only weights, and Edge Impulse was the first option when using academia-only weights. In terms of performance, there is room for improvement in most frameworks, as they did not reach the metrics of the previously developed custom Edge AI solution. We identified some limitations that should be fixed to improve the comparison method in the future, like adding weights to the feature criteria or increasing the number and variety of performance tests. Full article
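A compact NumPy implementation of the TOPSIS step is sketched below; the decision matrix, weights, and criteria directions are invented, not the study’s elicited values.

```python
# TOPSIS ranking: closeness to the ideal solution, higher is better.
import numpy as np

def topsis(matrix: np.ndarray, weights: np.ndarray, benefit: np.ndarray) -> np.ndarray:
    norm = matrix / np.linalg.norm(matrix, axis=0)        # vector normalisation
    v = norm * weights                                    # weighted matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)

# rows: frameworks; columns: cost, F1 score, latency (toy values)
m = np.array([[120.0, 0.91, 15.0],
              [80.0, 0.88, 22.0],
              [150.0, 0.93, 11.0]])
w = np.array([0.3, 0.5, 0.2])             # e.g. AHP-derived weights
benefit = np.array([False, True, False])  # cost and latency: smaller is better
print(topsis(m, w, benefit))
```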
25 pages, 2733 KB  
Article
Polarity of Yelp Reviews: A BERT–LSTM Comparative Study
by Rachid Belaroussi, Sié Cyriac Noufe, Francis Dupin and Pierre-Olivier Vandanjon
Big Data Cogn. Comput. 2025, 9(5), 140; https://doi.org/10.3390/bdcc9050140 - 21 May 2025
Cited by 5 | Viewed by 4987
Abstract
With the rapid growth in social network comments, the need for more effective methods to classify their polarity—negative, neutral, or positive—has become essential. Sentiment analysis, powered by natural language processing, has evolved significantly with the adoption of advanced deep learning techniques. Long Short-Term Memory networks capture long-range dependencies in text, while transformers, with their attention mechanisms, excel at preserving contextual meaning and handling high-dimensional, semantically complex data. This study compares the performance of sentiment analysis models based on LSTM and BERT architectures using key evaluation metrics. The dataset consists of business reviews from the Yelp Open Dataset. We tested LSTM-based methods against BERT and its variants—RoBERTa, BERTweet, and DistilBERT—leveraging popular pipelines from the Hugging Face Hub. A class-by-class performance analysis is presented, revealing that more complex BERT-based models do not always guarantee superior results in the classification of Yelp reviews. Additionally, the use of bidirectionality in LSTMs does not necessarily lead to better performance. However, across a diversity of test sets, transformer models outperform traditional RNN-based models, as their generalization capability is greater than that of a simple LSTM model. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
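A minimal sketch of the Hugging Face pipeline route mentioned above; the checkpoint name is one public sentiment model and is illustrative of the approach rather than the exact models compared.

```python
# Transformer-based polarity scoring via the Hugging Face pipeline API.
# The checkpoint is downloaded on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

reviews = [
    "The food was fantastic and the staff were lovely.",
    "Waited an hour for a cold burger. Never again.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(result["label"], round(result["score"], 3), "|", review)
```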
29 pages, 4204 KB  
Article
A Comparative Study of Ensemble Machine Learning and Explainable AI for Predicting Harmful Algal Blooms
by Omer Mermer, Eddie Zhang and Ibrahim Demir
Big Data Cogn. Comput. 2025, 9(5), 138; https://doi.org/10.3390/bdcc9050138 - 20 May 2025
Cited by 14 | Viewed by 3677
Abstract
Harmful algal blooms (HABs), driven by environmental pollution, pose significant threats to water quality, public health, and aquatic ecosystems. This study enhances the prediction of HABs in Lake Erie, part of the Great Lakes system, by utilizing ensemble machine learning (ML) models coupled with explainable artificial intelligence (XAI) for interpretability. Using water quality data from 2013 to 2020, various physical, chemical, and biological parameters were analyzed to predict chlorophyll-a (Chl-a) concentrations, which are a commonly used indicator of phytoplankton biomass and a proxy for algal blooms. This study employed multiple ensemble ML models, including random forest (RF), deep forest (DF), gradient boosting (GB), and XGBoost, and compared their performance against individual models, such as support vector machine (SVM), decision tree (DT), and multi-layer perceptron (MLP). The findings revealed that the ensemble models, particularly XGBoost and DF, achieved superior predictive accuracy, with R2 values of 0.8517 and 0.8544, respectively. The application of SHapley Additive exPlanations (SHAP) provided insights into the relative importance of the input features, identifying particulate organic nitrogen (PON), particulate organic carbon (POC), and total phosphorus (TP) as the critical factors influencing Chl-a concentrations. This research demonstrates the effectiveness of ensemble ML models for achieving high predictive accuracy, while the integration of XAI enhances model interpretability. The results support the development of proactive water quality management strategies and highlight the potential of advanced ML techniques for environmental monitoring. Full article
(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)
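The ensemble-plus-SHAP workflow can be sketched as below, using scikit-learn’s gradient boosting as a stand-in for XGBoost and synthetic water-quality features; feature names and data are invented.

```python
# Gradient-boosted regression plus SHAP feature attribution on toy data.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
features = ["PON", "POC", "TP", "temperature"]
X = rng.normal(size=(200, 4))
chl_a = 2.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)  # toy target

model = GradientBoostingRegressor(random_state=0).fit(X, chl_a)
explainer = shap.TreeExplainer(model)     # tree-model-specific explainer
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature approximates global importance.
for name, imp in zip(features, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {imp:.3f}")
```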
26 pages, 2125 KB  
Article
Adaptive Augmented Reality Architecture for Optimising Assistance and Safety in Industry 4.0
by Ginés Morales Méndez and Francisco del Cerro Velázquez
Big Data Cogn. Comput. 2025, 9(5), 133; https://doi.org/10.3390/bdcc9050133 - 19 May 2025
Cited by 4 | Viewed by 2595
Abstract
The present study proposes an adaptive augmented reality (AR) architecture, specifically designed to enhance real-time operator assistance and occupational safety in industrial environments representative of Industry 4.0. The proposed system addresses key challenges in AR adoption, such as the need for dynamic personalisation of instructions based on operator profiles and the mitigation of technical and cognitive barriers. The architecture integrates theoretical modelling, modular design, and real-time adaptability to match instruction complexity with user expertise and environmental conditions. A working prototype was implemented using Microsoft HoloLens 2, Unity 3D, and Vuforia and validated in a controlled industrial scenario involving predictive maintenance and assembly tasks. The experimental results demonstrated statistically significant improvements in task completion time, error rates, perceived cognitive load, operational efficiency, and safety indicators in comparison with conventional methods. The findings underscore the system’s capacity to enhance both performance and consistency while also strengthening risk mitigation in complex operational settings. This study proposes a scalable and modular AR framework with built-in safety and adaptability mechanisms, demonstrating practical benefits for human–machine interaction in Industry 4.0. The present study is subject to certain limitations, including validation in a simulated environment, which limits the direct extrapolation of the results to real industrial scenarios; further evaluation in various operational contexts is required to verify the overall scalability and applicability of the proposed system. It is recommended that future research explore the long-term ergonomics, scalability, and integration of emerging technologies for decision support within adaptive AR systems. Full article
21 pages, 12662 KB  
Review
Benchmarking of Anomaly Detection Methods for Industry 4.0: Evaluation, Ranking, and Practical Recommendations
by Aurélie Cools, Mohammed Amin Belarbi and Sidi Ahmed Mahmoudi
Big Data Cogn. Comput. 2025, 9(5), 128; https://doi.org/10.3390/bdcc9050128 - 13 May 2025
Cited by 2 | Viewed by 4690
Abstract
Quality control and predictive maintenance are two essential pillars of Industry 4.0, aiming to optimize production, reduce operational costs, and enhance system reliability. Real-time visual inspection ensures early detection of manufacturing defects, assembly errors, or texture inconsistencies, preventing defective products from reaching customers. Predictive maintenance leverages sensor data by analyzing vibrations, temperature, and pressure signals to anticipate failures and avoid production downtime. Image-based quality control has become critical in industries such as automotive, electronics, aerospace, and food processing, where visual appearance is a key quality indicator. Although advances in deep learning and computer vision have significantly improved anomaly detection, industrial deployments remain challenged by the scarcity of labeled anomalies and the variability of defects. These issues increasingly lead to the adoption of unsupervised methods and generative approaches, which, despite their effectiveness, introduce substantial computational complexity. We conduct a unified comparison of ten anomaly detection methods, categorizing them according to their reliance on synthetic anomaly generation and their detection strategy, either reconstruction-based or feature-based. All models are trained exclusively on normal data to mirror realistic industrial conditions. Our evaluation framework combines performance metrics such as recall, precision, and their harmonic mean, emphasizing the need to minimize false negatives that could lead to critical production failures. In addition, we assess environmental impact and hardware complexity to better guide method selection. Practical recommendations are provided to balance robustness, operational feasibility, and sustainability in industrial applications. Full article
(This article belongs to the Special Issue Fault Diagnosis and Detection Based on Deep Learning)
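The evaluation trio the abstract emphasises (precision, recall, and their harmonic mean) is computed below with scikit-learn on toy labels, where a false negative corresponds to a missed defect.

```python
# Precision, recall, and F1 on toy anomaly labels (1 = anomalous part).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)   # false negatives hurt this metric
f1 = f1_score(y_true, y_pred)           # harmonic mean of the two
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```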
20 pages, 1750 KB  
Article
Enhancing Recommendation Systems with Real-Time Adaptive Learning and Multi-Domain Knowledge Graphs
by Zeinab Shahbazi, Rezvan Jalali and Zahra Shahbazi
Big Data Cogn. Comput. 2025, 9(5), 124; https://doi.org/10.3390/bdcc9050124 - 8 May 2025
Cited by 11 | Viewed by 5553
Abstract
In the era of information explosion, recommendation systems play a crucial role in filtering vast amounts of content for users. Traditional recommendation models leverage knowledge graphs, sentiment analysis, social capital, and generative AI to enhance personalization. However, existing models still struggle to adapt dynamically to users’ evolving interests across multiple content domains in real-time. To address this gap, the cross-domain adaptive recommendation system (CDARS) is proposed, which integrates real-time behavioral tracking with multi-domain knowledge graphs to refine user preference modeling continuously. Unlike conventional methods that rely on static or historical data, CDARS dynamically adjusts its recommendation strategies based on contextual factors such as real-time engagement, sentiment fluctuations, and implicit preference drifts. Furthermore, a novel explainable adaptive learning (EAL) module was introduced, providing transparent insights into recommendations’ evolving nature, thereby improving user trust and system interpretability. To enable such real-time adaptability, CDARS incorporates multimodal sentiment analysis of user-generated content, behavioral pattern mining (e.g., click timing, revisit frequency), and learning trajectory modeling through time-aware embeddings and incremental updates of user representations. These dynamic signals are mapped into evolving knowledge graphs, forming continuously updated learning charts that drive more context-aware and emotionally intelligent recommendations. Our experimental results on datasets spanning social media, e-commerce, and entertainment domains demonstrate that CDARS significantly enhances recommendation relevance, achieving an average improvement of 7.8% in click-through rate (CTR) and 8.3% in user engagement compared to state-of-the-art models. This research presents a paradigm shift toward truly dynamic and explainable recommendation systems, paving the way for more personalized and user-centric experiences in the digital landscape. Full article
19 pages, 5047 KB  
Article
Robust Anomaly Detection of Multivariate Time Series Data via Adversarial Graph Attention BiGRU
by Yajing Xing, Jinbiao Tan, Rui Zhang and Jiafu Wan
Big Data Cogn. Comput. 2025, 9(5), 122; https://doi.org/10.3390/bdcc9050122 - 8 May 2025
Viewed by 2805
Abstract
Multivariate time series data (MTSD) anomaly detection is challenging due to complex spatio-temporal dependencies among sensors and pervasive environmental noise. Existing methods struggle to balance anomaly detection accuracy with robustness against data contamination. Hence, this paper proposes a robust MTSD anomaly detection method based on parallel graph attention, a bidirectional gated recurrent unit, and noise-reconstruction adversarial training (PGAT-BiGRU-NRA). Firstly, the parallel graph attention (PGAT) mechanism extracts the time-dependent and spatially related features of MTSD to realize MTSD fusion. Then, a bidirectional gated recurrent unit (BiGRU) is utilized to extract the contextual information of the data to avoid information loss. In addition, noise is reconstructed for adversarial training to achieve more robust anomaly detection of MTSD. Experiments conducted on real industrial equipment datasets evaluate the effectiveness of the method in MTSD anomaly detection, and comparative experiments verify that the proposed method outperforms mainstream baseline models. The proposed method achieves accurate anomaly detection and robust performance under noise interference, providing feasible technical support for the stable operation of industrial equipment in complex environments. Full article
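A minimal PyTorch sketch of the BiGRU reconstruction stage is given below; the graph-attention and adversarial-training stages are omitted, and all shapes and data are illustrative.

```python
# Bidirectional GRU encoder over multivariate windows; anomaly score from
# reconstruction error. Only the BiGRU stage of the architecture is shown.
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    def __init__(self, n_sensors: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_sensors, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_sensors)  # reconstruction head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(x)      # (batch, time, 2*hidden): both directions
        return self.head(out)     # reconstruct each time step

x = torch.randn(8, 100, 12)       # 8 windows, 100 steps, 12 sensors (toy)
model = BiGRUEncoder(n_sensors=12)
recon = model(x)
anomaly_score = (recon - x).pow(2).mean(dim=(1, 2))  # per-window error
print(anomaly_score.shape)        # torch.Size([8])
```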
47 pages, 29654 KB  
Review
A Survey on Object-Oriented Assembly and Disassembly Operations in Nuclear Applications
by Wenxing Liu, Ipek Caliskanelli, Hanlin Niu, Kaiqiang Zhang and Robert Skilton
Big Data Cogn. Comput. 2025, 9(5), 118; https://doi.org/10.3390/bdcc9050118 - 28 Apr 2025
Viewed by 2447
Abstract
Nuclear environments demand exceptional precision, reliability, and safety, given the high stakes involved in handling radioactive materials and maintaining reactor systems. Object-oriented assembly and disassembly operations in nuclear applications represent a cutting-edge approach to managing complex, high-stakes operations with enhanced precision and safety. This paper discusses the challenges associated with nuclear robotic remote operations, summarizes current methods for handling object-oriented assembly and disassembly operations, and explores potential future research directions in this field. Object-oriented assembly and disassembly operations are vital in nuclear applications due to their ability to manage complexity, ensure precision, and enhance safety and reliability, all of which are paramount in the demanding and high-risk environment of nuclear technology. Full article
(This article belongs to the Special Issue Field Robotics and Artificial Intelligence (AI))
14 pages, 1934 KB  
Article
Evaluating Deep Learning Architectures for Breast Tumor Classification and Ultrasound Image Detection Using Transfer Learning
by Christopher Kormpos, Fotios Zantalis, Stylianos Katsoulis and Grigorios Koulouras
Big Data Cogn. Comput. 2025, 9(5), 111; https://doi.org/10.3390/bdcc9050111 - 23 Apr 2025
Cited by 5 | Viewed by 3699
Abstract
The intersection of medical image classification and deep learning has garnered increasing research interest, particularly in the context of breast tumor detection using ultrasound images. Prior studies have predominantly focused on image classification, segmentation, and feature extraction, often assuming that the input images, whether sourced from healthcare professionals or individuals, are valid and relevant for analysis. To address this, we propose an initial binary classification filter to distinguish between relevant and irrelevant images, ensuring only meaningful data proceeds to subsequent analysis. However, the primary focus of this study lies in investigating the performance of a hierarchical two-tier classification architecture compared to a traditional flat three-class classification model, by employing a well-established breast ultrasound images dataset. Specifically, we explore whether sequentially breaking down the problem into binary classifications, first identifying normal versus tumorous tissue and then distinguishing benign from malignant tumors, yields better accuracy and robustness than directly classifying all three categories in a single step. Using a range of evaluation metrics, the hierarchical architecture demonstrates notable advantages in certain critical aspects of model performance. The findings of this study provide valuable guidance for selecting the optimal architecture for the final model, facilitating its seamless integration into a web application for deployment. These insights are further anticipated to advance future algorithm development and broaden the applicability of the research across diverse fields. Full article
21 pages, 541 KB  
Article
Cognitive Computing with Large Language Models for Student Assessment Feedback
by Noorhan Abbas and Eric Atwell
Big Data Cogn. Comput. 2025, 9(5), 112; https://doi.org/10.3390/bdcc9050112 - 23 Apr 2025
Cited by 4 | Viewed by 2387
Abstract
Effective student feedback is fundamental to enhancing learning outcomes in higher education. While traditional assessment methods emphasise both achievements and development areas, the process remains time-intensive for educators. This research explores the application of cognitive computing, specifically open-source Large Language Models (LLMs) Mistral-7B and CodeLlama-7B, to streamline feedback generation for student reports containing both Python programming elements and English narrative content. The findings indicate that these models can provide contextually appropriate feedback on both technical Python coding and English specification and documentation. They effectively identified coding weaknesses and provided constructive suggestions for improvement, as well as insightful feedback on English language quality, structure, and clarity in report writing. These results contribute to the growing body of knowledge on automated assessment feedback in higher education, offering practical insights for institutions considering the implementation of open-source LLMs in their workflows. There are around 22 thousand assessment submissions per year in the School of Computer Science, which is one of eight schools in the Faculty of Engineering and Physical Sciences, which is one of seven faculties in the University of Leeds, which is one of one hundred and sixty-six universities in the UK, so there is clear potential for our methods to scale up to millions of assessment submissions. This study also examines the limitations of current approaches and proposes potential enhancements. The findings support a hybrid system where cognitive computing manages routine tasks and educators focus on complex, personalised evaluations, enhancing feedback quality, consistency, and efficiency in educational settings. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
21 pages, 1529 KB  
Article
Semantic-Driven Approach for Validation of IoT Streaming Data in Trustable Smart City Decision-Making and Monitoring Systems
by Oluwaseun Bamgboye, Xiaodong Liu, Peter Cruickshank and Qi Liu
Big Data Cogn. Comput. 2025, 9(4), 108; https://doi.org/10.3390/bdcc9040108 - 21 Apr 2025
Cited by 1 | Viewed by 2148
Abstract
Ensuring the trustworthiness of data used in real-time analytics remains a critical challenge in smart city monitoring and decision-making. This is because the traditional data validation methods are insufficient for handling the dynamic and heterogeneous nature of Internet of Things (IoT) data streams. This paper describes a semantic IoT streaming data validation approach to provide a semantic IoT data model and process IoT streaming data with the semantic stream processing systems to check the quality requirements of IoT streams. The proposed approach enhances the understanding of smart city data while supporting real-time, data-driven decision-making and monitoring processes. A publicly available sensor dataset collected from a busy road in Milan is constructed, annotated, and semantically processed by the proposed approach and its architecture. The architecture, built on a robust semantic-based system, incorporates a reasoning technique based on forward rules, which is integrated within the semantic stream query processing system. It employs serialized Resource Description Framework (RDF) data formats to enhance stream expressiveness and enables the real-time validation of missing and inconsistent data streams within continuous sliding-window operations. The effectiveness of the approach is validated by deploying multiple RDF stream instances to the architecture before evaluating its accuracy and performance (in terms of reasoning time). The approach underscores the capability of semantic technology in sustaining the validation of IoT streaming data by accurately identifying up to 99% of inconsistent and incomplete streams in each streaming window. Also, it can maintain the performance of the semantic reasoning process in near real time. The approach provides an enhancement to data quality and credibility, capable of providing near-real-time decision support mechanisms for critical smart city applications, and facilitates accurate situational awareness across both the application and operational levels of the smart city. Full article
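In the spirit of the completeness check described above, the rdflib sketch below flags observations in a window snapshot that lack a required property; the namespace and property names are invented.

```python
# Rule-style completeness validation on an RDF snapshot of a streaming window.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/sensors#")  # invented namespace
g = Graph()

g.add((EX.obs1, RDF.type, EX.Observation))
g.add((EX.obs1, EX.hasValue, Literal(42.0)))
g.add((EX.obs2, RDF.type, EX.Observation))     # obs2 has no hasValue triple

# Forward-rule style check: every Observation must carry a hasValue.
for obs in g.subjects(RDF.type, EX.Observation):
    if (obs, EX.hasValue, None) not in g:
        print(f"incomplete stream element: {obs}")
```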
31 pages, 14157 KB  
Article
Assessing the Impact of Temperature and Precipitation Trends of Climate Change on Agriculture Based on Multiple Global Circulation Model Projections in Malta
by Benjamin Mifsud Scicluna and Charles Galdies
Big Data Cogn. Comput. 2025, 9(4), 105; https://doi.org/10.3390/bdcc9040105 - 17 Apr 2025
Cited by 4 | Viewed by 3811
Abstract
The Maltese Islands, situated at the centre of the Mediterranean basin, are recognised as a climate change hotspot. This study utilises projected changes in temperature and precipitation derived from the World Climate Research Program (WCRP) and analyses outputs from six Coupled Model Intercomparison Project Phase 5 (CMIP5) models under two Representative Concentration Pathways (RCPs). Through statistical and spatial analysis, the study demonstrates that climate change will have significant adverse effects on Maltese agriculture. Regardless of the RCP scenario considered, projections indicate a substantial increase in temperature and a decline in precipitation, exacerbating aridity and intensifying heat stress. These changes are expected to reduce soil moisture availability and challenge traditional agricultural practices. The study identifies the Western District as a relatively more favourable area for crop cultivation due to its comparatively lower temperatures, whereas the Northern and South Eastern peripheries are projected to experience more severe heat stress. Adaptation strategies, including the selection of heat-tolerant crop varieties such as Tetyda and Finezja, optimised water management techniques, and intercropping practices, are proposed to enhance agricultural resilience. This study is among the few comprehensive assessments of bioclimatic and physical factors affecting Maltese agriculture and highlights the urgent need for targeted adaptation measures to safeguard food production in the region. Full article
23 pages, 2189 KB  
Article
From Rating Predictions to Reliable Recommendations in Collaborative Filtering: The Concept of Recommendation Reliability Classes
by Dionisis Margaris, Costas Vassilakis and Dimitris Spiliotopoulos
Big Data Cogn. Comput. 2025, 9(4), 106; https://doi.org/10.3390/bdcc9040106 - 17 Apr 2025
Viewed by 1514
Abstract
Recommender systems aspire to provide users with recommendations that have a high probability of being accepted. This is accomplished by producing rating predictions for products that the users have not evaluated, and, afterwards, the products with the highest prediction scores are recommended to them. Collaborative filtering is a popular recommender system technique which generates rating prediction scores by blending the ratings that users with similar preferences have previously given to these products. However, predictions may entail errors, which will either lead to recommending products that the users would not accept or failing to recommend products that the users would actually accept. The first case is considered much more critical, since the recommender system will lose a significant amount of reliability and consequently interest. In this paper, after performing a study on rating prediction confidence factors in collaborative filtering, (a) we introduce the concept of prediction reliability classes, (b) we rank these classes in relation to the utility of the rating predictions belonging to each class, and (c) we present a collaborative filtering recommendation algorithm which exploits these reliability classes for prediction formulation. The efficacy of the presented algorithm is evaluated through an extensive multi-parameter evaluation process, which demonstrates that it significantly enhances recommendation quality. Full article
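A toy sketch of ranking by reliability class before prediction score; the confidence proxy (neighbour count) and thresholds below are invented, not the paper’s class definitions.

```python
# Bucket rating predictions into reliability classes, then rank class-first.
predictions = [
    {"item": "A", "score": 4.6, "neighbours": 42},
    {"item": "B", "score": 4.8, "neighbours": 3},
    {"item": "C", "score": 4.2, "neighbours": 17},
]

def reliability_class(p: dict) -> str:
    if p["neighbours"] >= 30:
        return "high"
    if p["neighbours"] >= 10:
        return "medium"
    return "low"

# A slightly lower score backed by many neighbours outranks a shaky high score.
class_rank = {"high": 0, "medium": 1, "low": 2}
ranked = sorted(predictions,
                key=lambda p: (class_rank[reliability_class(p)], -p["score"]))
for p in ranked:
    print(p["item"], reliability_class(p), p["score"])
```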
20 pages, 4739 KB  
Perspective
LLM Fine-Tuning: Concepts, Opportunities, and Challenges
by Xiao-Kun Wu, Min Chen, Wanyi Li, Rui Wang, Limeng Lu, Jia Liu, Kai Hwang, Yixue Hao, Yanru Pan, Qingguo Meng, Kaibin Huang, Long Hu, Mohsen Guizani, Naipeng Chao, Giancarlo Fortino, Fei Lin, Yonglin Tian, Dusit Niyato and Fei-Yue Wang
Big Data Cogn. Comput. 2025, 9(4), 87; https://doi.org/10.3390/bdcc9040087 - 2 Apr 2025
Cited by 51 | Viewed by 19858
Abstract
As a foundation of large language models, fine-tuning drives rapid progress, broad applicability, and profound impacts on human–AI collaboration, surpassing earlier technological advancements. This paper provides a comprehensive overview of large language model (LLM) fine-tuning by integrating hermeneutic theories of human comprehension, with a focus on the essential cognitive conditions that underpin this process. Drawing on Gadamer’s concepts of Vorverständnis, Distanciation, and the Hermeneutic Circle, the paper explores how LLM fine-tuning evolves from initial learning to deeper comprehension, ultimately advancing toward self-awareness. It examines the core principles, development, and applications of fine-tuning techniques, emphasizing their growing significance across diverse fields and industries. The paper introduces a new term, “Tutorial Fine-Tuning (TFT)”, which annotates a process of intensive tuition given by a “tutor” to a small number of “students”, to define the latest round of LLM fine-tuning advancements. By addressing key challenges associated with fine-tuning, including ensuring adaptability, precision, credibility, and reliability, this paper explores potential future directions for the co-evolution of humans and AI. By bridging theoretical perspectives with practical implications, this work provides valuable insights into the ongoing development of LLMs, emphasizing their potential to achieve higher levels of cognitive and operational intelligence. Full article
26 pages, 30835 KB  
Article
Uncertainty-Aware δ-GLMB Filtering for Multi-Target Tracking
by M. Hadi Sepanj, Saed Moradi, Zohreh Azimifar and Paul Fieguth
Big Data Cogn. Comput. 2025, 9(4), 84; https://doi.org/10.3390/bdcc9040084 - 31 Mar 2025
Cited by 1 | Viewed by 1885
Abstract
The δ-GLMB filter is an analytic solution to the multi-target Bayes recursion used in multi-target tracking. It extends the Generalised Labelled Multi-Bernoulli (GLMB) framework by providing an efficient and scalable implementation while preserving track identities, making it a widely used approach in the field. Theoretically, the δ-GLMB filter handles uncertainties in measurements in its filtering procedure. However, in practice, degeneration of the measurement quality affects the performance of this filter. In this paper, we discuss the effects of increasing measurement uncertainty on the δ-GLMB filter and also propose two heuristic methods to improve the performance of the filter in such conditions. The base idea of the proposed methods is to utilise the information stored in the history of the filtering procedure, which can be used to decrease the measurement uncertainty effects on the filter. Since GLMB filters have shown good results in the field of multi-target tracking, an uncertainty-immune δ-GLMB can serve as a strong tool in this area. In this study, the results indicate that the proposed heuristic ideas can improve the performance of filtering in the presence of uncertain observations. Experimental evaluations demonstrate that the proposed methods enhance track continuity and robustness, particularly in scenarios with low detection rates and high clutter, while maintaining computational feasibility. Full article
38 pages, 9923 KB  
Article
A Verifiable, Privacy-Preserving, and Poisoning Attack-Resilient Federated Learning Framework
by Washington Enyinna Mbonu, Carsten Maple, Gregory Epiphaniou and Christo Panchev
Big Data Cogn. Comput. 2025, 9(4), 85; https://doi.org/10.3390/bdcc9040085 - 31 Mar 2025
Cited by 3 | Viewed by 3292
Abstract
Federated learning is the on-device, collaborative training of a global model that can be utilized to support the privacy preservation of participants’ local data. In federated learning, there are challenges to model training regarding privacy preservation, security, resilience, and integrity. For example, a malicious server can indirectly obtain sensitive information through shared gradients. On the other hand, the correctness of the global model can be corrupted through poisoning attacks from malicious clients using carefully manipulated updates. Many related works on secure aggregation and poisoning attack detection have been proposed and applied in various scenarios to address these two issues. Nevertheless, existing works are based on the trust confidence that the server will return correctly aggregated results to the participants. However, a malicious server may return false aggregated results to participants. It is still an open problem to simultaneously preserve users’ privacy and defend against poisoning attacks while enabling participants to verify the correctness of aggregated results from the server. In this paper, we propose a privacy-preserving and poisoning attack-resilient federated learning framework that supports the verification of aggregated results from the server. Specifically, we design a zero-trust dual-server architectural framework instead of a traditional single-server scheme based on trust. We exploit additive secret sharing to eliminate the single point of exposure of the training data and implement a weight selection and filtering strategy to enhance robustness to poisoning attacks while supporting the verification of aggregated results from the servers. Theoretical analysis and extensive experiments conducted on real-world data demonstrate the practicability of our proposed framework. Full article
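The additive secret sharing primitive that underpins the dual-server design can be sketched in a few lines; the field modulus and gradient value below are toy choices.

```python
# Additive secret sharing over a prime field: each server sees one share,
# and only the sum of both shares reveals the value.
import secrets

P = 2**61 - 1  # a Mersenne prime as the field modulus (toy choice)

def share(value: int) -> tuple[int, int]:
    """Split value into two shares that individually reveal nothing."""
    r = secrets.randbelow(P)
    return r, (value - r) % P

def reconstruct(s1: int, s2: int) -> int:
    return (s1 + s2) % P

gradient = 123456789           # a quantised gradient entry (toy)
a, b = share(gradient)         # server 1 receives a, server 2 receives b
assert reconstruct(a, b) == gradient
print(a, b)
```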
20 pages, 496 KB  
Article
GenAI Learning for Game Design: Both Prior Self-Transcendent Pursuit and Material Desire Contribute to a Positive Experience
by Dongpeng Huang and James E. Katz
Big Data Cogn. Comput. 2025, 9(4), 78; https://doi.org/10.3390/bdcc9040078 - 27 Mar 2025
Cited by 4 | Viewed by 1927
Abstract
This study explores factors influencing positive experiences with generative AI (GenAI) in a learning game design context. Using a sample of 26 master’s-level students in a course on AI’s societal aspects, this study examines the impact of (1) prior knowledge and attitudes toward technology and learning, and (2) personal value orientations. Results indicated that both students’ self-transcendent goals and desire for material benefits have positive correlations with collaborative, cognitive, and affective outcomes. However, self-transcendent goals are a stronger predictor, as determined by stepwise regression analysis. Attitudes toward technology were positively associated with cognitive and affective outcomes during the first week, though this association did not persist into the second week. Most other attitudinal variables were not associated with collaborative or cognitive outcomes but were linked to negative affect. These findings suggest that students’ personal values correlate more strongly with the collaborative, cognitive, and affective aspects of using GenAI for educational game design than their attitudinal attributes. This result may indicate that the design experience neutralizes the effect of earlier attitudes towards technology, with major influences deriving from personal value orientations. If these findings are borne out, this study has implications for the utility of current educational efforts to change students’ attitudes towards technology, especially those that encourage more women to study STEM topics. Thus, it may be that, rather than pro-technology instruction, a focus on value orientations would be a more effective way to encourage diverse students to participate in STEM programs. Full article
21 pages, 2021 KB  
Article
A Data Mining Approach to Identify NBA Player Quarter-by-Quarter Performance Patterns
by Dimitrios Iatropoulos, Vangelis Sarlis and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(4), 74; https://doi.org/10.3390/bdcc9040074 - 25 Mar 2025
Cited by 14 | Viewed by 12015
Abstract
Sports analytics is a fast-evolving domain using advanced data science methods to find useful insights. This study explores the way NBA player performance metrics evolve from quarter to quarter and affect game outcomes. Using Association Rule Mining, we identify key offensive, defensive, and overall impact metrics that influence success in both regular-season and playoff contexts. Defensive metrics become more critical in late-game situations, while offensive efficiency is paramount in the playoffs. Ball handling peaks in the second quarter, affecting early momentum, while overall impact metrics, such as Net Rating and Player Impact Estimate, consistently correlate with winning. We preprocessed the collected dataset, applying advanced anomaly detection and discretization techniques. By segmenting performance into five categories—Offense, Defense, Ball Handling, Overall Impact, and Tempo—we uncovered strategic insights for teams, coaches, and analysts. Results emphasize the importance of managing player fatigue, optimizing lineups, and adjusting strategies based on quarter-specific trends. The analysis provides actionable recommendations for coaching decisions, roster management, and player evaluation. Future work can extend this approach to other leagues and incorporate additional contextual factors to refine evaluation and predictive models. Full article
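A small Apriori run in the spirit of the analysis above, using the mlxtend library on one-hot game events; the item names and thresholds are invented stand-ins for the paper’s discretised metrics.

```python
# Association Rule Mining with mlxtend over one-hot quarter events.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

games = pd.DataFrame([
    {"Q2_ball_handling": True, "Q4_strong_defense": True, "win": True},
    {"Q2_ball_handling": True, "Q4_strong_defense": False, "win": False},
    {"Q2_ball_handling": False, "Q4_strong_defense": True, "win": True},
    {"Q2_ball_handling": True, "Q4_strong_defense": True, "win": True},
])

itemsets = apriori(games, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```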
30 pages, 2168 KB  
Article
Generation Z’s Travel Behavior and Climate Change: A Comparative Study for Greece and the UK
by Athanasios Demiris, Grigorios Fountas, Achille Fonzone and Socrates Basbas
Big Data Cogn. Comput. 2025, 9(3), 70; https://doi.org/10.3390/bdcc9030070 - 17 Mar 2025
Cited by 7 | Viewed by 5921
Abstract
Climate change is one of the most pressing global threats, endangering the sustainability of the planet and quality of life, whilst urban mobility significantly contributes to exacerbating its effects. Recently, policies aimed at mitigating these effects have been implemented, emphasizing the promotion of sustainable travel culture. Prior research has indicated that both environmental awareness and regulatory efforts could encourage the shift towards greener mobility; however, factors that affect young people’s travel behavior remain understudied. This study examined whether and how climate change impacts travel behavior, particularly among Generation Z in Greece. A comprehensive online survey was conducted, from 31 March to 8 April 2024, within a Greek academic community, yielding 904 responses from Generation Z individuals. The design of the survey was informed by an adaptation of Triandis’ Theory of Interpersonal Behavior. The study also incorporated a comparative analysis using data from the UK’s National Travel Attitudes Survey (NTAS), offering insights from a different cultural and socio-economic context. By combining Exploratory Factor Analysis with latent-variable ordered probit and logit models, the study identified the key determinants of the willingness to reduce car use and of the self-reported reduction in car use in response to climate change. The results indicate that emotional factors, social roles, and norms, along with socio-demographic characteristics, current behaviors, and local environmental concerns, significantly influence car-related travel choices among Generation Z. For instance, concerns about local air quality are consistently correlated with a higher likelihood of having already reduced car use due to climate change and a higher willingness to reduce car travel in the future. The NTAS data reveal that flexibility in travel habits and social norms are critical determinants of the willingness to reduce car usage. The findings of the study highlight the key role of policy interventions, such as the implementation of Low-Emission Zones, leveraging social media for environmental campaigns, and enhancing infrastructure for active travel and public transport to foster broader cultural shifts towards sustainable travel behavior among Generation Z. Full article
23 pages, 528 KB  
Article
Defining, Detecting, and Characterizing Power Users in Threads
by Gianluca Bonifazi, Christopher Buratti, Enrico Corradini, Michele Marchetti, Federica Parlapiano, Domenico Ursino and Luca Virgili
Big Data Cogn. Comput. 2025, 9(3), 69; https://doi.org/10.3390/bdcc9030069 - 16 Mar 2025
Cited by 3 | Viewed by 2600
Abstract
Threads is a new social network that was launched by Meta in July 2023 and conceived as a direct alternative to X. It is a unique case study in the social network landscape, as it is content-based like X, but has an Instagram-based growth model, which makes it significantly different from X. As it was launched recently, studies on Threads are still scarce. One of the most common investigations in social networks regards power users (also called influencers, lead users, influential users, etc.), i.e., those users who can significantly influence information dissemination, user behavior, and ultimately the current dynamics and future development of a social network. In this paper, we want to contribute to the knowledge of Threads by showing that there are indeed power users in this social network and then attempt to understand the main features that characterize them. The definition of power users that we adopt here is novel and leverages the four classical centrality measures of Social Network Analysis. This ensures that our study of power users can benefit from the enormous knowledge on centrality measures that has accumulated in the literature over the years. In order to conduct our analysis, we had to build a Threads dataset, as none existed in the literature that contained the information necessary for our studies. Once we built such a dataset, we decided to make it open and thus available to all researchers who want to perform analyses on Threads. This dataset, the new definition of power users, and the characterization of Threads power users are the main contributions of this paper. Full article
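The four classical centrality measures behind this definition of power users can be computed with NetworkX; the karate club graph below is a stand-in for a Threads interaction network.

```python
# Degree, closeness, betweenness, and eigenvector centrality on a toy graph.
import networkx as nx

g = nx.karate_club_graph()  # stand-in for a Threads interaction network

centralities = {
    "degree": nx.degree_centrality(g),
    "closeness": nx.closeness_centrality(g),
    "betweenness": nx.betweenness_centrality(g),
    "eigenvector": nx.eigenvector_centrality(g),
}

# A node that ranks near the top on all four measures is a power-user candidate.
top = {name: max(scores, key=scores.get) for name, scores in centralities.items()}
print(top)
```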