Journal Description
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Computer Science Applications)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 24.5 days after submission; acceptance to publication is undertaken in 4.6 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Artificial Intelligence: AI, AI in Medicine, Algorithms, BDCC, MAKE, MTI, Stats, Virtual Worlds and Computers.
Impact Factor: 4.4 (2024); 5-Year Impact Factor: 4.2 (2024)
Latest Articles
Artificial Intelligence in Data Governance for Financial Decision-Making: A Systematic Review
Big Data Cogn. Comput. 2026, 10(1), 8; https://doi.org/10.3390/bdcc10010008 (registering DOI) - 25 Dec 2025
Abstract
Artificial intelligence (AI) is increasingly embedded in data-driven financial decision-making; however, its effectiveness remains dependent on the maturity of data governance frameworks. This systematic review was conducted in accordance with PRISMA 2020 guidelines to synthesise evidence from 1155 Scopus-indexed studies published between 2015 and 2025. A mixed-methods design combining corpus analysis, quantile radar regression, and radar visualisation of structural equation modelling (SEM) was employed. Empirical validation demonstrated a robust model fit (CFI = 0.947; RMSEA = 0.041). Governance maturity was confirmed as a mediating construct (β = 0.73) linking AI integration (β = 0.76) to financial outcomes (β = 0.71). The findings indicate that algorithmic capacity alone does not ensure decision quality without transparent, auditable, and ethically grounded governance systems. The review advances a quantile-sensitive radar visualisation, offering conceptual and methodological novelty for explainable, responsible, and data-centric financial analytics, and contributes to the ongoing discourse on sustainable digital transformation within AI-enabled financial ecosystems.
(This article belongs to the Special Issue Application of Digital Technology in Financial Development)
Open Access Article
Empirical Evaluation of Big Data Stacks: Performance and Design Analysis of Hadoop, Modern, and Cloud Architectures
by Widad Elouataoui and Youssef Gahi
Big Data Cogn. Comput. 2026, 10(1), 7; https://doi.org/10.3390/bdcc10010007 - 24 Dec 2025
Abstract
The proliferation of big data applications across various industries has led to a paradigm shift in data architecture, with traditional approaches giving way to more agile and scalable frameworks. The evolution of big data architecture began with the emergence of the Hadoop-based data stack, leveraging technologies like Hadoop Distributed File System (HDFS) and Apache Spark for efficient data processing. However, recent years have seen a shift towards modern data stacks, offering flexibility and diverse toolsets tailored to specific use cases. Concurrently, cloud computing has revolutionized big data management, providing unparalleled scalability and integration capabilities. Despite their benefits, navigating these data stack paradigms can be challenging. While the existing literature offers valuable insights into individual data stack paradigms, there remains a dearth of studies that offer practical, in-depth comparisons of these paradigms across the entire big data value chain. To address this gap, this paper examines three main big data stack paradigms: the Hadoop data stack, modern data stack, and cloud-based data stack. In this study, we conduct an exhaustive architectural comparison of these stacks, covering the entire big data value chain from data acquisition to exposition. Moreover, this study extends beyond architectural considerations to include end-to-end use case implementations for a comprehensive evaluation of each stack. Using a large dataset of Amazon reviews, different data stack scenarios are implemented and compared. Furthermore, the paper explores critical factors such as data integration, implementation costs, and ease of deployment to provide researchers and practitioners with a relevant and up-to-date reference for navigating the complex landscape of big data technologies and making informed decisions about data strategies.
(This article belongs to the Topic Big Data and Artificial Intelligence, 3rd Edition)
Open Access Article
Analyzing Vulnerability Through Narratives: A Prompt-Based NLP Framework for Information Extraction and Insight Generation
by Aswathi Padmavilochanan, Veena Gangadharan, Tarek Rashed and Amritha Natarajan
Big Data Cogn. Comput. 2026, 10(1), 6; https://doi.org/10.3390/bdcc10010006 - 24 Dec 2025
Abstract
This interdisciplinary pilot study examines the use of Natural Language Processing (NLP) techniques, specifically Large Language Models (LLMs) with Prompt Engineering (PE), to analyze economic vulnerability from qualitative self-narratives. Seventy narratives from twenty-five women in the Palk Bay coastal region of Rameshwaram, India, were analyzed using a schema adapted from a contextual empowerment framework. The study operationalizes theoretical constructs into structured Information Extraction (IE) templates, enabling systematic identification of multiple vulnerability aspects, contributing factors, and experiential expressions. Prompt templates were iteratively refined and validated through dual-annotator review, achieving an F1-score of 0.78 on a held-out subset. Extracted elements were examined through downstream analysis, including pattern grouping and graph-based visualization, to reveal co-occurrence structures and recurring vulnerability configurations across narratives. The findings demonstrate that LLMs, when aligned with domain-specific conceptual models and supported by human-in-the-loop validation, can enable interpretable and replicable analysis of self-narratives. While findings are bounded by the pilot scale and community-specific context, the approach supports translation of narrative evidence into community-level program design and targeted grassroots outreach, with planned expansion to multi-site, multilingual datasets for broader applicability.
Open Access Article
Machine Learning Based Impact Sensing Using Piezoelectric Sensors: From Simulated Training Data to Zero-Shot Experimental Application
by Petros Gkertzos, Johannes Gerritzen, Constantinos Tsakonas, Stefanos H. Panagiotou, Athanasios Kotzakolios, Ioannis Katsidimas, Andreas Hornig, Siavash Ghiasvand, Maik Gude, Vassilis Kostopoulos and Sotiris Nikoletseas
Big Data Cogn. Comput. 2026, 10(1), 5; https://doi.org/10.3390/bdcc10010005 (registering DOI) - 23 Dec 2025
Abstract
Modern impact monitoring systems combine multiple inputs with machine learning (ML) models for impact detection, localization, and event assessment. Their accuracy relies on large, event-representative datasets, used for algorithmic development and ML model training. High-fidelity numerical models can provide augmented datasets by overcoming the cost and time limitations of experimental methods. This research presents an end-to-end numerical methodology for impact detection based on simulation (training) and experimental (testing) data. Initially, a finite element model (FEM) of our experimental setup utilizing piezoelectric transducer (PZT) sensors mounted on a thermoplastic plate is created. From the experimental impact signals, a few consistent cases are identified for feature extraction. A design of experiments explores the range of each parameter, and through surrogate optimization, the material and piezoelectric properties of the setup are determined. Subsequently, a virtual dataset, involving multiple impact cases, is created to train the ML models performing impact detection. Testing with experimental data shows results consistent with studies in the literature that used only experimental data for both training and testing. This work provides a systematic methodology for representative dataset generation and impact monitoring through ML, while addressing accurate FEM parameter identification from a few experimental trials.
(This article belongs to the Special Issue Recent Advances in Machine Learning Methods for Imperfect Large-Scale Data)
Open Access Article
Fine-Tuning LLaMA2 for Summarizing Discharge Notes: Evaluating the Role of Highlighted Information
by Mahshad Koohi Habibi Dehkordi, Yehoshua Perl, Fadi P. Deek and Hao Liu
Big Data Cogn. Comput. 2026, 10(1), 4; https://doi.org/10.3390/bdcc10010004 - 22 Dec 2025
Abstract
This study investigates whether incorporating highlighted information in discharge notes improves the quality of the summaries generated by Large Language Models (LLMs). Specifically, it evaluates the effect of using highlighted versus unhighlighted inputs for fine-tuning the LLaMA2-13B model for summarization tasks. We fine-tuned LLaMA2-13B in two variants using the MIMIC-IV-Ext-BHC dataset: one fine-tuned on the highlighted discharge notes (H-LLaMA) and the other on the same set of notes without highlighting (U-LLaMA). Highlighting was performed automatically using a Cardiology Interface Terminology (CIT) presented in our previous work. H-LLaMA and U-LLaMA were evaluated on a randomly selected test set of 100 discharge notes using multiple metrics (including BERTScore, ROUGE-L, BLEU, and SummaC_CONV). Additionally, LLM-based judgment via ChatGPT-4o rated coherence, fluency, conciseness, and correctness, alongside a manual completeness evaluation on a random sample of 40 notes. H-LLaMA consistently outperformed U-LLaMA across all metrics. H-summaries, generated using H-LLaMA, in comparison to U-summaries, generated using U-LLaMA, achieved higher BERTScore (63.75 vs. 59.61), ROUGE-L (23.43 vs. 21.82), BLEU (10.4 vs. 8.41), and SummaC_CONV (67.7 vs. 40.2). Manual review also showed improved completeness for H-summaries (54.8% vs. 47.6%). All improvements were statistically significant (p < 0.05). Moreover, LLM-based evaluation indicated higher average ratings across coherence, correctness, and conciseness.
(This article belongs to the Special Issue Advances in Large Language Models for Biological and Medical Applications)
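The evaluation above ranks H-LLaMA over U-LLaMA partly on overlap metrics such as ROUGE-L. As a minimal illustration of how such a score is computed, here is a generic LCS-based ROUGE-L F1 in plain Python; this follows the metric's standard definition, not the authors' evaluation pipeline, and the example strings are hypothetical.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

# A perfect summary scores 1.0; a shorter candidate is penalised via recall.
print(rouge_l_f1("the patient was discharged home", "the patient was discharged home"))
```

In practice, published scores are usually computed with a reference implementation (e.g., the `rouge-score` package) and reported as percentages, as in the 23.43 vs. 21.82 figures above.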
Open Access Article
DL-VLM: A Dynamic Lightweight Vision-Language Model for Bridge Health Diagnosis
by Shenghao Liang, Zhiheng He, Hao Gui and Feng Liu
Big Data Cogn. Comput. 2026, 10(1), 3; https://doi.org/10.3390/bdcc10010003 - 22 Dec 2025
Abstract
Bridge health diagnosis plays a vital role in ensuring structural safety and extending service life while reducing maintenance costs. Traditional structural health monitoring approaches rely on sensor-based measurements, which are costly, labor-intensive, and limited in coverage. To address these challenges, we propose a three-phase solution that integrates the Dynamic Lightweight Vision-Language Model (DL-VLM), domain adaptation, and knowledge-enhanced reasoning. First, as the core of the framework, the DL-VLM consists of three components: a visual information encoder with multi-scale feature selection, a text encoder for processing inspection-related language, and a multimodal alignment module. Second, to enhance practical applicability, we further introduce domain-specific fine-tuning on the Bridge-SHM dataset, enabling the model to acquire specialized knowledge of bridge construction, defects, and structural components. Third, a knowledge retrieval augmentation module is incorporated, leveraging external knowledge graphs and vector-based retrieval to provide contextually relevant information and improve diagnostic reasoning. Experiments on high-resolution bridge inspection datasets demonstrate that DL-VLM achieves competitive diagnostic accuracy while substantially reducing computational cost. The combination of domain-specific fine-tuning and knowledge augmentation significantly improves performance on specialized tasks, supporting efficient and practical deployment in real-world structural health monitoring scenarios.
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)
Open Access Article
AI Assisted System for Automated Evaluation of Entity-Relationship Diagram and Schema Diagram Using Large Language Models
by Raji Ramachandran, Parvathy Vijayan, Athulya Anilkumar and Veena Gangadharan
Big Data Cogn. Comput. 2026, 10(1), 2; https://doi.org/10.3390/bdcc10010002 - 20 Dec 2025
Abstract
Automated assessment in education has seen rapid growth with the integration of AI, particularly for objective and structured tasks. However, evaluating open-ended design problems such as Entity Relationship (ER) diagrams and relational schemas remains a significant challenge due to the variability in valid representations. This paper proposes an AI-assisted framework using Large Language Models (LLMs) to interpret natural language database scenarios, generate reference ER diagrams and schemas in PlantUML format, and compare student submissions against the system-generated solutions to assess correctness. We propose a novel scoring mechanism for evaluating the semantic and structural similarity of entities, relationships, keys, and table mappings, rather than relying on exact syntax matching. Additionally, manual verification of AI-generated reference outputs enables human oversight and refinement, making the system a supportive tool rather than a replacement for educators. This approach offers scalable, intelligent evaluation for database design tasks, reducing the manual grading effort while ensuring fair and concept-driven assessment. Experimental results demonstrate the system’s effectiveness in accurately evaluating varied student submissions while maintaining adaptability across different design styles.
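The scoring mechanism above compares entities, relationships, and keys structurally rather than by exact syntax. A simplified sketch of one such set-based comparison is shown below (weighted Jaccard overlap; the component names and weights are illustrative assumptions, not the paper's actual metric).

```python
def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; defined as 1.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def diagram_score(reference: dict, submission: dict, weights=None) -> float:
    """Weighted Jaccard similarity over ER-diagram components.

    Each dict maps a component name ("entities", "relationships", "keys")
    to a list of extracted labels; weights must sum to 1.
    """
    weights = weights or {"entities": 0.4, "relationships": 0.4, "keys": 0.2}
    return sum(w * jaccard(set(reference.get(k, [])), set(submission.get(k, [])))
               for k, w in weights.items())

# Hypothetical example: a submission missing one entity loses part of the score.
ref = {"entities": ["Student", "Course"], "relationships": ["enrolls"], "keys": ["student_id"]}
sub = {"entities": ["Student"], "relationships": ["enrolls"], "keys": ["student_id"]}
print(diagram_score(ref, sub))
```

A real system would first normalise labels (e.g., matching "Students" to "Student" semantically) before taking set overlaps, which is where the LLM-based semantic comparison described above comes in.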
Open Access Article
Transformer Encoder vs. Mamba SSM: Lightweight Architectures for Machining Stability-Induced Surface-Quality Categorization
by Jeong Hoon Ko
Big Data Cogn. Comput. 2026, 10(1), 1; https://doi.org/10.3390/bdcc10010001 - 19 Dec 2025
Abstract
Recent advances in large language models (LLMs) have revolutionized many domains; however, their adoption in manufacturing remains limited. This article explores the potential of state-of-the-art AI methodologies for chatter-induced surface-quality categorization in machining. While LLMs typically demand extremely high computational resources (FLOPs), AI systems for chatter recognition must instead be designed for efficient classification or regression on structured sensor signals, requiring compact architectures capable of inference in approximately 0.001 s. To address this, the present study applies and compares two representative architectures: the transformer encoder, an attention-based model, and the Mamba SSM (State-Space Model), a non-attention-based state-space alternative. Two case studies are conducted to evaluate their performance: one using machining-dynamics-based simulation data, and the other employing actual experimental measurements. Finally, the advantages and limitations of these cutting-edge AI approaches are critically discussed, emphasizing their suitability and practical challenges for real-world manufacturing deployment.
(This article belongs to the Special Issue Smart Manufacturing in the AI Era)
Open Access Article
A Multi-Scale Feature Fusion Linear Attention Model for Movie Review Sentiment Analysis
by Zi Jiang and Chengjun Xu
Big Data Cogn. Comput. 2025, 9(12), 325; https://doi.org/10.3390/bdcc9120325 - 18 Dec 2025
Abstract
Sentiment classification is a key technique for analyzing the emotional tendency of user reviews and is of great significance to movie recommendation systems. However, existing methods often face challenges in practical applications due to complex model structures, low computational efficiency, or difficulties in balancing local details with global contextual features. To address these issues, this paper proposes a Multi-Scale Feature Fusion Linear Attention model (MSFFLA). The model consists of three core modules: the BERT Encoder module for extracting basic semantic features; the Parallel Multi-scale Feature Extraction module (PMFE), which employs multi-branch dilated convolutions to accurately capture local fine-grained features; and the Global Multi-scale Linear Feature Extraction module (MGLFE), which introduces a Multi-Scale Linear Attention mechanism (MSLA) to efficiently model global contextual dependencies with approximately linear computational complexity. Extensive experiments were conducted on three public datasets: SST-2, Amazon Reviews, and MR. The results show that compared to the state-of-the-art BERT-CondConv model, our model achieves improvements in accuracy and F1-Score by 1.8% and 0.4%, respectively, on the SST-2 dataset, and by 1.5% and 0.3% on the Amazon Reviews dataset. This study not only validates the effectiveness of the proposed model but also provides an efficient and lightweight solution for sentiment classification tasks in movie recommendation systems, demonstrating promising practical application prospects.
Open Access Article
KANs Layer Integration: Benchmarking Deep Learning Architectures for Tornado Prediction
by Shuo (Luna) Yang, Ehsaneh Vilataj, Muhammad Faizan Raza and Satish Mahadevan Srinivasan
Big Data Cogn. Comput. 2025, 9(12), 324; https://doi.org/10.3390/bdcc9120324 - 16 Dec 2025
Abstract
Tornado occurrence and detection are well established in mesoscale meteorology, yet the application of deep learning (DL) to radar-based tornado detection remains nascent and under-validated. This study benchmarks DL approaches on TorNet, a curated dataset of full-resolution, polarimetric Weather Surveillance Radar-1988 Doppler (WSR-88D) radar volumes. We evaluate three canonical architectures (CNN, VGG19, and Xception) under five optimizers and assess the effect of replacing conventional MLP heads with Kolmogorov–Arnold Network (KAN) layers. To address severe class imbalance and label noise, we implement radar-aware preprocessing and augmentation, temporal splits, and recall-sensitive training. Models are compared using accuracy, precision, recall, and ROC-AUC. Results show that KAN-augmented variants generally converge faster and deliver higher rare-event sensitivity and discriminative power than their baselines, with Adam and RMSprop providing the most stable training and Lion showing architecture-dependent gains. We contribute (i) a reproducible baseline suite for TorNet, (ii) evidence on the conditions under which KAN integration improves tornado detection, and (iii) practical guidance on optimizer–architecture choices for rare-event forecasting with weather radar.
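The benchmark above compares models on accuracy, precision, recall, and ROC-AUC. A minimal sketch of the first three, computed from binary labels and predictions, is given below (generic textbook definitions, not the study's code; ROC-AUC is omitted because it requires ranked scores rather than hard predictions).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall is the "rare-event sensitivity" emphasised in the abstract:
        # the fraction of actual tornado cases the model catches.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 1 = tornadic radar volume, 0 = non-tornadic.
print(binary_metrics([1, 0, 1, 0], [1, 0, 0, 1]))
```

For heavily imbalanced data such as tornado detection, accuracy alone is misleading (predicting "no tornado" everywhere scores high), which is why the study pairs it with recall and ROC-AUC.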
Open Access Article
Influence Mechanism of Rock Compressive Mechanical Properties Under Freeze-Thaw Cycles: Insights from Machine Learning
by Shuai Gao, Zhongyuan Gu, Xin Xiong and Chengnian Wang
Big Data Cogn. Comput. 2025, 9(12), 323; https://doi.org/10.3390/bdcc9120323 - 16 Dec 2025
Abstract
In plateau and high-altitude areas, freeze-thaw cycles often alter the uniaxial compressive strength (UCS) of rock, thereby impacting the stability of geotechnical engineering. Acquiring rock samples in these areas for UCS testing is often time-consuming and labor-intensive. This study developed a hybrid model based on the XGBoost algorithm to predict the UCS of rock under freeze-thaw conditions. First, a database was created containing longitudinal wave velocity (Vp), rock porosity (P), rock density (D), freezing temperature (T), number of freeze-thaw cycles (FTCs), and UCS. Four swarm intelligence optimization algorithms—artificial bee colony, Newton–Raphson, particle swarm optimization, and dung beetle optimization—were used to optimize the maximum iterations, depth, and learning rate of the XGBoost model, thereby enhancing model accuracy and developing four hybrid models. The four hybrid models were compared to a single XGBoost model and a random forest (RF) model to evaluate overall performance, and the optimal model was selected. The results demonstrate that all hybrid models outperform the single models. The XGBoost model optimized by the sparrow algorithm (R2 = 0.94, RMSE = 10.10, MAPE = 0.095, MAE = 7.22) performed best in predicting UCS. SHapley Additive exPlanations (SHAP) were used to assess the marginal contribution of each input variable to the UCS prediction of freeze-thawed rock. This study is expected to provide a reference for predicting the UCS of freeze-thawed rock using machine learning.
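The hybrid models above are ranked by R², RMSE, MAPE, and MAE. These four regression metrics can be sketched from their standard definitions as follows (a generic illustration with hypothetical values, not the paper's implementation).

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE, MAPE, and MAE for paired observations."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

# Hypothetical UCS values (MPa) for measured vs. predicted samples.
print(regression_metrics([85.0, 60.0, 110.0], [80.0, 65.0, 105.0]))
```

Note that RMSE and MAE share the target's units (here MPa), while R² and MAPE are unitless, which matches the mixed scales of the reported scores (R² = 0.94, RMSE = 10.10, MAPE = 0.095, MAE = 7.22).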
Open Access Article
Spatio-Temporal and Semantic Dual-Channel Contrastive Alignment for POI Recommendation
by Chong Bu, Yujie Liu, Jing Lu, Manqi Huang, Maoyi Li and Jiarui Li
Big Data Cogn. Comput. 2025, 9(12), 322; https://doi.org/10.3390/bdcc9120322 - 15 Dec 2025
Abstract
Point-of-Interest (POI) recommendation predicts users’ future check-ins based on their historical trajectories and plays a key role in location-based services (LBS). Traditional approaches such as collaborative filtering and matrix factorization, which model user–POI interaction matrices, fail to fully leverage spatio-temporal information and semantic attributes, leading to weak performance on sparse and long-tail POIs. Recently, Graph Neural Networks (GNNs) have been applied by constructing heterogeneous user–POI graphs to capture high-order relations. However, they still struggle to effectively integrate spatio-temporal and semantic information and enhance the discriminative power of learned representations. To overcome these issues, we propose Spatio-Temporal and Semantic Dual-Channel Contrastive Alignment for POI Recommendation (S2DCRec), a novel framework integrating spatio-temporal and semantic information. It employs hierarchical relational encoding to capture fine-grained behavioral patterns and high-level semantic dependencies. The model jointly captures user–POI interactions, temporal dynamics, and semantic correlations in a unified framework. Furthermore, our alignment strategy ensures micro-level collaborative and spatio-temporal consistency and macro-level semantic coherence, enabling fine-grained embedding fusion and interpretable contrastive learning. Experiments on two real-world datasets, Foursquare NYC and Yelp, show that S2DCRec outperforms all baselines, improving F1 scores by 4.04% and 3.01%, respectively. These results demonstrate the effectiveness of the dual-channel design in capturing both sequential and semantic dependencies for accurate POI recommendation.
(This article belongs to the Topic Graph Neural Networks and Learning Systems)
Open Access Article
A Tabular Data Imputation Technique Using Transformer and Convolutional Neural Networks
by Charlène Béatrice Bridge-Nduwimana, Salah Eddine El Harrauss, Aziza El Ouaazizi and Majid Benyakhlef
Big Data Cogn. Comput. 2025, 9(12), 321; https://doi.org/10.3390/bdcc9120321 - 13 Dec 2025
Abstract
Upstream processes strongly influence downstream analysis in sequential data-processing workflows, particularly in machine learning, where data quality directly affects model performance. Conventional statistical imputations often fail to capture nonlinear dependencies, while deep learning approaches typically lack uncertainty quantification. We introduce a hybrid imputation model that integrates a deep learning autoencoder with Convolutional Neural Network (CNN) layers and a Transformer-based contextual modeling architecture to address systematic variation across heterogeneous data sources. Performing multiple imputations in the autoencoder–transformer latent space and averaging representations provides implicit batch correction that suppresses context-specific effects without explicit batch identifiers. We performed experiments on datasets in which 10% of the values were artificially removed under missing completely at random (MCAR) and missing not at random (MNAR) mechanisms. The proposed model demonstrated practical performance, jointly ranking first among the imputation methods evaluated. This imputation technique reduced the root mean square error (RMSE) by 50% compared to denoising autoencoders (DAE) and by 46% compared to iterative imputation (MICE). Performance was comparable to that of adversarial models (GAIN) and attention-based models (MIDA), while providing interpretable uncertainty estimates (CV = 0.08–0.15). Validation on datasets from multiple sources confirmed the robustness of the technique: notably, on a forensic dataset from multiple laboratories, our imputation technique achieved a practical improvement over GAIN (0.146 vs. 0.189 RMSE), highlighting its effectiveness in mitigating batch effects.
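The experiments above mask 10% of values under an MCAR mechanism and then score imputations by RMSE on the held-out cells. A generic sketch of that evaluation protocol is shown below (illustrative helper names and toy data; not the authors' pipeline).

```python
import math
import random

def mask_mcar(rows, frac=0.10, seed=0):
    """Replace a random `frac` of cells with None (missing completely at random)."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    masked = [row[:] for row in rows]
    for i, j in rng.sample(cells, int(frac * len(cells))):
        masked[i][j] = None
    return masked

def imputation_rmse(original, imputed, masked):
    """RMSE over only the cells that were masked out, against the ground truth."""
    errs = [(original[i][j] - imputed[i][j]) ** 2
            for i, row in enumerate(masked)
            for j, v in enumerate(row) if v is None]
    return math.sqrt(sum(errs) / len(errs))

# Toy 10x10 numeric table; an imputer would fill the None cells in `masked`.
table = [[float(i * 10 + j) for j in range(10)] for i in range(10)]
masked = mask_mcar(table, frac=0.10)
```

Because MCAR removal is independent of the values, the held-out cells form an unbiased test set; an MNAR experiment would instead make the masking probability depend on the (unobserved) values themselves.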
Open Access Systematic Review
A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges
by Andrew Brown, Muhammad Roman and Barry Devereux
Big Data Cogn. Comput. 2025, 9(12), 320; https://doi.org/10.3390/bdcc9120320 - 12 Dec 2025
Abstract
Background: Retrieval-augmented generation (RAG) aims to reduce hallucinations and outdated knowledge by grounding LLM outputs in retrieved evidence, but empirical results are scattered across tasks, systems, and metrics, limiting cumulative insight. Objective: We aimed to synthesise empirical evidence on RAG effectiveness versus parametric-only baselines, map datasets/architectures/evaluation practices, and surface limitations and research gaps. Methods: This systematic review was conducted and reported in accordance with PRISMA 2020. We searched the ACM Digital Library, IEEE Xplore, Scopus, ScienceDirect, and DBLP; all sources were last searched on 13 May 2025. We included studies from January 2020 to May 2025 that addressed RAG or similar retrieval-supported systems producing text output, met citation thresholds (≥15 for 2025; ≥30 for 2024 or earlier), and offered original contributions; we excluded non-English items, irrelevant works, duplicates, and records without accessible full text. Bias was appraised with a brief checklist; screening used one reviewer with an independent check and discussion. LLM suggestions were advisory only; 2025 citation thresholds were adjusted to limit citation lag. We used a descriptive approach to synthesise the results, organising studies by themes aligned to RQ1–RQ4 and reporting summary counts/frequencies; no meta-analysis was undertaken due to heterogeneity of designs and metrics. Results: We included 128 studies spanning knowledge-intensive tasks (35/128; 27.3%), open-domain QA (20/128; 15.6%), software engineering (13/128; 10.2%), and medical domains (11/128; 8.6%). Methods have shifted from DPR + seq2seq baselines to modular, policy-driven RAG with hybrid/structure-aware retrieval, uncertainty-triggered loops, memory, and emerging multimodality. Evaluation remains overlap-heavy (EM/F1), with increasing use of retrieval diagnostics (e.g., Recall@k, MRR@k), human judgements, and LLM-as-judge protocols. Efficiency and security (poisoning, leakage, jailbreaks) are growing concerns. Discussion: Evidence supports a shift to modular, policy-driven RAG, combining hybrid/structure-aware retrieval, uncertainty-aware control, memory, and multimodality, to improve grounding and efficiency. To advance from prototypes to dependable systems, we recommend: (i) holistic benchmarks pairing quality with cost/latency and safety, (ii) budget-aware retrieval/tool-use policies, and (iii) provenance-aware pipelines that expose uncertainty and deliver traceable evidence. We note the evidence base may be affected by citation lag from the inclusion thresholds and by English-only, five-library coverage. Funding: Advanced Research and Engineering Centre. Registration: Not registered.
Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))
Open Access Article
Identifying New Promising Research Directions with Open Peer Reviews and Contextual Top2Vec
by
Dmitry Devyatkin, Ilya V. Sochenkov, Dmitrii Popov, Denis Zubarev, Anastasia Ryzhova, Fyodor Abanin and Oleg Grigoriev
Big Data Cogn. Comput. 2025, 9(12), 319; https://doi.org/10.3390/bdcc9120319 - 12 Dec 2025
Abstract
The reliable and early detection of promising research directions is of great practical importance, especially in cases of limited resources. It enables researchers, funding experts, and science authorities to focus their efforts effectively. Although citation analysis has been commonly considered the primary tool to detect directions for a long time, it lacks responsiveness, as it requires time for citations to emerge. In this paper, we propose a conceptual framework that detects new research directions with a contextual Top2Vec model, collects and analyzes reviews for those directions via Transformer-based classifiers, ranks them, and generates short summaries for the highest-scoring ones with a BART model. Averaging review scores for a whole topic helps mitigate the review bias problem. Experiments on past ICLR open reviews show that the highly ranked directions detected are significantly better cited; additionally, in most cases, they exhibit better publication dynamics.
Full article
Open Access Review
Intelligent Modulation Recognition of Frequency-Hopping Communications: Theory, Methods, and Challenges
by
Mengxuan Lan, Zhongqiang Luo and Mingjun Jiang
Big Data Cogn. Comput. 2025, 9(12), 318; https://doi.org/10.3390/bdcc9120318 - 11 Dec 2025
Abstract
In wireless communication, information security, and anti-interference technology, modulation recognition of frequency-hopping signals has long been a key technique. Its widespread application in satellite, military, and drone communications holds broad prospects. Traditional modulation recognition techniques often rely on expert experience to construct likelihood functions or manually extract relevant features, involving cumbersome steps and low efficiency. In contrast, deep learning-based modulation recognition replaces manual feature extraction with an end-to-end integrated feature extraction and recognition architecture, in which neural networks automatically extract signal features, significantly enhancing recognition efficiency. Current deep learning-based modulation recognition research primarily focuses on conventional fixed-frequency signals, leaving gaps in intelligent modulation recognition for frequency-hopping signals. This paper summarises the current research progress in intelligent modulation recognition for frequency-hopping signals. It categorises this work into two mainstream approaches, analyses them in the context of the broader development of intelligent modulation recognition, and explores the close relationship between intelligent modulation recognition and parameter estimation for frequency-hopping signals. Finally, the paper outlines future research directions and challenges in the field.
Full article
Open Access Article
Automated Trading Framework Using LLM-Driven Features and Deep Reinforcement Learning
by
Ive Botunac, Tomislav Petković and Jurica Bosna
Big Data Cogn. Comput. 2025, 9(12), 317; https://doi.org/10.3390/bdcc9120317 - 11 Dec 2025
Abstract
Stock trading faces significant challenges due to market volatility and the complexity of integrating diverse data sources, such as financial texts and numerical market data. This paper proposes an innovative automated trading system that integrates advanced natural language processing (NLP) and deep reinforcement learning (DRL) to address these challenges. The system combines two novel components: PrimoGPT, a Transformer-based NLP model fine-tuned on financial texts using instruction-based datasets to generate actionable features like sentiment and trend direction, and PrimoRL, a DRL model that expands its state space with these NLP-derived features for enhanced decision-making precision compared to traditional DRL models like FinRL. An experimental evaluation over seven months of data on leading technology stocks reveals cumulative returns of up to 58.47% for individual stocks and 27.14% for a diversified portfolio, with a Sharpe ratio of 1.70, outperforming traditional and advanced benchmarks. This work advances AI-driven quantitative finance by offering a scalable framework that bridges qualitative analysis and strategic action, thereby fostering smarter and more equitable participation in financial markets.
Full article
Open Access Article
Confidence-Guided Code Recognition for Shipping Containers Using Deep Learning
by
Sanele Hlabisa, Ray Leroy Khuboni and Jules-Raymond Tapamo
Big Data Cogn. Comput. 2025, 9(12), 316; https://doi.org/10.3390/bdcc9120316 - 6 Dec 2025
Abstract
Shipping containers are vital to the transportation industry due to their cost-effectiveness and compatibility with intermodal systems. With the significant increase in container usage since the mid-20th century, manual tracking at port terminals has become inefficient and prone to errors. Recent advancements in Deep Learning for object detection have introduced Computer Vision as a solution for automating this process. However, challenges such as low-quality images, varying font sizes and illumination, and environmental conditions hinder recognition accuracy. This study explores various architectures and proposes a Container Code Localization Network (CCLN), utilizing ResNet and UNet for code identification, and a Container Code Recognition Network (CCRN), which combines Convolutional Neural Networks with Long Short-Term Memory to convert the image text into a machine-readable format. By enhancing existing shipping container localization and recognition datasets with additional images, our models exhibited improved generalization capabilities on other datasets, such as Syntext, for text recognition. Experimental results demonstrate that our system achieves accuracy at frames per second under challenging conditions such as varying font sizes, illumination, tilt, and depth, effectively simulating real port terminal environments. The proposed solution promises to enhance workflow efficiency and productivity in container handling processes, making it highly applicable in modern port operations.
Full article
Open Access Article
Sentence-Level Rhetorical Role Labeling in Judicial Decisions
by
Gergely Márk Csányi, István Üveges, Dorina Lakatos, Dóra Ripszám, Kornélia Kozák, Dániel Nagy and János Pál Vadász
Big Data Cogn. Comput. 2025, 9(12), 315; https://doi.org/10.3390/bdcc9120315 - 5 Dec 2025
Abstract
This paper presents an in-production Rhetorical Role Labeling (RRL) classifier developed for Hungarian judicial decisions. RRL is a sequential classification problem in Natural Language Processing, aiming to assign functional roles (such as facts, arguments, decision, etc.) to every segment or sentence in a legal document. The study was conducted on a human-annotated sentence-level RRL corpus and compares multiple architectures, including BiLSTM and attention-based networks, with a support vector machine as a baseline. It further investigates the impact of late chunking during vectorization, in contrast to classical approaches. Results from tests on the labeled dataset and annotator agreement statistics are reported, and performance is analyzed across architecture types and embedding strategies. Contrary to recent findings in retrieval tasks, late chunking does not show consistent improvements for sentence-level RRL, suggesting that contextualization through chunk embeddings may introduce noise rather than useful context in Hungarian legal judgments. The work also discusses the unique structure and labeling challenges of Hungarian cases compared to international datasets and provides empirical insights for future legal NLP research in non-English court decisions.
Full article
Open Access Article
Sophimatics: A Two-Dimensional Temporal Cognitive Architecture for Paradox-Resilient Artificial Intelligence
by
Gerardo Iovane and Giovanni Iovane
Big Data Cogn. Comput. 2025, 9(12), 314; https://doi.org/10.3390/bdcc9120314 - 5 Dec 2025
Abstract
This work continues the development of the cognitive architecture named Sophimatics, organically integrating the spatio-temporal processing mechanisms of the Super Time Cognitive Neural Network (STCNN) with the advanced principles of Sophimatics. Sophimatics’ goal is as challenging as it is fraught with obstacles: to achieve a more humanized post-generative artificial intelligence, capable of understanding and analyzing context and evaluating the user’s purpose and intent, viewing time not only as a chronological sequence but also as an experiential continuum. The path to this extremely ambitious goal was laid by previous work in which philosophical thinking of interest to AI was first taken as the inspiration for the aforementioned capabilities of the Sophimatic framework, then the mapping of philosophical concepts into Sophimatics’ AI infrastructure was addressed, and finally a cognitively inspired network, the STCNN, was created. This work, in turn, addresses the challenge of endowing the infrastructure with both chronological and experiential time and their powerful implications, such as the innate ability to resolve paradoxes, which generative AI lacks precisely because of structural limitations. To reach these results, the model operates in the two-dimensional complex time domain ℂ², extending cognitive processing capabilities through dual temporal operators that simultaneously manage the real temporal dimension, where past, present, and future are handled, and the imaginary one, which considers memory, creativity, and imagination. The resulting architecture demonstrates superior capabilities in resolving informational paradoxes and integrating apparently contradictory cognitive states, maintaining computational coherence through adaptive Sophimatic mechanisms.
In conclusion, this work introduces Phase 4 of the Sophimatic framework, enabling management of two-dimensional time within a novel cognitively inspired neural architecture grounded in philosophical concepts. It connects with existing research on temporal cognition, hybrid symbolic–connectionist models, and ethical AI. The methodology translates philosophical insights into formal computational systems, culminating in a mathematical formalization that supports two-dimensional temporal reasoning and paradox resolution. Experimental results demonstrate efficiency, predictive accuracy, and computational feasibility, highlighting potential real-world applications, future research directions, and present limitations.
Full article
Topics
Topic in AI, Drones, Electronics, Future Internet, IoT, Technologies, Telecom, BDCC
Internet of Things Architectures, Applications, and Strategies: Emerging Paradigms, Technologies, and Advancing AI Integration
Topic Editors: Oleksandr Kuznetsov, Cristian Randieri
Deadline: 31 December 2025

Topic in Computers, Information, AI, Electronics, Technologies, BDCC
Graph Neural Networks and Learning Systems
Topic Editors: Huijia Li, Jun Hu, Weichen Zhao, Jie Cao
Deadline: 31 January 2026

Topic in AI, BDCC, Fire, GeoHazards, Remote Sensing
AI for Natural Disasters Detection, Prediction and Modeling
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 31 March 2026

Topic in Applied Sciences, Electronics, J. Imaging, MAKE, Information, BDCC, Signals
Applications of Image and Video Processing in Medical Imaging
Topic Editors: Jyh-Cheng Chen, Kuangyu Shi
Deadline: 30 April 2026
Special Issues
Special Issue in BDCC
Application of Deep Neural Networks
Guest Editors: Linfeng Zhang, Wanyue Xu, Jiaye Teng
Deadline: 31 December 2025

Special Issue in BDCC
Industrial Applications of IoT and Blockchain for Sustainable Environment
Guest Editors: Xiaodong Liu, Qi Liu, Amjad Ullah
Deadline: 31 December 2025

Special Issue in BDCC
Deep Learning-Based Pose Estimation: Applications in Vision, Robotics, and Beyond
Guest Editors: Jyotindra Narayan, Chaiyawan Auepanwiriyakul
Deadline: 31 December 2025

Special Issue in BDCC
Transforming Cyber Security Provision Through Utilizing Artificial Intelligence
Guest Editors: Peter R. J. Trim, Yang-Im Lee
Deadline: 31 December 2025