Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing, published monthly online by MDPI.
- Open Access—free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Computer Science Applications)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 24.5 days after submission; acceptance to publication takes 4.6 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 4.4 (2024); 5-Year Impact Factor: 4.2 (2024)
Latest Articles
Chinese Financial News Analysis for Sentiment and Stock Prediction: A Comparative Framework with Language Models
Big Data Cogn. Comput. 2025, 9(10), 263; https://doi.org/10.3390/bdcc9100263 - 16 Oct 2025
Abstract
Financial news has a significant impact on investor sentiment and short-term stock price trends. While many studies have applied natural language processing (NLP) techniques to financial forecasting, most have focused on single tasks or English corpora, with limited research in non-English language contexts such as Taiwan. This study develops a joint framework to perform sentiment classification and short-term stock price prediction using Chinese financial news from Taiwan’s top 50 listed companies. Five types of word embeddings—one-hot, TF-IDF, CBOW, skip-gram, and BERT—are systematically compared across 17 traditional, deep, and Transformer models, as well as a large language model (LLaMA3) fully fine-tuned on the Chinese financial texts. To ensure annotation quality, sentiment labels were manually assigned by annotators with finance backgrounds and validated through a double-checking process. Experimental results show that a CNN using skip-gram embeddings achieves the strongest performance among deep learning models, while LLaMA3 yields the highest overall F1-score for sentiment classification. For regression, LSTM consistently provides the most reliable predictive power across different volatility groups, with Bayesian Linear Regression remaining competitive for low-volatility firms. LLaMA3 is the only Transformer-based model to achieve a positive R² under high-volatility conditions. Furthermore, forecasting accuracy is higher for the five-day horizon than for the fifteen-day horizon, underscoring the increasing difficulty of medium-term forecasting. These findings confirm that financial news provides valuable predictive signals for emerging markets and that short-term sentiment-informed forecasts enhance real-time investment decisions.
Full article
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
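A minimal sketch of the skip-gram-embedding-plus-classifier idea described above, not the authors' pipeline: gensim and scikit-learn are assumed, the two-document corpus is a toy, and logistic regression stands in for the paper's CNN.

```python
# Illustrative sketch: skip-gram embeddings feeding a sentiment classifier.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

docs = [["股價", "大漲", "利多"], ["財報", "虧損", "下跌"]]  # tokenized news (toy)
labels = [1, 0]                                              # 1 = positive sentiment

w2v = Word2Vec(docs, vector_size=50, sg=1, min_count=1, epochs=50)  # sg=1 -> skip-gram

def doc_vector(tokens):
    """Average the skip-gram vectors of a document's tokens."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.stack([doc_vector(d) for d in docs])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```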
Open Access Article
Source Robust Non-Parametric Reconstruction of Epidemic-like Event-Based Network Diffusion Processes Under Online Data
by
Jiajia Xie, Chen Lin, Xinyu Guo and Cassie S. Mitchell
Big Data Cogn. Comput. 2025, 9(10), 262; https://doi.org/10.3390/bdcc9100262 - 16 Oct 2025
Abstract
Temporal network diffusion models play a crucial role in healthcare, information technology, and machine learning, enabling the analysis of dynamic event-based processes such as disease spread, information propagation, and behavioral diffusion. This study addresses the challenge of reconstructing temporal network diffusion events in real time under conditions of missing and evolving data. A novel non-parametric reconstruction method based on simple weight differentiation is proposed to enhance source detection robustness with provably improved error bounds. The approach introduces adaptive cost adjustments, dynamically reducing high-risk source penalties and enabling bounded detours to mitigate errors introduced by missing edges. Theoretical analysis establishes enhanced upper bounds on false positives caused by detouring, while a stepwise evaluation of dynamic costs minimizes redundant solutions, resulting in robust Steiner tree reconstructions. Empirical validation on three real-world datasets demonstrates a 5% improvement in Matthews correlation coefficient (MCC), a twofold reduction in redundant sources, and a 50% decrease in source variance. These results confirm the effectiveness of the proposed method in accurately reconstructing temporal network diffusion while improving stability and reliability in both offline and online settings.
Full article
(This article belongs to the Special Issue Advances in Graph Learning and Representation Models for Complex Network Analysis)
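A small sketch of Steiner-tree cascade reconstruction with softened penalties on likely sources, in the spirit of the adaptive cost adjustments described above. The graph, the risk scores, and the discount rule are all invented stand-ins, not the paper's formulas; networkx's approximation module is assumed.

```python
# Sketch: Steiner-tree reconstruction of a diffusion cascade. Reducing the
# cost of edges that touch high-risk candidate sources biases the tree
# toward routing through them (with bounded detours elsewhere).
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.karate_club_graph()
nx.set_edge_attributes(G, 1.0, "weight")

observed_infected = [0, 8, 33]          # event nodes observed online (toy)
risk_score = {0: 0.9, 8: 0.2, 33: 0.4}  # hypothetical source-risk estimates

for u, v in G.edges():
    discount = max(risk_score.get(u, 0.0), risk_score.get(v, 0.0))
    G[u][v]["weight"] = 1.0 - 0.5 * discount   # schematic cost adjustment

T = steiner_tree(G, observed_infected, weight="weight")
print(sorted(T.nodes()))
```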
Open Access Article
Integrating Graph Retrieval-Augmented Generation into Prescriptive Recommender Systems
by
Marvin Niederhaus, Nico Migenda, Julian Weller, Martin Kohlhase and Wolfram Schenck
Big Data Cogn. Comput. 2025, 9(10), 261; https://doi.org/10.3390/bdcc9100261 - 15 Oct 2025
Abstract
Making time-critical decisions with serious consequences is a daily aspect of work environments. To support the process of finding optimal actions, data-driven approaches are increasingly being used. The most advanced form of data-driven analytics is prescriptive analytics, which prescribes actionable recommendations for users. However, the produced recommendations rely on complex models and optimization techniques that are difficult to understand or justify to non-expert users. Currently, there is a lack of platforms that offer easy integration of domain-specific prescriptive analytics workflows into production environments. In particular, there is no centralized environment and standardized approach for implementing such prescriptive workflows. To address these challenges, large language models (LLMs) can be leveraged to improve interpretability by translating complex recommendations into clear, context-specific explanations, enabling non-experts to grasp the rationale behind the suggested actions. Nevertheless, we acknowledge the inherent black-box nature of LLMs, which may introduce limitations in transparency. To mitigate these limitations and to provide interpretable recommendations based on real user knowledge, a knowledge graph is integrated. In this paper, we present and validate a prescriptive analytics platform that integrates ontology-based graph retrieval-augmented generation (GraphRAG) to enhance decision making by delivering actionable and context-aware recommendations. For this purpose, a knowledge graph is created through a fully automated workflow based on an ontology, which serves as the backbone of the prescriptive platform. Data sources for the knowledge graph are standardized and classified according to the ontology by employing a zero-shot classifier. For user-friendly presentation, we critically examine the usability of GraphRAG in prescriptive analytics platforms. We validate our prescriptive platform in a customer clinic with industry experts in our IoT-Factory, a dedicated research environment.
Full article
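One ingredient of the pipeline above is classifying data sources against ontology classes with a zero-shot classifier. A minimal sketch follows; the model choice, the label set, and the source description are assumptions, not the paper's configuration.

```python
# Sketch: zero-shot classification of a data source against ontology classes,
# one step of automated knowledge-graph construction.
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
ontology_classes = ["machine", "sensor", "process", "maintenance action"]

source_description = "Vibration readings from the milling spindle, 1 kHz."
result = clf(source_description, candidate_labels=ontology_classes)
print(result["labels"][0], result["scores"][0])  # best-matching ontology class
```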
Open Access Article
Cognitive Computing Frameworks for Scalable Deception Detection in Textual Data
by
Faiza Belbachir
Big Data Cogn. Comput. 2025, 9(10), 260; https://doi.org/10.3390/bdcc9100260 - 14 Oct 2025
Abstract
Detecting deception in emotionally grounded natural language remains a significant challenge due to the subtlety and context dependence of deceptive intent. In this work, we use a structured behavioral dataset in which participants produce truthful and deceptive statements under emotional and social constraints. To maintain label accuracy and semantic consistency, we propose a multilayer validation pipeline combining self-consistency prompting with feedback-guided revision, implemented through the CoTAM (Chain-of-Thought Assisted Modification) method. Our results demonstrate that this framework enhances deception detection by leveraging a sentence decomposition strategy that highlights subtle emotional and strategic cues, improving interpretability for both models and human annotators.
Full article
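A minimal sketch of the self-consistency-plus-revision idea, assuming a hypothetical text-in/text-out `llm` callable; CoTAM's actual prompts and revision criteria are not reproduced here.

```python
# Sketch: sample k chain-of-thought answers, keep the majority label, and
# trigger a feedback-guided revision pass when the samples disagree.
from collections import Counter

def self_consistent_label(llm, statement, k=5):
    votes = [llm(f"Is this statement deceptive? Think step by step, "
                 f"then answer 'deceptive' or 'truthful'.\n{statement}")
             for _ in range(k)]
    label, count = Counter(votes).most_common(1)[0]
    if count < k:  # disagreement -> ask the model to revise with feedback
        feedback = f"{k - count} of {k} samples disagreed with '{label}'."
        label = llm(f"Revise your judgement of the statement below.\n"
                    f"Feedback: {feedback}\nStatement: {statement}")
    return label
```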
Open Access Article
Towards the Adoption of Recommender Systems in Online Education: A Framework and Implementation
by
Alex Martínez-Martínez, Águeda Gómez-Cambronero, Raul Montoliu and Inmaculada Remolar
Big Data Cogn. Comput. 2025, 9(10), 259; https://doi.org/10.3390/bdcc9100259 - 14 Oct 2025
Abstract
The rapid expansion of online education has generated large volumes of learner interaction data, highlighting the need for intelligent systems capable of transforming this information into personalized guidance. Educational Recommender Systems (ERS) represent a key application of big data analytics and machine learning, offering adaptive learning pathways that respond to diverse student needs. For widespread adoption, these systems must align with pedagogical principles while ensuring transparency, interpretability, and seamless integration into Learning Management Systems (LMS). This paper introduces a comprehensive framework and implementation of an ERS designed for platforms such as Moodle. The system integrates big data processing pipelines to support scalability, real-time interaction, and multi-layered personalization, including data collection, preprocessing, recommendation generation, and retrieval. A detailed use case demonstrates its deployment in a real educational environment, underlining both technical feasibility and pedagogical value. Finally, the paper discusses challenges such as data sparsity, learner model complexity, and evaluation of effectiveness, offering directions for future research at the intersection of big data technologies and digital education. By bridging theoretical models with operational platforms, this work contributes to sustainable and data-driven personalization in online learning ecosystems.
Full article
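A toy sketch of the recommendation-generation stage such an ERS pipeline might run after preprocessing. The interaction matrix is invented; a production system would read interactions from the LMS (e.g., Moodle logs), and the paper's actual recommender is not reproduced.

```python
# Sketch: item-based collaborative filtering over learner-resource interactions.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# rows = learners, columns = learning resources, values = interaction strength
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [0, 2, 4, 5]], dtype=float)

item_sim = cosine_similarity(R.T)   # resource-resource similarity
scores = R @ item_sim               # predicted affinity per learner/resource
scores[R > 0] = -np.inf             # mask resources already seen
print(scores.argmax(axis=1))        # next recommended resource per learner
```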
Open Access Review
Data Organisation for Efficient Pattern Retrieval: Indexing, Storage, and Access Structures
by
Paraskevas Koukaras and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(10), 258; https://doi.org/10.3390/bdcc9100258 - 13 Oct 2025
Abstract
The increasing scale and complexity of data mining outputs, such as frequent itemsets, association rules, sequences, and subgraphs, have made efficient pattern retrieval a critical yet underexplored challenge. This review addresses the organisation, indexing, and access strategies that enable scalable and responsive retrieval of structured patterns. We examine the underlying types of data and pattern outputs, common retrieval operations, and the variety of query types encountered in practice. Key indexing structures are surveyed, including prefix trees, inverted indices, hash-based approaches, and bitmap-based methods, each suited to different pattern representations and workloads. Storage designs are discussed with attention to metadata annotation, format choices, and redundancy mitigation. Query optimisation strategies are reviewed, emphasising index-aware traversal, caching, and ranking mechanisms. This paper also explores scalability through parallel, distributed, and streaming architectures, and surveys current systems and tools that integrate mining and retrieval capabilities. Finally, we outline pressing challenges and emerging directions, such as supporting real-time and uncertainty-aware retrieval and enabling semantic, cross-domain pattern access. Additional frontiers include privacy-preserving indexing and secure query execution, along with the integration of repositories into machine learning pipelines for hybrid symbolic–statistical workflows. We further highlight the need for dynamic repositories, probabilistic semantics, and community benchmarks to ensure that progress is measurable and reproducible across domains. This review provides a comprehensive foundation for designing next-generation pattern retrieval systems that are scalable, flexible, and tightly integrated into analytic workflows. The analysis and roadmap offered are relevant across application areas including finance, healthcare, cybersecurity, and retail, where robust and interpretable retrieval is essential.
Full article
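A minimal sketch of one of the surveyed index structures: a prefix tree over frequent itemsets supporting exact support lookup. Itemsets are stored sorted so that shared prefixes collapse into shared paths; the items and supports are toy values.

```python
# Sketch: prefix tree (trie) index over frequent itemsets.
class ItemsetTrie:
    def __init__(self):
        self.children, self.support = {}, None

    def insert(self, itemset, support):
        node = self
        for item in sorted(itemset):
            node = node.children.setdefault(item, ItemsetTrie())
        node.support = support

    def lookup(self, itemset):
        node = self
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return None
        return node.support

trie = ItemsetTrie()
trie.insert({"bread", "milk"}, 0.12)
trie.insert({"bread", "milk", "eggs"}, 0.05)
print(trie.lookup({"milk", "bread"}))  # 0.12 (order-insensitive)
```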
Open Access Article
Tester-Guided Graph Learning with End-to-End Detection Certificates for Triangle-Based Anomalies
by
Manuel J. C. S. Reis
Big Data Cogn. Comput. 2025, 9(10), 257; https://doi.org/10.3390/bdcc9100257 - 12 Oct 2025
Abstract
We investigate anomaly detection in complex networks through a property-testing-guided graph neural model (PT-GNN) that provides an end-to-end miss-probability certificate ε. The method combines (i) a wedge-sampling tester that estimates triangle-closure frequency and derives a concentration bound ε_test via Bernstein’s inequality, with (ii) a lightweight classifier over structural features whose validation error contributes ε_val. The overall certificate is given by the sum ε = ε_test + ε_val, quantifying the probability of missed anomalies under bounded sampling. On synthetic communication graphs with n = 1000, edge probability p = 0.01, and anomalous subgraph size k = 120, PT-GNN achieves perfect detection performance (AUC = 1.0, F1 = 1.0) across all tested regimes. Moreover, the miss-probability certificate tightens systematically as the tester budget m increases (e.g., at a parameter setting of 0.06, enlarging m from 2000 to 8000 reduces ε from ≈0.87 to ≈0.49). These results demonstrate that PT-GNN effectively couples graph learning with property testing, offering both strong empirical detection and formally verifiable guarantees in anomaly detection tasks.
Full article
(This article belongs to the Special Issue Advances in Graph Learning and Representation Models for Complex Network Analysis)
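A simplified sketch of the tester component: estimate triangle-closure frequency by wedge sampling and bound the estimation error with a Bernstein-style term. Note the uniform-node sampler and the plug-in variance below are simplifications; the paper's tester and constants may differ.

```python
# Sketch: wedge sampling with a Bernstein-style confidence radius.
import math, random
import networkx as nx

def wedge_closure_bound(G, m, delta=0.05):
    """Return (estimate, t) with |estimate - p| <= t with prob. ~1 - delta."""
    nodes = [v for v in G if G.degree(v) >= 2]
    closed = 0
    for _ in range(m):
        v = random.choice(nodes)                 # simplified: uniform over nodes
        a, b = random.sample(list(G[v]), 2)      # a random wedge a - v - b
        closed += G.has_edge(a, b)               # is the wedge closed?
    p_hat = closed / m
    var = p_hat * (1 - p_hat)                    # plug-in Bernoulli variance
    # Bernstein-style radius: sqrt(2 var ln(2/d) / m) + 2 ln(2/d) / (3m)
    t = math.sqrt(2 * var * math.log(2 / delta) / m) \
        + 2 * math.log(2 / delta) / (3 * m)
    return p_hat, t

G = nx.erdos_renyi_graph(1000, 0.01, seed=0)
print(wedge_closure_bound(G, m=8000))            # radius shrinks as m grows
```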
Open Access Article
Robust Clinical Querying with Local LLMs: Lexical Challenges in NL2SQL and Retrieval-Augmented QA on EHRs
by
Luka Blašković, Nikola Tanković, Ivan Lorencin and Sandi Baressi Šegota
Big Data Cogn. Comput. 2025, 9(10), 256; https://doi.org/10.3390/bdcc9100256 - 11 Oct 2025
Abstract
Electronic health records (EHRs) are typically stored in relational databases, making them difficult to query for nontechnical users, especially under privacy constraints. We evaluate two practical clinical NLP workflows, natural language to SQL (NL2SQL) for EHR querying and retrieval-augmented generation for clinical question answering (RAG-QA), with a focus on privacy-preserving deployment. We benchmark nine large language models, spanning open-weight options (DeepSeek V3/V3.1, Llama-3.3-70B, Qwen2.5-32B, Mixtral-8×22B, BioMistral-7B, and GPT-OSS-20B) and proprietary APIs (GPT-4o and GPT-5). The models were chosen to represent a diverse cross-section spanning sparse MoE, dense general-purpose, domain-adapted, and proprietary LLMs. On MIMICSQL (27,000 generations; nine models × three runs), the best NL2SQL execution accuracy (EX) is 66.1% (GPT-4o), followed by 64.6% (GPT-5). Among open-weight models, DeepSeek V3.1 reaches 59.8% EX, while DeepSeek V3 reaches 58.8%, with Llama-3.3-70B at 54.5% and BioMistral-7B achieving only 11.8%, underscoring a persistent gap relative to general-domain benchmarks. We introduce SQL-EC, a deterministic SQL error-classification framework with adjudication, revealing string mismatches as the dominant failure (86.3%), followed by query-join misinterpretations (49.7%), while incorrect aggregation-function usage accounts for only 6.7%. This highlights lexical/ontology grounding as the key bottleneck for NL2SQL in the biomedical domain. For RAG-QA, evaluated on 100 synthetic patient records across 20 questions (54,000 reference–generation pairs; three runs), BLEU and ROUGE-L fluctuate more strongly across models, whereas BERTScore remains high on most, with DeepSeek V3.1 and GPT-4o among the top performers; pairwise t-tests confirm significant differences among the LLMs. Cost–performance analysis based on measured token usage shows per-query costs ranging from USD 0.000285 (GPT-OSS-20B) to USD 0.005918 (GPT-4o); DeepSeek V3.1 offers the best open-weight cost–accuracy trade-off, and GPT-5 provides a balanced API alternative. Overall, privacy-conscious RAG-QA attains strong semantic fidelity, whereas clinical NL2SQL remains brittle under lexical variation. SQL-EC pinpoints actionable failure modes, motivating ontology-aware normalization and schema-linked prompting for robust clinical querying.
Full article
(This article belongs to the Special Issue Advances in Large Language Models for Biological and Medical Applications)
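A minimal sketch of the execution-accuracy (EX) check at the heart of NL2SQL evaluation: run gold and generated SQL against the same database and compare result sets. The table and queries are toy stand-ins, not MIMICSQL content.

```python
# Sketch: execution-accuracy (EX) scoring for NL2SQL with sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demographic (subject_id INT, age INT)")
con.executemany("INSERT INTO demographic VALUES (?, ?)",
                [(1, 64), (2, 71), (3, 58)])

def execution_match(gold_sql, pred_sql):
    try:
        gold = sorted(con.execute(gold_sql).fetchall())
        pred = sorted(con.execute(pred_sql).fetchall())
    except sqlite3.Error:            # malformed prediction counts as a miss
        return False
    return gold == pred

print(execution_match(
    "SELECT COUNT(*) FROM demographic WHERE age > 60",
    "SELECT COUNT(subject_id) FROM demographic WHERE age > 60"))  # True
```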
Open Access Review
An Overview of AI-Guided Thyroid Ultrasound Image Segmentation and Classification for Nodule Assessment
by
Michalis Savelonas
Big Data Cogn. Comput. 2025, 9(10), 255; https://doi.org/10.3390/bdcc9100255 - 10 Oct 2025
Abstract
Accurate segmentation and analysis of thyroid nodules in ultrasound (US) images are essential for the diagnosis and management of thyroid conditions, including cancer. Despite advancements in medical imaging, achieving accurate and efficient segmentation remains a significant challenge due to the complexity and variability of US images. Recently, deep learning (DL) techniques, such as convolutional neural networks (CNNs) and vision transformers (ViTs), have emerged as powerful tools for computer-aided diagnosis (CAD). This review highlights recent advancements in thyroid US image segmentation, focusing on state-of-the-art DL techniques such as contrastive learning, consistency learning, and knowledge-driven DL. We explore various approaches to improve segmentation accuracy, including multi-task learning, self-supervised learning, and methods that minimize reliance on the availability of large, annotated datasets. Additionally, we examine the clinical significance of these methods in differentiating between benign and malignant nodules, as well as their potential for integration into clinically adopted, fully automated CAD systems. By addressing the latest developments and ongoing challenges, this review serves as a comprehensive reference for future research and clinical implementation of thyroid US diagnostics.
Full article
Open Access Article
Fast Adaptive Approximate Nearest Neighbor Search with Cluster-Shaped Indices
by
Vladimir Kazakovtsev, Mikhail Plekhanov, Alexandr Naumchev, Guzel Shkaberina, Igor Masich, Lyudmila Egorova, Alena Stupina, Aleksey Popov and Lev Kazakovtsev
Big Data Cogn. Comput. 2025, 9(10), 254; https://doi.org/10.3390/bdcc9100254 - 9 Oct 2025
Abstract
In this study, we propose a novel adaptive algorithm for approximate nearest neighbor (ANN) search, based on the inverted file (IVF) index (cluster-based index) and online query complexity classification. The concept of the classical IVF search implemented in vector databases is as follows: all data vectors are divided into clusters, and each cluster is assigned to its central point (centroid). For an ANN search query, the closest centroids are determined, and the search then continues in the corresponding clusters only. In our study, the complexity of each query is assessed and classified using the results of an initial trial search in a limited number of clusters. Based on this classification, the algorithm dynamically determines the number of clusters presumed sufficient to achieve the desired Recall value, thereby improving vector search efficiency. Our experiments show that such a complexity classifier can be built with a single feature, and we propose an algorithm for its training. We studied the impact of various features on query processing and discovered a strong dependence on the number of clusters that contain at least one nearest neighbor (productive clusters). The new algorithm is designed to be implemented on top of IVF search, a well-known algorithm for approximate nearest neighbor search, and uses existing IVF indexes that are widely deployed in the most popular vector database management systems, such as pgvector. The results obtained demonstrate a significant increase in the speed of nearest neighbor search (up to 35%) while maintaining a high Recall rate of 0.99. Additionally, the search algorithm is deterministic, which can be extremely important for tasks where the reproducibility of results plays a crucial role. The developed algorithm has been tested on datasets of varying sizes, up to one billion data vectors.
Full article
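A toy sketch of IVF search with an adaptive probe count: a trial probe yields a crude complexity signal (here, the distance margin between the two closest centroids), which decides how many further clusters to scan. The thresholded margin is a placeholder for the paper's trained one-feature classifier, and all constants are illustrative.

```python
# Sketch: IVF-style ANN search with adaptive nprobe.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32)).astype(np.float32)
kmeans = KMeans(n_clusters=64, n_init=4, random_state=0).fit(X)
labels = kmeans.labels_

def ivf_search(q, easy=8, hard=24):
    d_cent = np.linalg.norm(kmeans.cluster_centers_ - q, axis=1)
    order = np.argsort(d_cent)
    margin = d_cent[order[1]] - d_cent[order[0]]   # the single feature
    nprobe = easy if margin > 0.2 else hard        # threshold is illustrative
    cand = np.flatnonzero(np.isin(labels, order[:nprobe]))
    d = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(d)[:10]]                # top-10 neighbours

print(ivf_search(rng.normal(size=32).astype(np.float32)))
```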
Open Access Article
A Pattern-Based Framework for Automated Migration of Monolithic Applications to Microservices
by
Hossam Hassan, Manal A. Abdel-Fattah and Wael Mohamed
Big Data Cogn. Comput. 2025, 9(10), 253; https://doi.org/10.3390/bdcc9100253 - 6 Oct 2025
Abstract
Over the past decade, many software enterprises have migrated from monolithic to microservice architectures to enhance scalability, maintainability, and performance. However, this transition presents significant challenges, requiring considerable development effort, research, customization, and resource allocation over extended periods. Furthermore, the success of migration is not guaranteed, highlighting the complexities organizations face in modernizing their software systems. To address these challenges, this study introduces Mono2Micro, a comprehensive framework designed to automate the migration process while preserving structural integrity and optimizing service boundaries. The framework focuses on three core patterns: database patterns, service decomposition, and communication patterns. It leverages machine learning algorithms, including Random Forest and Louvain clustering, to analyze database query patterns alongside static and dynamic database model analysis; this enables the identification of relationships between models and facilitates the systematic decomposition of microservices while ensuring efficient inter-service communication. To validate its effectiveness, Mono2Micro was applied to a student information system for faculty management, demonstrating its ability to streamline the migration process while maintaining functional integrity. The proposed framework offers a systematic and scalable solution for organizations and researchers seeking efficient migration from monolithic systems to microservices.
Full article
(This article belongs to the Special Issue Advanced Software and Machine Learning Techniques for System Architectures and Big Data)
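A small sketch of the community-detection step: networkx's Louvain implementation stands in for the paper's clustering, and the table co-access weights (co-occurrence counts of tables in the same query) are invented toy values.

```python
# Sketch: candidate service boundaries via Louvain communities on a
# table co-access graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("students", "enrollments", 40), ("enrollments", "courses", 35),
    ("courses", "instructors", 20), ("invoices", "payments", 50),
    ("students", "invoices", 3),     # weak cross-boundary access
])

services = nx.community.louvain_communities(G, weight="weight", seed=1)
for i, tables in enumerate(services):
    print(f"candidate microservice {i}: {sorted(tables)}")
```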
Open Access Review
A Digital Twin Threat Survey
by
Manuel Suárez-Román, Mario Sanz-Rodrigo, Andrés Marín-López and David Arroyo
Big Data Cogn. Comput. 2025, 9(10), 252; https://doi.org/10.3390/bdcc9100252 - 2 Oct 2025
Abstract
Virtual and digital twins are highly valuable means of characterizing, modelling, and controlling physical systems, providing the basis for a simulation environment and laboratory. In the case of a digital twin, it is possible to maintain a replica of a physical environment by means of reliable sensor networks and accurate data. In this paper we analyse in detail the threats to the reliability of the information extracted from these sensor networks, along with a set of challenges to guarantee data liveness and trustworthiness.
Full article
(This article belongs to the Topic Internet of Things Architectures, Applications, and Strategies: Emerging Paradigms, Technologies, and Advancing AI Integration)
Open Access Article
Monitoring of First Responders Biomedical Data During Training with Innovative Virtual Reality Technologies
by
Lýdie Leová, Martin Molek, Petr Volf, Marek Sokol, Jan Hejda, Zdeněk Hon, Marek Bureš and Patrik Kutilek
Big Data Cogn. Comput. 2025, 9(10), 251; https://doi.org/10.3390/bdcc9100251 - 30 Sep 2025
Abstract
Traditional training methods for first responders are often limited by time, resources, and safety constraints, which reduces their consistency and effectiveness. This study focused on two main issues: whether exposure to virtual reality training scenarios induces measurable physiological changes in heart rate and heart rate variability, and whether these responses differ between police and firefighter contexts. The aim of this study was to explore the integration of virtual reality technologies into responder training and to evaluate how biomedical monitoring can be used to assess training effectiveness. A pilot measurement was conducted with ten participants who completed systematic crime scene investigation scenarios in both domains. Heart activity was continuously recorded using a wearable sensor and analyzed for heart rate and heart rate variability parameters, while cognitive load and task performance were also assessed. The collected data were statistically evaluated using tests of normality and paired comparisons between baseline and virtual reality phases. The results showed a significant increase in heart rate and a decrease in heart rate variability during virtual reality exposure compared to baseline, with higher cognitive load and success rates in police scenarios compared to firefighter scenarios. These findings indicate that virtual reality scenarios can elicit measurable psychophysiological responses and highlight the potential of combining immersive technologies with biomedical monitoring for the development of adaptive and effective training methods for first responders.
Full article
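For concreteness, the two biomedical parameters compared between baseline and VR phases can be derived from RR intervals as follows; the RR series below is synthetic, and RMSSD is just one common time-domain HRV parameter (the study may use others).

```python
# Sketch: heart rate and RMSSD (a time-domain HRV measure) from RR intervals.
import numpy as np

rr_ms = np.array([812, 798, 805, 776, 760, 745, 751, 738])  # RR intervals (ms)

heart_rate = 60_000 / rr_ms.mean()                 # beats per minute
rmssd = np.sqrt(np.mean(np.diff(rr_ms) ** 2))      # root mean square of successive differences

print(f"HR = {heart_rate:.1f} bpm, RMSSD = {rmssd:.1f} ms")
```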
Open Access Article
Exploring the Application and Characteristics of Homomorphic Encryption Based on Pixel Scrambling Algorithm in Image Processing
by
Tieyu Zhao
Big Data Cogn. Comput. 2025, 9(10), 250; https://doi.org/10.3390/bdcc9100250 - 30 Sep 2025
Abstract
Homomorphic encryption is well known to researchers, yet its application in image processing remains scarce. The diversity of image processing algorithms makes homomorphic encryption implementation challenging. Current research often uses the CKKS algorithm, but it has core bottlenecks in image encryption, such as the mismatch between image data and the homomorphic operation mechanism, high 2D-structure-induced costs, noise-related visual quality damage, and poor support for nonlinear operations. This study, based on image pixel characteristics, analyzes homomorphic encryption via pixel scrambling algorithms. Using magic square, Arnold, Henon map, and Hilbert curve transformations as starting points, it reveals their homomorphic properties in image processing. It further explores the homomorphic encryption properties of general pixel scrambling algorithms, offering valuable insights for homomorphic encryption applications in image processing.
Full article
(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)
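The homomorphic property at stake is that pixel scrambling commutes with pixelwise operations: scramble(a) + scramble(b) = scramble(a + b). A minimal demonstration, with a random permutation standing in for an Arnold or Hilbert-curve scrambling:

```python
# Sketch: a permutation-based scrambling is homomorphic w.r.t. pixelwise addition.
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 128, size=(8, 8))
b = rng.integers(0, 128, size=(8, 8))

perm = rng.permutation(64)                     # the scrambling key
scramble = lambda img: img.ravel()[perm].reshape(8, 8)

lhs = scramble(a) + scramble(b)                # operate on "ciphertexts"
rhs = scramble(a + b)                          # operate first, then encrypt
print(np.array_equal(lhs, rhs))                # True
```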
Open Access Article
A Complex Network Science Perspective on Urban Parcel Locker Placement
by
Enrico Corradini, Mattia Mandorlini, Filippo Mariani, Paolo Roselli, Samuele Sacchetti and Matteo Spiga
Big Data Cogn. Comput. 2025, 9(10), 249; https://doi.org/10.3390/bdcc9100249 - 30 Sep 2025
Abstract
The rapid rise of e-commerce is intensifying pressure on last-mile delivery networks, making the strategic placement of parcel lockers an urgent urban challenge. In this work, we adapt multilayer two-mode Social Network Analysis to the parcel-locker siting problem, modeling city-scale systems as bipartite networks linking spatially resolved demand zones to locker locations using only open-source demographic and geographic data. We introduce two new Social Network Analysis metrics, Dual centrality and Coverage centrality, designed to identify both structurally critical and highly accessible lockers within the network. Applying our framework to Milan, Rome, and Naples, we find that conventional coverage-based strategies successfully maximize immediate service reach, but tend to prioritize redundant hubs. In contrast, Dual centrality reveals a distinct set of lockers whose presence is essential for maintaining overall connectivity and resilience, often acting as hidden bridges between user communities. Comparative analysis with state-of-the-art multi-criteria optimization baselines confirms that our network-centric metrics deliver complementary, and in some cases better, guidance for robust locker placement. Our results show that a network-analytic lens yields actionable guidance for resilient last-mile locker siting. The method is reproducible from open data (potential-access weights) and plug-in compatible with observed assignments. Importantly, the path-based results (Coverage centrality) are adjacency-driven and thus largely insensitive to volumetric weights.
Full article
(This article belongs to the Special Issue Advances in Graph Learning and Representation Models for Complex Network Analysis)
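A toy sketch of the bipartite demand-zone/locker modeling described above. The "reach" measure below is plain weighted degree, a simplified proxy rather than the paper's Dual or Coverage centrality formulas, and all weights are invented.

```python
# Sketch: bipartite zone-locker network with a simple per-locker reach measure.
import networkx as nx

B = nx.Graph()
zones, lockers = ["z1", "z2", "z3", "z4"], ["L1", "L2"]
B.add_nodes_from(zones, bipartite=0)
B.add_nodes_from(lockers, bipartite=1)
B.add_weighted_edges_from([("z1", "L1", 120), ("z2", "L1", 80),
                           ("z2", "L2", 60), ("z3", "L2", 200),
                           ("z4", "L2", 40)])   # weights = demand units

reach = {l: B.degree(l, weight="weight") for l in lockers}
print(reach)  # demand units reachable per locker
```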
Open Access Article
A Multi-Model Machine Learning Framework for Daily Stock Price Prediction
by
Bharatendra Rai and Leili Soltanisehat
Big Data Cogn. Comput. 2025, 9(10), 248; https://doi.org/10.3390/bdcc9100248 - 28 Sep 2025
Abstract
Stock price prediction remains a challenging problem due to the inherent volatility and complexity of financial markets. This study proposes a multi-model machine learning framework for one-day-ahead stock price prediction using thirty-six features derived from technical indicators. Empirical analysis is conducted on data from Apple, Tesla, and NVIDIA, employing nine classification algorithms, including support vector machines, random forests, extreme gradient boosting, and logistic regression. Results indicate that momentum-based indicators are the most influential predictors. While support vector machines achieved the highest accuracy for Apple, extreme gradient boosting performed best for NVIDIA and Tesla. In addition, explainable AI techniques are applied to interpret individual model predictions, thereby enhancing transparency and trust in the results. The study contributes to financial analytics research by providing a comparative evaluation of diverse machine learning methods and highlighting key indicators critical for short-term stock price forecasting.
Full article
(This article belongs to the Topic Electronic Communications, IOT and Big Data, 2nd Volume)
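A minimal sketch of the general recipe: momentum-style features plus a one-day-ahead direction label, fit with one of the ensemble classifiers named above. The price series is synthetic, and the study's thirty-six indicators are not reproduced.

```python
# Sketch: technical-indicator features -> next-day direction classification.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

feats = pd.DataFrame({
    "ret_1d": close.pct_change(),                 # daily return
    "momentum_10": close / close.shift(10) - 1,   # 10-day momentum
    "sma_ratio": close / close.rolling(20).mean(),
})
label = (close.shift(-1) > close).astype(int)     # next-day up/down

data = feats.assign(y=label).dropna()
split = int(len(data) * 0.8)                      # time-ordered split
train, test = data.iloc[:split], data.iloc[split:]

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(train.drop(columns="y"), train["y"])
print(rf.score(test.drop(columns="y"), test["y"]))
```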
Open Access Article
Leveraging Large Language Models for Sustainable and Inclusive Web Accessibility
by
Manuel Andruccioli, Barry Bassi, Giovanni Delnevo and Paola Salomoni
Big Data Cogn. Comput. 2025, 9(10), 247; https://doi.org/10.3390/bdcc9100247 - 26 Sep 2025
Abstract
The increasing complexity of modern web applications, which are composed of dynamic and asynchronous components, poses a significant challenge for digital inclusion. Traditional automated tools typically analyze only the static HTML markup generated by frontend and backend frameworks. Recent advances in Large Language Models (LLMs) offer a novel approach to enhance the validation process by directly analyzing the source code. In this paper, we investigate the capacity of LLMs to interpret and reason about dynamically generated content, providing real-time feedback on web accessibility. Our findings show that LLMs can correctly anticipate the presence of accessibility violations in the generated HTML code, going beyond the capabilities of traditional validators and also evaluating possible issues due to the asynchronous execution of the web application. However, alongside legitimate issues, LLMs also produced a considerable number of hallucinated or redundant violations. This study contributes to the broader effort of employing AI to improve the inclusivity and equity of the web.
Full article
(This article belongs to the Special Issue Generative AI and Large Language Models)
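For contrast with the LLM-based analysis above, this is the kind of static check a traditional validator performs on rendered markup (here, missing alt text). BeautifulSoup is assumed and the HTML snippet is a toy.

```python
# Sketch: a traditional static accessibility check (missing alt attributes).
from bs4 import BeautifulSoup

html = '<div><img src="chart.png"><img src="logo.png" alt="Company logo"></div>'
soup = BeautifulSoup(html, "html.parser")

violations = [img for img in soup.find_all("img") if not img.get("alt")]
print(f"{len(violations)} image(s) missing alt text")
```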
Open Access Article
FedIFD: Identifying False Data Injection Attacks in Internet of Vehicles Based on Federated Learning
by
Huan Wang, Junying Yang, Jing Sun, Zhe Wang, Qingzheng Liu and Shaoxuan Luo
Big Data Cogn. Comput. 2025, 9(10), 246; https://doi.org/10.3390/bdcc9100246 - 26 Sep 2025
Abstract
With the rapid development of intelligent connected vehicle technology, false data injection (FDI) attacks have become a major challenge in the Internet of Vehicles (IoV). While deep learning methods can effectively identify such attacks, the dynamic, distributed architecture of the IoV and limited computing resources hinder both privacy protection and lightweight computation. To address this, we propose FedIFD, a federated learning (FL)-based detection method for false data injection attacks. The lightweight threat detection model utilizes basic safety messages (BSM) for local incremental training, and the Q-FedCG algorithm compresses gradients for global aggregation. Original features are reshaped using a time window. To ensure temporal and spatial consistency, a sliding average strategy aligns samples before spatial feature extraction. A dual-branch architecture enables parallel extraction of spatiotemporal features: a three-layer stacked Bidirectional Long Short-Term Memory (BiLSTM) captures temporal dependencies, and a lightweight Transformer models spatial relationships. A dynamic feature fusion weight matrix calculates attention scores for adaptive feature weighting. Finally, a differentiated pooling strategy is applied to emphasize critical features. Experiments on the VeReMi dataset show that the accuracy reaches 97.8%.
Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
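A minimal sketch of the general idea behind compressed federated updates: sparsify local gradients before server-side aggregation. Top-k sparsification is shown as a stand-in; Q-FedCG's actual compression scheme is more involved, and the gradients are toy numpy vectors from three "vehicles".

```python
# Sketch: top-k gradient sparsification + simple federated averaging.
import numpy as np

def top_k(grad, k):
    """Keep the k largest-magnitude entries; zero out the rest."""
    out = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]
    out[idx] = grad[idx]
    return out

local_grads = [np.random.default_rng(s).normal(size=100) for s in range(3)]
compressed = [top_k(g, k=10) for g in local_grads]   # 90% fewer nonzeros sent
global_update = np.mean(compressed, axis=0)          # server-side average
print(np.count_nonzero(global_update))
```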
Open Access Article
DTS-MixNet: Dynamic Spatiotemporal Graph Mixed Network for Anomaly Detection in Multivariate Time Series
by
Chengxun Tan, Jiayi Hu, Jian Li, Minmin Miao, Wenjun Hu and Shitong Wang
Big Data Cogn. Comput. 2025, 9(10), 245; https://doi.org/10.3390/bdcc9100245 - 25 Sep 2025
Abstract
Anomaly detection in multivariate time series (MTS) remains challenging due to the presence of complex and dynamic spatiotemporal dependencies. To address this, we propose the Dynamic Spatiotemporal Graph Mixed Network (DTS-MixNet), which takes sliding-window data as input to predict the next time step and determine its state. The model comprises five blocks. The Temporal Graph Structure Learner (TGSL) generates attention-weighted graphs via two types of neighbor relationships and multi-head-attention-based neighbor degrees. Then, the Cross-Temporal Dynamic Encoder (CTDE) aggregates the cross-temporal dependencies from the attention-weighted graphs and encodes them into a proxy multivariate sequence (PMS), which is fed into the proposed Cross-Variable Dynamic Encoder (CVDE). Subsequently, the CVDE captures the spatial relationships among sensors through multiple local spatial graphs and a global spatial graph, and produces a spatial graph sequence (SGS). Finally, the Spatiotemporal Mixer (TSM) mixes the PMS and SGS to build a spatiotemporal mixed sequence (TSMS) for downstream tasks, e.g., classification or prediction. We evaluate on two industrial control datasets and discuss applicability to non-industrial multivariate time series. The experimental results on benchmark datasets show that the proposed DTS-MixNet achieves encouraging performance.
Full article
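The generic predict-then-score loop used by forecasting-based detectors of this kind is sketched below: flag a time step as anomalous when the prediction error exceeds a threshold. A naive last-window-mean predictor stands in for DTS-MixNet, and the data and threshold rule are toys.

```python
# Sketch: forecasting-based anomaly detection via prediction-error thresholding.
import numpy as np

series = np.sin(np.linspace(0, 20, 400)) \
    + np.random.default_rng(0).normal(0, 0.05, 400)
series[250] += 2.0                       # injected anomaly

window, errs = 10, []
for t in range(window, len(series)):
    pred = series[t - window:t].mean()   # stand-in forecaster
    errs.append(abs(series[t] - pred))

errs = np.array(errs)
thresh = errs.mean() + 3 * errs.std()    # toy 3-sigma rule
print(np.flatnonzero(errs > thresh) + window)   # flagged timestamps
```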
Open Access Article
A Comparative Study of X Data About the NHS Using Sentiment Analysis
by
Saeed Ur Rehman, Obi Oluchi Blessing and Anwar Ali
Big Data Cogn. Comput. 2025, 9(10), 244; https://doi.org/10.3390/bdcc9100244 - 24 Sep 2025
Abstract
This study investigates sentiment analysis of X data about the National Health Service (NHS) during a politically charged period, using lexicon-based, machine learning, and deep learning approaches, as well as topic modelling and aspect-based sentiment analysis (ABSA). This study is distinct in its comparative evaluation of sentiment analysis techniques on NHS-related tweets during a politically sensitive period, offering insights into public opinion shaped by political discourse. A dataset of 35,000 tweets was collected and analysed using various techniques, including VADER, TextBlob, Naive Bayes, Support Vector Machines, Logistic Regression, Ensemble Learning, and BERT. Unlike previous studies that focus on structured feedback or general sentiment, this research uniquely explores unstructured public discourse during an election period, capturing real-time political sentiment towards NHS policies. The sentiment distribution from lexicon-based methods showed that the presence of stop words could affect model performance. While all models achieved high accuracy on the validation dataset, challenges such as class imbalance and limited labelled data impacted performance, with signs of overfitting observed. Topic modelling identified nine topic clusters, with “waiting list,” “service,” and “immigration” carrying negative sentiments, while words like “thank,” “support,” “care,” and “team” had the most positive sentiments, reflecting public appreciation in these areas. ABSA identified positive sentiments towards aspects like “useful service”. This study contributes a comparative framework for evaluating sentiment analysis techniques in politically contextualised healthcare discourse, offering insights for policymakers and researchers. The study underscores the importance of data quality in sentiment analysis. Future research should consider incorporating multilingual datasets, extending data collection periods, optimising deep learning models, and employing hybrid approaches to enhance performance.
Full article
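A minimal sketch of the lexicon-based scoring step with VADER from NLTK: the compound score in [−1, 1] is thresholded into positive/neutral/negative classes (the ±0.05 cut-offs are VADER's conventional defaults; the tweet is invented).

```python
# Sketch: VADER sentiment scoring of a single tweet.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

tweet = "Thank you NHS, the care and support from the team were brilliant."
score = sia.polarity_scores(tweet)["compound"]
label = ("positive" if score >= 0.05
         else "negative" if score <= -0.05 else "neutral")
print(score, label)
```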
Topics
Topic in
IJERPH, JPM, Healthcare, BDCC, Applied Sciences, Sensors
eHealth and mHealth: Challenges and Prospects, 2nd Edition
Topic Editors: Antonis Billis, Manuel Dominguez-Morales, Anton Civit
Deadline: 31 October 2025
Topic in
Actuators, Algorithms, BDCC, Future Internet, JMMP, Machines, Robotics, Systems
Smart Product Design and Manufacturing on Industrial Internet
Topic Editors: Pingyu Jiang, Jihong Liu, Ying Liu, Jihong Yan
Deadline: 31 December 2025
Topic in
Computers, Information, AI, Electronics, Technologies, BDCC
Graph Neural Networks and Learning Systems
Topic Editors: Huijia Li, Jun Hu, Weichen Zhao, Jie Cao
Deadline: 31 January 2026
Topic in
AI, BDCC, Fire, GeoHazards, Remote Sensing
AI for Natural Disasters Detection, Prediction and Modeling
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 31 March 2026
Special Issues
Special Issue in
BDCC
Natural Language Processing Applications in Big Data
Guest Editors: Xingyi Song, Ye Jiang, Yunfei Long
Deadline: 22 October 2025
Special Issue in
BDCC
Big Data and Machine Learning Applications for Material Removal, Additive and Hybrid Manufacturing Processes
Guest Editor: Nikolaos Fountas
Deadline: 31 October 2025
Special Issue in
BDCC
Advances in Large Language Models for Biological and Medical Applications
Guest Editors: Irene Li, Ruihai Dong
Deadline: 31 October 2025
Special Issue in
BDCC
Beyond Diagnosis: Machine Learning in Prognosis, Prevention, Healthcare, Neurosciences, and Precision Medicine
Guest Editors: Cristian Randieri, Giuseppe Tradigo, Riccardo Pecori, Jakub Mieczkowski
Deadline: 1 November 2025