
Big Data Cogn. Comput., Volume 10, Issue 1 (January 2026) – 38 articles

Cover Story: Acquired brain injury (ABI) can impair cognitive functions essential for safe driving, reducing independence and quality of life. This study compared driving simulator performance between individuals with ABI and healthy controls and examined associations between cognitive abilities and driving behavior. Using a simulator with increasing task complexity, ABI participants performed similarly to controls in basic vehicle operation but showed deficits in cognitively demanding tasks requiring sustained attention, visuospatial monitoring, and adaptive control, including rural driving, vehicle following, and parking. In controls, simulator performance was associated with attention, processing speed, and spatial orientation, supporting simulator-based assessment as a sensitive tool for evaluating post-injury driving readiness.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • Papers are published in both HTML and PDF forms, with PDF as the official format. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
36 pages, 1519 KB  
Review
Thinking Machines: Mathematical Reasoning in the Age of LLMs
by Andrea Asperti, Alberto Naibo and Claudio Sacerdoti Coen
Big Data Cogn. Comput. 2026, 10(1), 38; https://doi.org/10.3390/bdcc10010038 - 22 Jan 2026
Viewed by 237
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in structured reasoning and symbolic tasks, with coding emerging as a particularly successful application. This progress has naturally motivated efforts to extend these models to mathematics, both in its traditional form, expressed through natural-style mathematical language, and in its formalized counterpart, expressed in a symbolic syntax suitable for automatic verification. Yet, despite apparent parallels between programming and proof construction, advances in formalized mathematics have proven significantly more challenging. This gap raises fundamental questions about the nature of reasoning in current LLM architectures, the role of supervision and feedback, and the extent to which such models maintain an internal notion of computational or deductive state. In this article, we review the current state-of-the-art in mathematical reasoning with LLMs, focusing on recent models and benchmarks. We explore three central issues at the intersection of machine learning and mathematical cognition: (i) the trade-offs between traditional and formalized mathematics as training and evaluation domains; (ii) the structural and methodological reasons why proof synthesis remains more brittle than code generation; and (iii) whether LLMs genuinely represent or merely emulate a notion of evolving logical state. Our goal is not to draw rigid distinctions but to clarify the present boundaries of these systems and outline promising directions for their extension. Full article
24 pages, 7898 KB  
Article
Unifying Aesthetic Evaluation via Multimodal Annotation and Fine-Grained Sentiment Analysis
by Kai Liu, Hangyu Xiong, Jinyi Zhang and Min Peng
Big Data Cogn. Comput. 2026, 10(1), 37; https://doi.org/10.3390/bdcc10010037 - 22 Jan 2026
Viewed by 96
Abstract
With the rapid growth of visual content, automated aesthetic evaluation has become increasingly important. However, existing research faces three key challenges: (1) the absence of datasets combining Image Aesthetic Assessment (IAA) scores and Image Aesthetic Captioning (IAC) descriptions; (2) limited integration of quantitative scores and qualitative text, hindering comprehensive modeling; (3) the subjective nature of aesthetics, which complicates consistent fine-grained evaluation. To tackle these issues, we propose a unified multimodal framework. To address the lack of data, we develop the Textual Aesthetic Sentiment Labeling Pipeline (TASLP) for automatic annotation and construct the Reddit Multimodal Sentiment Dataset (RMSD) with paired IAA and IAC labels. To improve annotation integration, we introduce the Aesthetic Category Sentiment Analysis (ACSA) task, which models fine-grained aesthetic attributes across modalities. To handle subjectivity, we design two models—LAGA for IAA and ACSFM for IAC—that leverage ACSA features to enhance consistency and interpretability. Experiments on RMSD and public benchmarks show that our approach alleviates data limitations and delivers competitive performance, highlighting the effectiveness of fine-grained sentiment modeling and multimodal learning in aesthetic evaluation. Full article
(This article belongs to the Special Issue Machine Learning and Image Processing: Applications and Challenges)
12 pages, 1944 KB  
Article
Extracting Metasystem: A Novel Paradigm to Perceive Complex Systems
by Xue Li and Ying’an Cui
Big Data Cogn. Comput. 2026, 10(1), 36; https://doi.org/10.3390/bdcc10010036 - 19 Jan 2026
Viewed by 232
Abstract
Abundant evidence shows that there is a core component within a complex system, referred to as the metasystem, that fundamentally shapes the structural and dynamical characteristics of a complex system. The limitations of existing techniques for analyzing complex systems have made it increasingly desirable to extract metasystems for modeling, measuring, and analyzing complex phenomena. However, the methods of extracting metasystems are still in their infancy with various shortcomings. Here, we propose a universal framework based on divide and conquer to extract fine-grained metasystems. The method comprises three stages performed in sequence: partitioning, sampling, and optimizing. It can decompose a complex system into interconnected metasystem and non-metasystem components, providing a lightweight perspective for studying complex systems: essential insights can be gained by merely examining the internal mechanisms of each component and their interaction patterns. Full article
(This article belongs to the Special Issue Advances in Complex Networks)
30 pages, 1372 KB  
Systematic Review
A Systematic Review and Bibliometric Analysis of Automated Multiple-Choice Question Generation
by Dimitris Mitroulias and Spyros Sioutas
Big Data Cogn. Comput. 2026, 10(1), 35; https://doi.org/10.3390/bdcc10010035 - 18 Jan 2026
Viewed by 334
Abstract
The aim of this study is to systematically capture, synthesize, and evaluate current research trends related to Automated Multiple-Choice Question Generation as they emerge within the broader landscape of natural language processing (NLP) and large language model (LLM)-based educational and assessment research. A systematic search and selection process was conducted following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, using predefined inclusion and exclusion criteria. A total of 240 eligible publications indexed in the Scopus database were identified and analyzed. To provide a comprehensive overview of this evolving research landscape, a bibliometric analysis was performed utilizing performance analysis and scientific mapping methods, supported by the Bibliometrix (version 4.2.2) R package and VOSviewer (version 1.6.19) software. The findings of the performance analysis indicate a steady upward trend in publications and citations, with significant contributions from leading academic institutions—primarily from the United States—and a strong presence in high quality academic journals. Scientific mapping through co-authorship analysis reveals that, despite the increasing research activity, there remains a need for enhanced collaborative efforts. Bibliographic coupling organizes the analyzed literature into seven thematic clusters, highlighting the main research axes and their diachronic evolution. Furthermore, co-word analysis identifies emerging research trends and underexplored directions, indicating substantial opportunities for future investigation. To the best of our knowledge, this study represents the first systematic bibliometric analysis that examines Automated Multiple-Choice Question Generation research within the context of the broader LLM-driven educational assessment literature. 
By mapping the relevant scientific production and identifying research gaps and future directions, this work contributes to a more coherent understanding of the field and supports the ongoing development of research at the intersection of generative AI and educational assessment. Full article
(This article belongs to the Special Issue Generative AI and Large Language Models)
22 pages, 6241 KB  
Article
Using Large Language Models to Detect and Debunk Climate Change Misinformation
by Zeinab Shahbazi and Sara Behnamian
Big Data Cogn. Comput. 2026, 10(1), 34; https://doi.org/10.3390/bdcc10010034 - 17 Jan 2026
Viewed by 360
Abstract
The rapid spread of climate change misinformation across digital platforms undermines scientific literacy, public trust, and evidence-based policy action. Advances in Natural Language Processing (NLP) and Large Language Models (LLMs) create new opportunities for automating the detection and correction of misleading climate-related narratives. This study presents a multi-stage system that employs state-of-the-art large language models such as Generative Pre-trained Transformer 4 (GPT-4), Large Language Model Meta AI (LLaMA) version 3 (LLaMA-3), and RoBERTa-large (Robustly optimized BERT pretraining approach large) to identify, classify, and generate scientifically grounded corrections for climate misinformation. The system integrates several complementary techniques, including transformer-based text classification, semantic similarity scoring using Sentence-BERT, stance detection, and retrieval-augmented generation (RAG) for evidence-grounded debunking. Misinformation instances are detected through a fine-tuned RoBERTa–Multi-Genre Natural Language Inference (MNLI) classifier (RoBERTa-MNLI), grouped using BERTopic, and verified against curated climate-science knowledge sources using BM25 and dense retrieval via FAISS (Facebook AI Similarity Search). The debunking component employs RAG-enhanced GPT-4 to produce accurate and persuasive counter-messages aligned with authoritative scientific reports such as those from the Intergovernmental Panel on Climate Change (IPCC). A diverse dataset of climate misinformation categories covering denialism, cherry-picking of data, false causation narratives, and misleading comparisons is compiled for evaluation. Benchmarking experiments demonstrate that LLM-based models substantially outperform traditional machine-learning baselines such as Support Vector Machines, Logistic Regression, and Random Forests in precision, contextual understanding, and robustness to linguistic variation. 
Expert assessment further shows that generated debunking messages exhibit higher clarity, scientific accuracy, and persuasive effectiveness compared to conventional fact-checking text. These results highlight the potential of advanced LLM-driven pipelines to provide scalable, real-time mitigation of climate misinformation while offering guidelines for responsible deployment of AI-assisted debunking systems. Full article
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
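The evidence-retrieval step in a pipeline like the one this abstract describes can be sketched with a toy BM25 scorer. This is a minimal stdlib stand-in for the paper's BM25/FAISS retrieval stage, and the claim and mini corpus below are invented purely for illustration:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with (toy) BM25.

    A stdlib stand-in for the retrieval stage; production systems
    would use an inverted index or a library implementation."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    q_terms = query.lower().split()
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in q_terms:
            # Document frequency of the term across the corpus.
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

# Invented misinformation claim and evidence snippets.
claim = "arctic ice is growing so warming stopped"
evidence = [
    "satellite records show arctic sea ice extent declining for decades",
    "global surface temperature has risen steadily since 1970",
    "volcanic eruptions emit far less co2 than human activity",
]
scores = bm25_scores(claim, evidence)
best = evidence[max(range(len(scores)), key=scores.__getitem__)]
```

In the full system, the top-ranked evidence would be passed to the RAG-enhanced generator as grounding context for the debunking message.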
40 pages, 1968 KB  
Article
Large Model in Low-Altitude Economy: Applications and Challenges
by Jinpeng Hu, Wei Wang, Yuxiao Liu and Jing Zhang
Big Data Cogn. Comput. 2026, 10(1), 33; https://doi.org/10.3390/bdcc10010033 - 16 Jan 2026
Viewed by 559
Abstract
The integration of large models and multimodal foundation models into the low-altitude economy is driving a transformative shift, enabling intelligent, autonomous, and efficient operations for low-altitude vehicles (LAVs). This article provides a comprehensive analysis of the role these large models play within the smart integrated lower airspace system (SILAS), focusing on their applications across the four fundamental networks: facility, information, air route, and service. Our analysis yields several key findings, which pave the way for enhancing the application of large models in the low-altitude economy. By leveraging advanced capabilities in perception, reasoning, and interaction, large models are demonstrated to enhance critical functions such as high-precision remote sensing interpretation, robust meteorological forecasting, reliable visual localization, intelligent path planning, and collaborative multi-agent decision-making. Furthermore, we find that the integration of these models with key enabling technologies, including edge computing, sixth-generation (6G) communication networks, and integrated sensing and communication (ISAC), effectively addresses challenges related to real-time processing, resource constraints, and dynamic operational environments. Significant challenges, including sustainable operation under severe resource limitations, data security, network resilience, and system interoperability, are examined alongside potential solutions. Based on our survey, we discuss future research directions, such as the development of specialized low-altitude models, high-efficiency deployment paradigms, advanced multimodal fusion, and the establishment of trustworthy distributed intelligence frameworks. This survey offers a forward-looking perspective on this rapidly evolving field and underscores the pivotal role of large models in unlocking the full potential of the next-generation low-altitude economy. Full article
25 pages, 5725 KB  
Article
Data-Driven Life-Cycle Assessment of Household Air Conditioners: Identifying Low-Carbon Operation Patterns Based on Big Data Analysis
by Genta Sugiyama, Tomonori Honda and Norihiro Itsubo
Big Data Cogn. Comput. 2026, 10(1), 32; https://doi.org/10.3390/bdcc10010032 - 15 Jan 2026
Viewed by 203
Abstract
Air conditioners are a critical adaptation measure against heat- and cold-related risks under climate change. However, their electricity use and refrigerant leakage increase greenhouse gas (GHG) emissions. This study developed a data-driven life-cycle assessment (LCA) framework for residential room air conditioners in Japan by integrating large-scale field operation data with life-cycle climate performance (LCCP) modeling. We aggregated 1 min records for approximately 4100 wall-mounted split units and evaluated the 10-year LCCP across nine climate regions. Using the annual operating hours and electricity consumption, we classified the units into four behavioral quadrants and quantified the life-cycle GHG emissions and parameter sensitivities for each. The results show that the use-phase electricity dominated the total emissions, and that even under the same climate and capacity class, the 10-year per-unit emissions differed by roughly a factor of two between the high- and low-load quadrants. The sensitivity analysis identified the heating hours and the setpoint–indoor temperature difference as the most influential drivers, whereas the grid CO2 intensity, equipment lifetime, and refrigerant assumptions were of secondary importance. By replacing a single assumed use scenario with empirical profiles and behavior-based clusters, the proposed framework improves the representativeness of the LCA for air conditioners. This enabled the design of cluster-specific mitigation strategies. Full article
(This article belongs to the Special Issue Energy Conservation Towards a Low-Carbon and Sustainability Future)
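The four behavioral quadrants described above can be illustrated with a median split on annual operating hours and electricity use. This is a sketch under the assumption of simple median thresholds; the study's exact classification criteria may differ, and the sample values are invented:

```python
import statistics

def quadrant_labels(hours, kwh):
    """Assign each unit to one of four behavioral quadrants via a
    median split on annual operating hours and electricity use.
    (Median thresholds are an illustrative assumption.)"""
    h_med = statistics.median(hours)
    e_med = statistics.median(kwh)
    names = {(True, True): "long-high", (True, False): "long-low",
             (False, True): "short-high", (False, False): "short-low"}
    return [names[(h > h_med, e > e_med)] for h, e in zip(hours, kwh)]

# Invented annual operating hours and kWh for four units.
hours = [3000, 2800, 600, 500]
kwh = [1500, 400, 1200, 300]
labels = quadrant_labels(hours, kwh)
```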
23 pages, 54003 KB  
Article
TRACE: Topical Reasoning with Adaptive Contextual Experts
by Jiabin Ye, Qiuyi Xin, Chu Zhang and Hengnian Qi
Big Data Cogn. Comput. 2026, 10(1), 31; https://doi.org/10.3390/bdcc10010031 - 13 Jan 2026
Viewed by 232
Abstract
Retrieval-Augmented Generation (RAG) is widely used for long-text summarization due to its efficiency and scalability. However, standard RAG methods flatten documents into independent chunks, disrupting sequential flow and thematic structure, resulting in significant loss of contextual information. This paper presents MOEGAT, a novel graph-enhanced retrieval framework that addresses this limitation by explicitly modeling document structure. MOEGAT constructs an Orthogonal Context Graph to capture sequential discourse and global semantic relationships—long-range dependencies between non-adjacent text spans that reflect topical similarity and logical associations beyond local context. It then employs a query-aware Mixture-of-Experts Graph Attention Network to dynamically activate specialized reasoning pathways. Experiments conducted on three public long-text summarization datasets demonstrate that MOEGAT achieves state-of-the-art performance. Notably, on the WCEP dataset, it outperforms the previous state-of-the-art Graph of Records (GOR) baseline by 14.9%, 18.1%, and 18.4% on ROUGE-L, ROUGE-1, and ROUGE-2, respectively. These substantial gains, especially the 14.9% improvement in ROUGE-L, reflect significantly better capture of long-range coherence and thematic integrity in summaries. Ablation studies confirm the effectiveness of the orthogonal graph and Mixture-of-Experts components. Overall, this work introduces a novel structure-aware approach to RAG that explicitly models and leverages document structure through an orthogonal graph representation and query-aware Mixture-of-Experts reasoning. Full article
(This article belongs to the Special Issue Generative AI and Large Language Models)
33 pages, 118991 KB  
Article
Delay-Driven Information Diffusion in Telegram: Modeling, Empirical Analysis, and the Limits of Competition
by Kamila Bakenova, Oleksandr Kuznetsov, Aigul Shaikhanova, Davyd Cherkaskyi, Borys Khrushkov and Valentyn Chernushevych
Big Data Cogn. Comput. 2026, 10(1), 30; https://doi.org/10.3390/bdcc10010030 - 13 Jan 2026
Viewed by 385
Abstract
Information diffusion models developed for Twitter, Reddit, and Facebook assume network contagion and competition for shared attention. Telegram operates differently. It is built around channels rather than social graphs, and users receive posts directly from subscribed channels without algorithmic mediation. We analyze over 5000 forwarding cascades from the Pushshift Telegram dataset to examine whether existing diffusion models generalize to this broadcast environment. Our findings reveal fundamental structural differences. Telegram forwarding produces perfect star topologies with zero multi-hop propagation. Every forward connects directly to the original message, creating trees with maximum depth of exactly 1. This contrasts sharply with Twitter retweet chains that routinely reach depths of 5 or more hops. Forwarding delays follow heavy-tailed Weibull or lognormal distributions with median delays measured in days rather than hours. Approximately 15 to 20 percent of cascades exhibit administrative bulk reposting rather than organic user-driven growth. Most strikingly, early-stage competitive overtaking is absent. Six of 30 pairs exhibit crossings, but these occur late (median 79 days) via administrative bursts rather than organic competitive acceleration during peak growth. We develop a delay-driven star diffusion model that treats forwarding as independent draws from a delay distribution. The model achieves median prediction errors below 10 percent for organic cascades. These findings demonstrate that platform architecture fundamentally shapes diffusion dynamics. Comparison with prior studies on Twitter, Weibo, and Reddit reveals that Telegram’s broadcast structure produces categorically different patterns—including perfect star topology and asynchronous delays—requiring platform-specific modeling approaches rather than network-based frameworks developed for other platforms. Full article
(This article belongs to the Special Issue Recent Trends and Applications of Data Science in Social Network)
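A delay-driven star model of this kind is simple to sketch: fit a delay distribution to observed forwarding delays, then draw independent delays for each forward, all attached directly to the root message. The snippet below uses a lognormal fit by log-moment matching and invented delay values; the paper also considers Weibull fits:

```python
import math
import random
import statistics

def fit_lognormal(delays_hours):
    """Fit a lognormal by moment matching on log-delays (a simple
    stand-in for the Weibull/lognormal fits reported in the paper)."""
    logs = [math.log(d) for d in delays_hours]
    return statistics.fmean(logs), statistics.pstdev(logs)

def simulate_star_cascade(n_forwards, mu, sigma, rng):
    """Delay-driven star model: every forward attaches directly to the
    root message (depth exactly 1) with an i.i.d. lognormal delay."""
    delays = sorted(rng.lognormvariate(mu, sigma) for _ in range(n_forwards))
    return [{"parent": 0, "depth": 1, "delay_h": d} for d in delays]

rng = random.Random(42)
observed = [2.0, 30.0, 75.0, 120.0, 300.0, 700.0]  # hours; invented
mu, sigma = fit_lognormal(observed)
cascade = simulate_star_cascade(100, mu, sigma, rng)
```

Unlike branching-process models built for Twitter-style cascades, nothing here propagates beyond depth 1, which matches the perfect star topologies reported in the paper.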
27 pages, 3466 KB  
Article
Machine Learning-Based Prediction of Operability for Friction Pendulum Isolators Under Seismic Design Levels
by Ayla Ocak, Batuhan Kahvecioğlu, Sinan Melih Nigdeli, Gebrail Bekdaş, Ümit Işıkdağ and Zong Woo Geem
Big Data Cogn. Comput. 2026, 10(1), 29; https://doi.org/10.3390/bdcc10010029 - 12 Jan 2026
Viewed by 340
Abstract
Within the scope of the study, the parameters of friction pendulum-type (FPS) isolators used or planned to be used in different projects were evaluated specifically for the project and its location. The evaluations were conducted within a performance-based seismic design framework using displacement, re-centering, and force-based operability criteria, as implemented through the Türkiye Building Earthquake Code (TBDY) 2018. The friction coefficient and radius of curvature were evaluated, along with the lower and upper limit specifications determined according to TBDY 2018. The planned control points were the period of the isolator system, the isolator re-centering control, and the ratio of the base shear force to the structure weight. Within the scope of the study, isolator groups with different axial load values and different spectra were evaluated. A dataset was prepared by using the parameters obtained from the re-centering, period, and shear force analyses to determine the conditions in which the isolator continued to operate and those in which conditions prevented its operation. Machine learning models were developed to identify FPS isolator configurations that do not satisfy the code-based operability criteria, based on isolator properties, spectral acceleration coefficients corresponding to different earthquake levels, mean dead and live loads, and the number of isolators. The resulting Bagging model predicted an isolator’s operability with a high degree of accuracy, reaching 96%. Full article
26 pages, 3399 KB  
Article
Adaptive Data Prefetching for File Storage Systems Using Online Machine Learning
by George Savva and Herodotos Herodotou
Big Data Cogn. Comput. 2026, 10(1), 28; https://doi.org/10.3390/bdcc10010028 - 10 Jan 2026
Viewed by 301
Abstract
Data prefetching is essential for modern file storage systems operating in large-scale cloud and data-intensive environments, where high performance increasingly depends on intelligent, adaptive mechanisms. Traditional rule-based methods and recently proposed machine learning-based techniques often struggle to cope with the complex and rapidly evolving data access patterns characteristic of big-data workloads. In this paper, we introduce an online, streaming machine learning (SML) approach for predictive data prefetching that retrieves useful data into the cache ahead of time. We present a novel online training framework that extracts features in real time and continuously updates streaming ML models to learn and adapt from large and dynamic access streams. Building on this framework, we design new SML-driven prefetching algorithms that decide when, how, and what data to prefetch into the cache with minimal overhead. Extensive experiments using production traces from Huawei Technologies Inc. and Google workloads from the SNIA IOTTA repository demonstrate that our intelligent policies consistently deliver the highest byte hits among competing approaches, achieving 97% prefetch byte precision and reducing data access latency by up to 2.8 times. These results show that streaming ML can deliver immediate performance gains and offers a scalable foundation for future adaptive storage systems. Full article
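The core idea of a continuously updated streaming model can be sketched as an online logistic classifier that decides whether to prefetch after each access and updates itself as soon as the outcome (useful prefetch or wasted I/O) is known. The features and the synthetic access stream below are invented for illustration, not taken from the paper:

```python
import math

class OnlinePrefetcher:
    """Streaming logistic model that scores whether prefetching the
    next block is worthwhile, updated after every observed outcome.
    A sketch of the continuously-updated-SML idea only."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, was_useful):
        # One SGD step once we learn whether the prefetch paid off.
        err = self.predict(x) - (1.0 if was_useful else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

# Features: [bias, last-access-was-sequential, recent-sequential-ratio]
model = OnlinePrefetcher(3)
stream = [([1.0, 1.0, 0.9], True)] * 50 + [([1.0, 0.0, 0.1], False)] * 50
for x, useful in stream:
    model.update(x, useful)

p_sequential = model.predict([1.0, 1.0, 0.9])
p_random = model.predict([1.0, 0.0, 0.1])
```

After seeing the stream, the model scores sequential contexts as far better prefetch candidates than random ones, without ever retraining offline.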
22 pages, 7097 KB  
Article
Improving Flat Maxima with Natural Gradient for Better Adversarial Transferability
by Yunfei Long and Huosheng Xu
Big Data Cogn. Comput. 2026, 10(1), 27; https://doi.org/10.3390/bdcc10010027 - 9 Jan 2026
Viewed by 289
Abstract
Deep neural networks are vulnerable to adversarial examples, which can induce erroneous predictions by injecting imperceptible perturbations. Transferability is a crucial property of adversarial examples, enabling effective attacks under black-box settings. Adversarial examples at flat maxima, regions where the loss varies slowly around its peak, have been demonstrated to exhibit higher transferability. Existing methods to achieve flat maxima rely on the gradient of the worst-case loss within a small neighborhood around the adversarial point. However, the neighborhood structure is typically defined as a Euclidean space, which neglects the input space’s information geometry, leading to suboptimal results. In this work, we build upon the idea of flat maxima but extend the neighborhood structure from Euclidean space to the manifold measured by the Fisher metric, which takes into account the information geometry of the data space. In the non-Euclidean case, we search for the worst-case point in the direction of the natural gradient with respect to adversarial examples. The natural gradient adjusts the original gradient using the Fisher information matrix, giving the steepest direction in the manifold. Furthermore, to reduce the computational cost of calculating the Fisher information matrix, we introduce a diagonal approximation of the matrix and propose an empirical Fisher method under the model ensemble setting. Experimental results demonstrate that our proposed manifold extensions significantly enhance attack success rates against both normally and adversarially trained models. In particular, compared to methods relying on the Euclidean metric, our approach is more efficient. Full article
(This article belongs to the Special Issue Internet Intelligence for Cybersecurity)
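The diagonal empirical-Fisher approximation mentioned in the abstract amounts to preconditioning each gradient coordinate by an average of its squared value over recent gradients, avoiding a full matrix inverse. A minimal sketch of the idea with toy numbers (not the paper's attack loop):

```python
def natural_gradient_step(x, grad, grads_history, lr=0.1, eps=1e-8):
    """One ascent step preconditioned by a diagonal empirical Fisher:
    each gradient coordinate is divided by the mean of its squared
    values over a history of gradients. Sketch only; the paper applies
    this inside an adversarial-example optimization loop."""
    fisher_diag = [sum(g[i] ** 2 for g in grads_history) / len(grads_history)
                   for i in range(len(x))]
    return [xi + lr * gi / (fi + eps)
            for xi, gi, fi in zip(x, grad, fisher_diag)]

# Toy numbers: the second coordinate has small but consistent gradients,
# so the Fisher preconditioner amplifies it relative to the first.
history = [[1.0, 0.1], [1.0, -0.1]]
step = natural_gradient_step([0.0, 0.0], [1.0, 0.1], history)
```

Coordinates with historically small gradients get boosted, which is exactly how the natural gradient reshapes the ascent direction relative to plain gradient ascent.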
2 pages, 149 KB  
Editorial
Big Data and Cognitive Computing: Five New Journal Sections Established
by Min Chen and Giancarlo Fortino
Big Data Cogn. Comput. 2026, 10(1), 26; https://doi.org/10.3390/bdcc10010026 - 8 Jan 2026
Viewed by 188
Abstract
The journal Big Data and Cognitive Computing (BDCC) is a scholarly online journal which provides a platform for big data theories with emerging technologies on smart clouds and exploring supercomputers with new cognitive applications [...] Full article
28 pages, 3179 KB  
Article
FakeVoiceFinder: An Open-Source Framework for Synthetic and Deepfake Audio Detection
by Cesar Pachon and Dora Ballesteros
Big Data Cogn. Comput. 2026, 10(1), 25; https://doi.org/10.3390/bdcc10010025 - 7 Jan 2026
Viewed by 507
Abstract
AI-based audio generation has advanced rapidly, enabling deepfake audio to reach levels of naturalness that closely resemble real recordings and complicate the distinction between authentic and synthetic signals. While numerous CNN- and Transformer-based detection approaches have been proposed, most adopt a model-centric perspective in which the spectral representation remains fixed. Parallel data-centric efforts have explored alternative representations such as scalograms and CQT, yet the field still lacks a unified framework that jointly evaluates the influence of model architecture, its hyperparameters (e.g., learning rate, number of epochs), and the spectral representation along with its own parameters (e.g., representation type, window size). Moreover, there is no standardized approach for benchmarking custom architectures against established baselines under consistent experimental conditions. FakeVoiceFinder addresses this gap by providing a systematic framework that enables direct comparison of model-centric, data-centric, and hybrid evaluation strategies. It supports controlled experimentation, flexible configuration of models and representations, and comprehensive performance reporting tailored to the detection task. This framework enhances reproducibility and helps clarify how architectural and representational choices interact in synthetic audio detection. Full article

19 pages, 857 KB  
Article
Data-Driven Insights: Leveraging Sentiment Analysis and Latent Profile Analysis for Financial Market Forecasting
by Eyal Eckhaus
Big Data Cogn. Comput. 2026, 10(1), 24; https://doi.org/10.3390/bdcc10010024 - 7 Jan 2026
Viewed by 476
Abstract
Background: This study explores an innovative integration of big data analytics techniques aimed at enhancing predictive modeling in financial markets. It investigates how combining sentiment analysis with latent profile analysis (LPA) can accurately forecast stock prices. This research aligns with big data methodologies by leveraging automated content analysis and segmentation algorithms to address real-world challenges in data-driven decision-making. This study leverages advanced computational methods to process and segment large-scale unstructured data, demonstrating scalability in data-rich environments. Methods: We compiled a corpus of 3843 financial news articles on Teva Pharmaceuticals from Bloomberg and Reuters. Sentiment scores were generated using the VADER tool, and LPA was applied to identify eight distinct sentiment profiles. These profiles were then used in segmented regression models and Structural Equation Modeling (SEM) to assess their predictive value for stock price fluctuations. Results: Six of the eight latent profiles demonstrated significantly higher predictive accuracy compared to traditional sentiment-based models. The combined profile-based regression model explained 47% of the stock price variance (R2 = 0.47), compared to 10% (R2 = 0.10) in the baseline model using sentiment analysis alone. Conclusion: This study pioneers the use of latent profile analysis (LPA) in sentiment analysis for stock price prediction, offering a novel integration of clustering and financial forecasting. By uncovering complex, non-linear links between market sentiment and stock movements, it addresses a key gap in the literature and establishes a powerful foundation for advancing sentiment-based financial models. Full article
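As an illustrative aside (not drawn from the paper), the first stage of such a pipeline, scoring text against a sentiment lexicon, can be sketched in Python. The tiny lexicon and normalization below are hypothetical stand-ins; the actual VADER tool additionally handles negation, intensifiers, capitalization, and punctuation:

```python
# Toy lexicon-based sentiment scorer -- a simplified stand-in for VADER.
# The lexicon entries and the [-1, 1] normalization are illustrative only.
LEXICON = {"gain": 1.5, "growth": 1.0, "beat": 1.2,
           "loss": -1.5, "lawsuit": -2.0, "decline": -1.0}

def sentiment_score(text):
    """Average lexicon valence over matched tokens, clipped to [-1, 1]."""
    hits = [LEXICON[t] for t in text.lower().split() if t in LEXICON]
    if not hits:
        return 0.0  # neutral when no lexicon word appears
    return max(-1.0, min(1.0, sum(hits) / len(hits) / 2.0))
```

Per-article scores like these would then feed the LPA step, which groups articles into latent sentiment profiles before regression.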

20 pages, 2862 KB  
Article
Image–Text Multimodal Sentiment Analysis Algorithm Based on Curriculum Learning and Attention Mechanisms
by Yifan Chang, Zhuoxin Li, Youxiang Ruan and Guangqiang Yin
Big Data Cogn. Comput. 2026, 10(1), 23; https://doi.org/10.3390/bdcc10010023 - 7 Jan 2026
Viewed by 397
Abstract
With the rapid development of mobile internet technology, the explosive growth of image–text multimodal data generated by social networking platforms has provided rich practical scenarios and theoretical research value for multimodal sentiment analysis. However, existing methods generally suffer from inefficient modal interaction and imperfect sentiment aggregation mechanisms, particularly an over-reliance on visual modalities, leading to an imbalance in cross-modal semantic correlation modeling. To address these issues, this paper proposes a sentiment analysis algorithm for image–text modalities based on curriculum learning and attention mechanisms. The algorithm introduces the concept of curriculum learning, fully considering the negative impact of irrelevant images in image–text data on overall sentiment analysis, effectively suppressing interference from irrelevant visual information without requiring manual data cleaning. Meanwhile, the algorithm designs a dual-stage attention architecture—first capturing cross-modal correlation features via cross-modal attention, then introducing an attention bottleneck strategy to compress redundant information flow, achieving efficient feature fusion by constraining intra-modal attention dimensions. Finally, extensive experiments were conducted on two public datasets, demonstrating that the proposed method outperforms existing approaches in sentiment prediction performance. Full article

53 pages, 3162 KB  
Review
A Review on Fuzzy Cognitive Mapping: Recent Advances and Algorithms
by Gonzalo Nápoles, Agnieszka Jastrzebska, Isel Grau, Yamisleydi Salgueiro and Maikel Leon
Big Data Cogn. Comput. 2026, 10(1), 22; https://doi.org/10.3390/bdcc10010022 - 6 Jan 2026
Viewed by 416
Abstract
Fuzzy Cognitive Maps (FCMs) are a type of recurrent neural network with built-in meaning in their architecture, originally devoted to modeling and scenario simulation tasks. These knowledge-based neural systems support feedback loops that handle static and temporal data. Over the last decade, there has been a noticeable increase in the number of contributions dedicated to developing FCM-based models and algorithms for structured pattern classification and time series forecasting. These models are attractive since they have proven competitive compared to black boxes while providing highly desirable interpretability features. Equally important are the theoretical studies that have significantly advanced our understanding of the convergence behavior and approximation capabilities of FCM-based models. These studies can be challenging for readers who are not experts in mathematics or computer science; as a result, flawed FCM studies occasionally appear that fail to benefit from the theoretical progress experienced by the field. To address all these challenges, this survey paper aims to cover relevant theoretical and algorithmic advances in the field, while providing clear interpretations and practical pointers for both practitioners and researchers. Additionally, we will survey existing tools and software implementations, highlighting their strengths and limitations for developing FCM-based solutions. Full article
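As a brief illustration (one common formulation; concrete papers vary the update rule and transfer function), a single FCM reasoning step multiplies the activation vector by the causal weight matrix and squashes the result, iterating until a fixed point. The weights and steepness below are hypothetical:

```python
import math

def sigmoid(x, steepness=5.0):
    """Common FCM transfer function squashing activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

def fcm_step(activations, weights):
    """One reasoning step: a_i(t+1) = f(sum_j w[j][i] * a_j(t)).
    weights[j][i] is the causal influence of concept j on concept i."""
    n = len(activations)
    return [sigmoid(sum(weights[j][i] * activations[j] for j in range(n)))
            for i in range(n)]

def fcm_run(activations, weights, steps=50, tol=1e-5):
    """Iterate until a fixed point (convergence) or the step budget runs out."""
    for _ in range(steps):
        nxt = fcm_step(activations, weights)
        if max(abs(a - b) for a, b in zip(nxt, activations)) < tol:
            return nxt
        activations = nxt
    return activations
```

Whether such iteration converges to a fixed point, a cycle, or chaos is exactly the kind of question the theoretical studies surveyed here address.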

35 pages, 3297 KB  
Article
Phenomenological Semantic Factor Method for Risk Management of Complex Systems in Drifting
by Dmitry Rodionov, Prohor Polyakov and Evgeniy Konnikov
Big Data Cogn. Comput. 2026, 10(1), 21; https://doi.org/10.3390/bdcc10010021 - 6 Jan 2026
Viewed by 322
Abstract
Managing risk in drifting complex systems is hindered by the weak integration of unstructured incident narratives into quantitative, decision-ready models. We present a phenomena-centric semantic factor framework that closes the data–model–decision gap by transforming free-text incident reports into transparent, traceable drivers of risk and actionable interventions. The pipeline normalizes and encodes narratives, extracts domain-invariant phenomena, couples them to risk outcomes through calibrated partial least squares factors, and applies scenario optimization to recommend portfolios of measures aligned with EAM/CMMS taxonomies. Applied to a large corpus of incident notifications, the method yields stable, interpretable phenomena, improves out-of-sample risk estimation against strong text-only baselines, and delivers prescriptive recommendations whose composition and cost–risk trade-offs remain robust under concept drift. Sensitivity and ablation analyses identify semantic factorization and PLS coupling as the principal contributors to performance and explainability. The resulting end-to-end process is traceable—from tokens through phenomena and factors to actions—supporting auditability and operational adoption in critical infrastructure. Overall, the study demonstrates that phenomenological semantic factorization combined with scenario optimization provides an effective and transferable solution for integrating incident text into the proactive risk management of complex, drifting systems. Full article
(This article belongs to the Special Issue Application of Semantic Technologies in Intelligent Environment)

25 pages, 2288 KB  
Article
Driving Simulator Performance After Acquired Brain Injury: A Comparative Study of Neuropsychological Predictors
by Marek Sokol, Petr Volf, Jan Hejda, Jiří Remr, Lýdie Leová and Patrik Kutílek
Big Data Cogn. Comput. 2026, 10(1), 20; https://doi.org/10.3390/bdcc10010020 - 6 Jan 2026
Viewed by 367
Abstract
Acquired brain injury (ABI) often results in cognitive and motor impairments that can compromise driving ability, an essential aspect of independence and social participation. This study utilized a custom-designed driving simulator to compare driving performance between individuals with ABI and controls, and to examine the relationship between cognitive performance and driving behavior within the control group. All participants completed a series of standardized driving simulation tasks of varying complexity. The control group also completed a neuropsychological battery that assessed attention, processing speed, executive function, and visuospatial abilities. Simulator data were analyzed using generalized linear mixed models to evaluate group differences and, for the control group, cognitive predictors of performance. Results showed that individuals with ABI performed comparably to controls in basic operational tasks but demonstrated reduced performance in cognitively demanding scenarios requiring sustained attention, visuospatial monitoring, and adaptive control, such as rural driving, vehicle following, and parking. In the control group, strong associations were found between simulator outcomes and measures of attention, processing speed, and spatial orientation. The findings support the use of simulator-based assessment as an objective tool sensitive to post-injury impairments and highlight its links to cognitive domains relevant to driving. Full article

36 pages, 968 KB  
Review
Applications of Artificial Intelligence in Fisheries: From Data to Decisions
by Syed Ariful Haque and Saud M. Al Jufaili
Big Data Cogn. Comput. 2026, 10(1), 19; https://doi.org/10.3390/bdcc10010019 - 5 Jan 2026
Viewed by 1230
Abstract
AI enhances aquatic resource management by automating species detection, optimizing feed, forecasting water quality, protecting species interactions, and strengthening the detection of illegal, unreported, and unregulated fishing activities. However, these advancements are inconsistently employed, subject to domain shifts, limited by the availability of labeled data, and poorly benchmarked across operational contexts. Recent developments in technology and applications in fisheries genetics and monitoring, precision aquaculture, management, and sensing infrastructure are summarized in this paper. We studied automated species recognition, genomic trait inference, environmental DNA metabarcoding, acoustic analysis, and trait-based population modeling in fisheries genetics and monitoring. We used digital-twin frameworks for supervised learning in feed optimization, reinforcement learning for water quality control, vision-based welfare monitoring, and harvest forecasting in aquaculture. We explored automatic identification system trajectory analysis for illicit fishing detection, global effort mapping, electronic bycatch monitoring, protected species tracking, and multi-sensor vessel surveillance in fisheries management. Acoustic echogram automation, convolutional neural network-based fish detection, edge-computing architectures, and marine-domain foundation models are foundational developments in sensing infrastructure. Implementation challenges include performance degradation across habitat and seasonal transitions, insufficient standardized multi-region datasets for rare and protected taxa, inadequate incorporation of model uncertainty into management decisions, and structural inequalities in data access and technology adoption among smallholder producers. Standardized multi-region benchmarks with rare-taxa coverage, calibrated uncertainty quantification in assessment and control systems, domain-robust energy-efficient algorithms, and privacy-preserving data partnerships are our priorities. These integrated priorities enable transition from experimental prototypes to a reliable, collaborative infrastructure for sustainable wild capture and farmed aquatic systems. Full article

22 pages, 15015 KB  
Article
Research on Power Quality Disturbance Identification by Multi-Scale Feature Fusion
by Yunhui Wu, Kunsong Wu, Cheng Qian, Jingjin Wu and Rongnian Tang
Big Data Cogn. Comput. 2026, 10(1), 18; https://doi.org/10.3390/bdcc10010018 - 5 Jan 2026
Viewed by 297
Abstract
In the context of the convergence of multiple energy systems, the risk of power quality degradation across different stages of energy generation and distribution has become increasingly significant. Accurate identification of power quality disturbances is crucial for improving power quality and ensuring the stable operation of power grids. However, existing disturbance identification methods struggle to balance accuracy and computational efficiency, limiting their applicability in real-time monitoring scenarios. To address this issue, this paper proposes a novel disturbance recognition framework called ST-mRMR-RF. The method first applies the S-transform to convert the time-domain signal into the time-frequency domain. It then extracts spectrum, low-frequency, mid-frequency, and high-frequency components as frequency-domain features from this domain. These are fused with time-domain features to form a multi-scale feature set. To reduce feature redundancy, the Maximum Relevance Minimum Redundancy (mRMR) algorithm is applied to select the optimal feature subset, ensuring maximum category relevance and minimal redundancy. Based on this foundation, four classifiers—Random Forest (RF), Partial Least Squares (PLS), Extreme Learning Machine (ELM), and Convolutional Neural Network (CNN)—are employed for disturbance identification. Experimental results show that the feature subset selected via mRMR reduces the model’s training time by 88.91%. When tested in a white noise environment containing 21 types of power quality disturbance signals, the ST-mRMR-RF method achieves a recognition accuracy of 99.24% at a 20 dB signal-to-noise ratio. Overall, this framework demonstrates outstanding performance in noise resistance, classification accuracy, and computational efficiency. Full article
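The greedy selection rule at the heart of mRMR, add the candidate maximizing relevance minus average redundancy with what is already selected, can be sketched as follows. Real implementations estimate both terms with mutual information; here they are abstracted behind caller-supplied functions, so this is a schematic sketch rather than the paper's implementation:

```python
def greedy_mrmr(features, relevance, redundancy, k):
    """Greedy mRMR: repeatedly add the feature maximizing
    relevance(f) minus the mean redundancy(f, s) over the selected set.
    `relevance` and `redundancy` stand in for mutual-information estimates."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance(f)  # first pick: relevance only
            return relevance(f) - sum(redundancy(f, s) for s in selected) / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy run, a feature highly redundant with an already-selected one loses out to a less relevant but independent feature, which is the behavior that shrinks the feature set (and, per the abstract, the training time) without discarding class-discriminative information.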

22 pages, 820 KB  
Article
CBR2: A Case-Based Reasoning Framework with Dual Retrieval Guidance for Few-Shot KBQA
by Xinyu Hu, Tong Li, Lingtao Xue, Zhipeng Du, Kai Huang, Gang Xiao and He Tang
Big Data Cogn. Comput. 2026, 10(1), 17; https://doi.org/10.3390/bdcc10010017 - 4 Jan 2026
Viewed by 359
Abstract
Recent advances in large language models (LLMs) have driven substantial progress in knowledge base question answering (KBQA), particularly under few-shot settings. However, symbolic program generation remains challenging due to its strict structural constraints and high sensitivity to generation errors. Existing few-shot methods often rely on multi-turn strategies, such as rule-based step-by-step reasoning or iterative self-correction, which introduce additional latency and exacerbate error propagation. We present CBR2, a case-based reasoning framework with dual retrieval guidance for single-pass symbolic program generation. Instead of generating programs interactively, CBR2 constructs a unified structure-aware prompt that integrates two complementary types of retrieval: (1) structured knowledge from ontologies and factual triples, and (2) reasoning exemplars retrieved via semantic and function-level similarity. A lightweight similarity model is trained to retrieve structurally aligned programs, enabling effective transfer of abstract reasoning patterns. Experiments on KQA Pro and MetaQA demonstrate that CBR2 achieves significant improvements in both accuracy and syntactic robustness. Specifically on KQA Pro, it boosts Hits@1 from 72.70% to 82.13% and reduces syntax errors by 25%, surpassing the previous few-shot state-of-the-art. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))
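As a hedged sketch of the semantic half of such retrieval (the paper additionally uses function-level similarity and a trained similarity model), ranking a case base by cosine similarity of embeddings might look like this; the names and vectors are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_exemplars(query_vec, case_base, k=2):
    """Return the k programs whose embeddings are closest to the query.
    case_base: list of (program_text, embedding) pairs."""
    ranked = sorted(case_base, key=lambda case: cosine(query_vec, case[1]),
                    reverse=True)
    return [prog for prog, _ in ranked[:k]]
```

The retrieved programs would then be placed into the structure-aware prompt as reasoning exemplars for the single-pass generation step.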

18 pages, 4122 KB  
Article
AI-Enabled Diagnosis Using YOLOv9: Leveraging X-Ray Image Analysis in Dentistry
by Dhiaa Musleh, Atta Rahman, Haya Almossaeed, Fay Balhareth, Ghadah Alqahtani, Norah Alobaidan, Jana Altalag, May Issa Aldossary and Fahd Alhaidari
Big Data Cogn. Comput. 2026, 10(1), 16; https://doi.org/10.3390/bdcc10010016 - 2 Jan 2026
Viewed by 482
Abstract
Artificial Intelligence (AI)-enabled diagnosis has emerged as a promising avenue for revolutionizing medical image analysis, such as X-ray analysis, across a wide range of healthcare disciplines, including dentistry, consequently offering swift, efficient, and accurate solutions for identifying various dental conditions. In this study, we investigated the application of the YOLOv9 model, a cutting-edge object detection algorithm, to automate the diagnosis of dental diseases from X-ray images. The proposed methodology encompasses a comprehensive analysis of dental datasets, as well as preprocessing and model training. Through rigorous experimentation, remarkable accuracy, precision, recall, mAP@50, and F1-score values of 84.89%, 89.2%, 86.9%, 89.2%, and 88%, respectively, are achieved, representing improvements over the baseline model of 17.9%, 15.8%, 18.5%, and 16.81% in precision, recall, mAP@50, and F1-score, respectively, at a 7.9 ms inference time. This demonstrates the effectiveness of the proposed approach in accurately identifying dental conditions. Additionally, we discuss the challenges in automated diagnosis of dental diseases and outline future research directions to address knowledge gaps in this domain. This study contributes to the growing body of literature on AI in dentistry, providing valuable insights for researchers and practitioners. Full article
(This article belongs to the Special Issue Machine Learning and Image Processing: Applications and Challenges)

27 pages, 16705 KB  
Article
Development of an Ozone (O3) Predictive Emissions Model Using the XGBoost Machine Learning Algorithm
by Esteban Hernandez-Santiago, Edgar Tello-Leal, Jailene Marlen Jaramillo-Perez and Bárbara A. Macías-Hernández
Big Data Cogn. Comput. 2026, 10(1), 15; https://doi.org/10.3390/bdcc10010015 - 1 Jan 2026
Viewed by 464
Abstract
High concentrations of tropospheric ozone (O3) in urban areas pose a significant risk to human health. This study proposes an evaluation framework based on the XGBoost algorithm to predict O3 concentration, assessing the model’s capacity for seasonal extrapolation and spatial transferability. The experiment uses hourly air pollution data (O3, NO, NO2, and NOx) and meteorological factors (temperature, relative humidity, barometric pressure, wind speed, and wind direction) from six monitoring stations in the Monterrey Metropolitan Area, Mexico (from 22 September 2022 to 21 September 2023). In the preprocessing phase, the datasets were extended via feature engineering, including cyclic variables, rolling windows, and lag features, to capture temporal dynamics. The prediction models were optimized using a random search, with time-series cross-validation to prevent data leakage. The models were evaluated across a concentration range of 0.001 to 0.122 ppm, demonstrating high predictive accuracy, with a coefficient of determination (R2) of up to 0.96 and a root-mean-square error (RMSE) of 0.0034 ppm when predicting summer (O3) concentrations without prior knowledge. Spatial generalization was robust in residential areas (R2 > 0.90), but performance decreased in the industrial corridor (AQMS-NL03). We identified that this decrease is related to local complexity through the quantification of domain shift (Kolmogorov–Smirnov test) and Shapley additive explanations (SHAP) diagnostics, since the model effectively learns atmospheric inertia in stable areas but struggles with the stochastic effects of NOx titration driven by industrial emissions. These findings position the proposed approach as a reliable tool for “virtual detection” while highlighting the crucial role of environmental topology in model implementation. Full article
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
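The feature-engineering steps named in the abstract (cyclic variables, lag features, rolling windows) follow standard time-series practice and can be sketched as below; the exact windows and lags used in the study are not implied:

```python
import math

def cyclic_hour_features(hour):
    """Encode hour-of-day (0-23) as sin/cos so 23:00 and 00:00 are neighbors."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def lag_features(series, lags=(1, 2, 3)):
    """For each time step t, collect series[t-k] for each lag k (None if unavailable)."""
    return [[series[t - k] if t - k >= 0 else None for k in lags]
            for t in range(len(series))]

def rolling_mean(series, window=3):
    """Trailing moving average over up to the last `window` observations."""
    return [sum(series[max(0, t - window + 1):t + 1]) /
            (t - max(0, t - window + 1) + 1)
            for t in range(len(series))]
```

Because lag and rolling features leak future information if computed after shuffling, they must be built before splitting, which is consistent with the abstract's use of time-series cross-validation to prevent data leakage.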

34 pages, 1847 KB  
Article
Interpretable Nonlinear Forecasting of China’s CPI with Adaptive Threshold ARMA and Information Criterion Guided Integration
by Dezhi Cao, Yue Zhao and Xiaona Xu
Big Data Cogn. Comput. 2026, 10(1), 14; https://doi.org/10.3390/bdcc10010014 - 1 Jan 2026
Viewed by 263
Abstract
Accurate forecasting of China’s Consumer Price Index (CPI) is crucial for effective macroeconomic policymaking, yet remains challenging due to structural breaks and nonlinear dynamics inherent in the inflation process. Traditional linear models, such as ARIMA, often fail to capture threshold effects and regime shifts. This study introduces a Threshold Autoregressive Moving Average (TARMA) model that embeds a nonlinear threshold mechanism within the conventional ARMA framework, enabling it to better capture the CPI’s complex behavior. Leveraging an evolutionary modeling approach, the TARMA model effectively identifies high- and low-inflation regimes, offering enhanced flexibility and interpretability. Empirical results demonstrate that TARMA significantly outperforms standard models. Specifically, regarding the CPI Index level, the out-of-sample Mean Absolute Percentage Error (MAPE) is reduced to approximately 0.35% (under the S-BIC integration scheme), significantly improving upon the baseline ARIMA model. The model adapts well to inflation regime shifts and delivers substantial improvements near turning points. Furthermore, integrating an information-criterion-based weighting scheme further refines forecasts and reduces errors. By addressing the limitations of linear models through threshold-driven nonlinearity, this study offers a more accurate and interpretable framework for forecasting China’s CPI inflation. Full article
(This article belongs to the Special Issue Artificial Intelligence in Digital Humanities)
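The threshold mechanism itself is simple to state: the autoregressive parameters switch depending on which side of a threshold the last observation falls. A minimal deterministic sketch of a two-regime TAR(1) follows (the paper's TARMA model also includes moving-average terms and evolutionary estimation; the coefficients here are hypothetical):

```python
def tar1_forecast(y_prev, threshold, low_regime, high_regime):
    """One-step forecast from a two-regime threshold AR(1).
    Each regime is an (intercept, ar_coefficient) pair; the regime is
    chosen by comparing the last observation with the threshold."""
    c, phi = low_regime if y_prev <= threshold else high_regime
    return c + phi * y_prev

def tar1_path(y0, steps, threshold, low_regime, high_regime):
    """Iterate the forecast to trace the model's deterministic skeleton."""
    path = [y0]
    for _ in range(steps):
        path.append(tar1_forecast(path[-1], threshold, low_regime, high_regime))
    return path
```

The regime switch is what lets such models bend around inflation turning points that a single linear ARMA specification smooths over.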

21 pages, 2310 KB  
Article
Adversarial Perturbations for Defeating Cryptographic Algorithm Identification
by Shuijun Yin, Di Wu, Haolan Zhang, Heng Li, Zhiyuan Yao and Wei Yuan
Big Data Cogn. Comput. 2026, 10(1), 13; https://doi.org/10.3390/bdcc10010013 - 30 Dec 2025
Viewed by 383
Abstract
Recent advances in machine learning have enabled highly effective ciphertext-based cryptographic algorithm identification, posing a potential threat to encrypted communication. Inspired by adversarial example techniques, we present CSPM (Class-Specific Perturbation Mask Generation), a novel adversarial-defense framework that enhances ciphertext unidentifiability by misleading machine-learning-based cipher classifiers. CSPM constructs lightweight, reversible bit-level perturbations that alter statistical ciphertext features without affecting legitimate decryption. The method leverages class prototypes to capture representative bit-distribution patterns for each cryptographic algorithm and integrates two complementary mechanisms—mimicry-based perturbing, which steers ciphertexts toward similar cipher classes, and distortion-based perturbing, which disrupts distinctive statistical traits—through a ranking-based greedy search. Extensive experiments on seven widely used cryptographic algorithms and fifteen NIST statistical feature configurations demonstrate that CSPM consistently reduces algorithm-identification accuracy by over 25%. These results confirm that perturbation position selection, rather than magnitude, dominates attack efficacy. CSPM provides a practical defense mechanism, offering a new perspective for safeguarding encrypted communications against statistical and machine-learning-based traffic analysis. Full article
(This article belongs to the Topic New Trends in Cybersecurity and Data Privacy)
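The reversibility claim rests on XOR being an involution: applying the same bit mask twice restores the original ciphertext, so the receiver can strip the perturbation before decrypting. A minimal sketch of that property (not CSPM's class-prototype search or greedy position selection):

```python
def apply_mask(ciphertext: bytes, mask: bytes) -> bytes:
    """XOR a bit-level perturbation mask onto a ciphertext.
    XOR is an involution, so applying the same mask a second time
    restores the original bytes, keeping the perturbation reversible."""
    return bytes(c ^ m for c, m in zip(ciphertext, mask))
```

What CSPM adds on top of this primitive is choosing *where* to flip bits, which, per the abstract, matters more than how many bits are flipped.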

26 pages, 13386 KB  
Article
QU-Net: Quantum-Enhanced U-Net for Self Supervised Embedding and Classification of Skin Cancer Images
by Khidhr Halab, Nabil Marzoug, Othmane El Meslouhi, Zouhair Elamrani Abou Elassad and Moulay A. Akhloufi
Big Data Cogn. Comput. 2026, 10(1), 12; https://doi.org/10.3390/bdcc10010012 - 30 Dec 2025
Viewed by 538
Abstract
Background: Quantum Machine Learning (QML) has attracted significant attention in recent years. With quantum computing achievements in computationally costly domains, discovering its potential in improving the performance and efficiency of deep learning models in medical imaging has become a promising field of research. Methods: We investigate QML in healthcare by developing a novel quantum-enhanced U-Net (QU-Net). We experiment with six configurations of parameterized quantum circuits, varying the encoding technique (amplitude vs. angle), depth, and entanglement. Using the ISIC-2017 skin cancer dataset, we compare QU-Net with classical U-Net on self-supervised image reconstruction and binary classification of benign and malignant skin cancer, where we combine bottleneck embeddings with patient metadata. Results: Our findings show that amplitude encoding stabilizes training, whereas angle encoding introduces fluctuations. The best performance is obtained with amplitude encoding and one layer. For reconstruction, QU-Net with entanglement converges faster (25 epochs vs. 44) with a lower Mean Squared Error per image (0.00015 vs. 0.00017) on unseen data. For classification, embeddings from QU-Net without entanglement reach a 79.03% F1-score compared with 74.14% for U-Net, despite compressing images to a smaller latent space (7 vs. 128). Conclusions: These results demonstrate that the quantum layer enhances U-Net’s expressive power with efficient data embedding. Full article

24 pages, 3319 KB  
Article
NovAc-DL: Novel Activity Recognition Based on Deep Learning in the Real-Time Environment
by Saksham Singla, Sheral Singla, Karan Singla, Priya Kansal, Sachin Kansal, Alka Bishnoi and Jyotindra Narayan
Big Data Cogn. Comput. 2026, 10(1), 11; https://doi.org/10.3390/bdcc10010011 - 29 Dec 2025
Viewed by 396
Abstract
Real-time fine-grained human activity recognition (HAR) remains a challenging problem due to rapid spatial–temporal variations, subtle motion differences, and dynamic environmental conditions. Addressing this difficulty, we propose NovAc-DL, a unified deep learning framework designed to accurately classify short human-like actions, specifically “pour” and “stir”, from sequential video data. The framework integrates adaptive time-distributed convolutional encoding with temporal reasoning modules to enable robust recognition under realistic robotic-interaction conditions. A balanced dataset of 2000 videos was curated and processed through a consistent spatiotemporal pipeline. Three architectures, LRCN, CNN-TD, and ConvLSTM, were systematically evaluated. CNN-TD achieved the best performance, reaching 98.68% accuracy with the lowest test loss (0.0236), outperforming the other models in convergence speed, generalization, and computational efficiency. Grad-CAM visualizations further confirm that NovAc-DL reliably attends to motion-salient regions relevant to pouring and stirring gestures. These results establish NovAc-DL as a high-precision real-time-capable solution for deployment in healthcare monitoring, industrial automation, and collaborative robotics. Full article

29 pages, 2044 KB  
Article
A Dual-Branch Transformer Framework for Trace-Level Anomaly Detection via Phase-Space Embedding and Causal Message Propagation
by Siyuan Liu, Yiting Chen, Sen Li, Jining Chen and Qian He
Big Data Cogn. Comput. 2026, 10(1), 10; https://doi.org/10.3390/bdcc10010010 - 28 Dec 2025
Abstract
In cloud-based distributed systems, trace anomaly detection plays a vital role in maintaining system reliability by identifying early signs of performance degradation or faults. However, existing methods often fail to capture the complex temporal and structural dependencies inherent in trace data. To address this, we propose a novel dual-branch Transformer-based framework that integrates both temporal modeling and causal reasoning. The first branch encodes the original trace data to capture direct service-level dynamics, while the second employs phase-space reconstruction to reveal nonlinear temporal interactions by embedding time-delayed representations. To better capture how anomalies propagate across services, we introduce a causal propagation module that leverages directed service call graphs to enforce time order and directionality during feature aggregation, ensuring anomaly signals propagate along realistic causal paths. Additionally, we propose a hybrid loss function combining the reconstruction error with a symmetric Kullback–Leibler divergence between the attention maps of the two branches, enabling the model to distinguish normal and anomalous patterns more effectively. Extensive experiments on multiple real-world trace datasets demonstrate that our method consistently outperforms state-of-the-art baselines in precision, recall, and F1 score, and that it remains robust to noisy or complex service dependencies across diverse scenarios.
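The phase-space reconstruction used by the second branch is based on time-delay embedding, which can be sketched as follows. The embedding dimension, delay, and the synthetic latency series here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def delay_embed(x, dim=3, tau=2):
    """Takens-style time-delay embedding of a scalar series.

    Row t of the result is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]),
    exposing nonlinear temporal structure to downstream models.
    """
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

latencies = np.sin(np.linspace(0.0, 8.0 * np.pi, 100))  # stand-in for a span-latency series
emb = delay_embed(latencies, dim=3, tau=2)              # shape (96, 3)
```

Each row of `emb` is then a point in the reconstructed phase space, which the second Transformer branch would consume in place of the raw series.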

35 pages, 2397 KB  
Article
A Monte Carlo Tree Search with Reinforcement Learning and Graph Relational Attention Network for Dynamic Flexible Job Shop Scheduling Problem
by Yu Jia, Rui Yang and Qiuyu Zhang
Big Data Cogn. Comput. 2026, 10(1), 9; https://doi.org/10.3390/bdcc10010009 - 26 Dec 2025
Abstract
The dynamic flexible job shop scheduling problem (DFJSP) with machine faults, accounting for recovery conditions and variable processing times, is studied to determine a rescheduling scheme in real time when machine faults occur. To address it, we present MGRL, a Monte Carlo Tree Search (MCTS) algorithm augmented with reinforcement learning and a relational-enhanced graph attention network. A skip-node restart strategy, which reuses locally optimal solutions found during Monte Carlo sampling, is designed to improve the real-time optimization efficiency of MCTS. A relational graph attention network (RGAT), a relational-enhanced, transformer-integrated graph network within MGRL, is designed to analyze the scheduling disjunctive graph, guide the Monte Carlo sampling method toward higher sampling efficiency, and improve the quality of MCTS optimization decisions. Experimental results demonstrate the effectiveness of the RGAT and the skip-node restart strategy, and further application analysis shows that MGRL outperforms all comparison methods on the DFJSP.
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)
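The selection step such MCTS-based schedulers build on is typically the standard UCT rule, sketched below. The `(total reward, visit count)` statistics and the machine-assignment framing are illustrative assumptions, not the paper's exact formulation.

```python
import math

def uct_score(total_value, visits, parent_visits, c=1.4):
    """Standard UCT score: exploitation term plus exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are expanded first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Toy statistics for three candidate machine assignments: (total reward, visit count).
stats = {"M1": (3.0, 5), "M2": (1.0, 2), "M3": (0.0, 0)}
parent_visits = sum(n for _, n in stats.values())
best = max(stats, key=lambda a: uct_score(*stats[a], parent_visits))
```

Here the never-tried assignment `M3` is selected first; in MGRL, a learned policy (the RGAT) would additionally bias this selection toward promising branches.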
