Editor’s Choice Articles

Editor’s Choice articles are based on recommendations by the scientific editors of MDPI journals from around the world. Editors select a small number of articles recently published in the journal that they believe will be particularly interesting to readers, or important in the respective research area. The aim is to provide a snapshot of some of the most exciting work published in the various research areas of the journal.

46 pages, 573 KiB  
Systematic Review
State of the Art and Future Directions of Small Language Models: A Systematic Review
by Flavio Corradini, Matteo Leonesi and Marco Piangerelli
Big Data Cogn. Comput. 2025, 9(7), 189; https://doi.org/10.3390/bdcc9070189 - 21 Jul 2025
Viewed by 2145
Abstract
Small Language Models (SLMs) have emerged as a critical area of study within natural language processing, attracting growing attention from both academia and industry. This systematic literature review provides a comprehensive and reproducible analysis of recent developments and advancements in SLMs post-2023. Drawing on 70 English-language studies published between January 2023 and January 2025, identified through Scopus, IEEE Xplore, Web of Science, and ACM Digital Library, and focusing primarily on SLMs (including those with up to 7 billion parameters), this review offers a structured overview of the current state of the art and potential future directions. Designed as a resource for researchers seeking an in-depth global synthesis, the review examines key dimensions such as publication trends, visual data representations, contributing institutions, and the availability of public datasets. It highlights prevailing research challenges and outlines proposed solutions, with a particular focus on widely adopted model architectures, as well as common compression and optimization techniques. This study also evaluates the criteria used to assess the effectiveness of SLMs and discusses emerging de facto standards for industry. The curated data and insights aim to support and inform ongoing and future research in this rapidly evolving field.

18 pages, 1663 KiB  
Article
CNN-Based Framework for Classifying COVID-19, Pneumonia, and Normal Chest X-Rays
by Cristian Randieri, Andrea Perrotta, Adriano Puglisi, Maria Grazia Bocci and Christian Napoli
Big Data Cogn. Comput. 2025, 9(7), 186; https://doi.org/10.3390/bdcc9070186 - 11 Jul 2025
Cited by 2 | Viewed by 911
Abstract
This paper describes the development of a CNN model for the analysis of chest X-rays and the automated diagnosis of bacterial or viral pneumonia and of lung pathologies resulting from COVID-19. It thereby offers new insights for further research on AI-based diagnostic tools that can rapidly differentiate COVID-19 from other pneumonia starting from X-ray images. The model developed in this work is capable of performing three-class classification, achieving 97.48% accuracy in distinguishing chest X-rays affected by COVID-19 from other pneumonias (bacterial or viral) and from cases defined as normal, i.e., without any obvious pathology. The novelty of our study lies not only in the quality of the results obtained in terms of accuracy but, above all, in the reduced complexity of the model in terms of parameters and a shorter inference time compared to other models currently found in the literature. The excellent trade-off between the accuracy and computational complexity of our model allows for easy implementation on numerous embedded hardware platforms, such as FPGAs, for the creation of new diagnostic tools to support medical practice.
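
To make the three-class setup concrete, here is a minimal, purely illustrative Keras sketch of a compact CNN classifier. The layer sizes, input resolution, and optimizer are assumptions for the example, not the authors' tuned architecture.

```python
# Hypothetical minimal sketch of a three-class chest X-ray classifier
# (COVID-19 / other pneumonia / normal); not the paper's architecture.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cxr_classifier(input_shape=(224, 224, 1), n_classes=3):
    """Small CNN; global average pooling keeps the parameter count low."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # 3-class output
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cxr_classifier()
model.summary()  # inspect the (deliberately small) parameter count
```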

18 pages, 380 KiB  
Article
Gait-Based Parkinson’s Disease Detection Using Recurrent Neural Networks for Wearable Systems
by Carlos Rangel-Cascajosa, Francisco Luna-Perejón, Saturnino Vicente-Diaz and Manuel Domínguez-Morales
Big Data Cogn. Comput. 2025, 9(7), 183; https://doi.org/10.3390/bdcc9070183 - 7 Jul 2025
Viewed by 588
Abstract
Parkinson’s disease is one of the neurodegenerative conditions that has seen a significant increase in prevalence in recent decades. The lack of specific screening tests and notable disease biomarkers, combined with the strain on healthcare systems, leads to delayed detection of the disease, which worsens its progression. The development of diagnostic support tools can support early detection and facilitate timely intervention. The ability of Deep Learning algorithms to identify complex features from clinical data has proven to be a promising approach in various medical domains as support tools. In this study, we present an investigation of different architectures based on Gated Recurrent Neural Networks to assess their effectiveness in identifying subjects with Parkinson’s disease from gait records. Models with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers were evaluated. Their accuracy is competitive with the current state of the art (up to 93.75%; average ± SD: 86 ± 5%) at reduced computational complexity, which represents an advance toward executable screening and diagnostic support tools on wearable devices with few computational resources.
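
A minimal sketch of the kind of recurrent binary classifier evaluated here, assuming fixed-length gait windows; the window length, channel count, and layer sizes are invented for illustration, not the study's configuration.

```python
# Illustrative GRU/LSTM gait classifier sketch (PD vs. control).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gait_model(window_len=128, n_channels=6, unit="gru"):
    """Recurrent binary classifier over gait sensor windows."""
    Recurrent = layers.GRU if unit == "gru" else layers.LSTM
    return models.Sequential([
        layers.Input(shape=(window_len, n_channels)),  # e.g. accel + gyro axes
        Recurrent(32),                                 # compact, wearable-friendly
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),         # PD probability
    ])

model = build_gait_model(unit="lstm")
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```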

47 pages, 6244 KiB  
Review
Toward the Mass Adoption of Blockchain: Cross-Industry Insights from DeFi, Gaming, and Data Analytics
by Shezon Saleem Mohammed Abdul, Anup Shrestha and Jianming Yong
Big Data Cogn. Comput. 2025, 9(7), 178; https://doi.org/10.3390/bdcc9070178 - 3 Jul 2025
Cited by 1 | Viewed by 2962
Abstract
Blockchain’s promise of decentralized, tamper-resistant services is gaining real traction in three arenas: decentralized finance (DeFi), blockchain gaming, and data-driven analytics. These sectors span finance, entertainment, and information services, offering a representative setting in which to study real-world adoption. This survey analyzes how each domain implements blockchain, identifies the incentives that accelerate uptake, and maps the technical and organizational barriers that still limit scale. By examining peer-reviewed literature and recent industry developments, this review distills common design features such as token incentives, verifiable digital ownership, and immutable data governance. It also pinpoints the following domain-specific challenges: capital efficiency in DeFi, asset portability and community engagement in gaming, and high-volume, low-latency querying in analytics. Moreover, cross-sector links are already forming, with DeFi liquidity tools supporting in-game economies and analytics dashboards improving decision-making across platforms. Building on these findings, this paper offers guidance on stronger interoperability and user-centered design and sets research priorities in consensus optimization, privacy-preserving analytics, and inclusive governance. Together, the insights equip developers, policymakers, and researchers to build scalable, interoperable platforms and reuse proven designs while avoiding common pitfalls.
(This article belongs to the Special Issue Application of Cloud Computing in Industrial Internet of Things)

17 pages, 711 KiB  
Article
Boost-Classifier-Driven Fault Prediction Across Heterogeneous Open-Source Repositories
by Philip König, Sebastian Raubitzek, Alexander Schatten, Dennis Toth, Fabian Obermann, Caroline König and Kevin Mallinger
Big Data Cogn. Comput. 2025, 9(7), 174; https://doi.org/10.3390/bdcc9070174 - 2 Jul 2025
Viewed by 344
Abstract
Ensuring reliability, availability, and security in modern software systems hinges on early fault detection, yet predicting which parts of a codebase are most at risk remains a significant challenge. In this paper, we analyze 2.4 million commits drawn from 33 heterogeneous open-source projects, spanning healthcare, security tools, data processing, and more. By examining each repository per file and per commit, we derive process metrics (e.g., churn, file age, revision frequency) alongside size metrics and entropy-based indicators of how scattered changes are over time. We train and tune a gradient boosting model to classify bug-prone commits under realistic class-imbalance conditions, achieving robust predictive performance across diverse repositories. Moreover, a comprehensive feature-importance analysis shows that files with long lifespans (high age), frequent edits (revision count), and widely scattered changes (entropy metrics) are especially vulnerable to defects. These insights can help practitioners and researchers prioritize testing and tailor maintenance strategies, ultimately strengthening software dependability.
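
A schematic sketch of the two central ingredients named in the abstract: a Shannon-entropy "scattered changes" feature and a boosted classifier trained under class imbalance. The feature names and synthetic data are stand-ins for the paper's mined repositories.

```python
# Sketch: entropy-of-change feature + gradient boosting on toy commit data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def change_entropy(lines_changed_per_file):
    """Shannon entropy of how a commit's churn scatters across files."""
    p = np.asarray(lines_changed_per_file, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                        # drop empty files before log
    return float(-(p * np.log2(p)).sum())

print(change_entropy([120, 5, 5]))      # concentrated change -> low entropy

# Toy commit-level features: churn, file age (days), revisions, entropy.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = (rng.random(1000) < 0.15).astype(int)   # ~15% bug-prone: imbalanced

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print(dict(zip(["churn", "age", "revisions", "entropy"],
               clf.feature_importances_.round(3))))
```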

20 pages, 3062 KiB  
Article
Cognitive Networks and Text Analysis Identify Anxiety as a Key Dimension of Distress in Genuine Suicide Notes
by Massimo Stella, Trevor James Swanson, Andreia Sofia Teixeira, Brianne N. Richson, Ying Li, Thomas T. Hills, Kelsie T. Forbush and David Watson
Big Data Cogn. Comput. 2025, 9(7), 171; https://doi.org/10.3390/bdcc9070171 - 27 Jun 2025
Viewed by 681
Abstract
Understanding the mindset of people who die by suicide remains a key research challenge. We map conceptual and emotional word–word co-occurrences in 139 genuine suicide notes and in reference word lists (an Emotional Recall Task) from 200 individuals grouped by high/low depression, anxiety, and stress levels on the DASS-21. Positive words cover most of the suicide notes’ vocabulary; however, co-occurrences in suicide notes overlap mostly with those produced by individuals with low anxiety (Jaccard index of 0.42 for valence and 0.38 for arousal). We introduce a “words not said” method: it removes every word that corpus A shares with a comparison corpus B and then checks the emotions of the “residual” words in A \ B. If no emotions are left over, A and B express the same emotions. Simulations indicate this method can classify high/low levels of depression, anxiety and stress with 80% accuracy in a balanced task. After subtracting suicide note words, only the high-anxiety corpus displays no significant residual emotions. Our findings thus pin anxiety as a key latent feature of suicidal psychology and offer an interpretable language-based marker for suicide risk detection.
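
The two mechanics named in the abstract, Jaccard overlap and the "words not said" residual A \ B, can be sketched in a few lines of plain Python. The word sets and the tiny valence lexicon below are invented stand-ins, not the study's data.

```python
# Toy sketch of Jaccard overlap and the "words not said" residual check.
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

notes_words   = {"love", "sorry", "peace", "tired", "family"}
low_anx_words = {"love", "sorry", "peace", "calm", "family"}

print(f"Jaccard overlap: {jaccard(notes_words, low_anx_words):.2f}")

# "Words not said": remove everything corpus A shares with corpus B,
# then inspect the emotions of the residual words in A \ B.
valence = {"tired": "negative", "calm": "positive"}   # toy lexicon
residual = low_anx_words - notes_words
emotions = [valence[w] for w in residual if w in valence]
print("residual emotions:", emotions or "none -> corpora emotionally similar")
```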

19 pages, 2755 KiB  
Article
Real-Time Algal Monitoring Using Novel Machine Learning Approaches
by Seyit Uguz, Yavuz Selim Sahin, Pradeep Kumar, Xufei Yang and Gary Anderson
Big Data Cogn. Comput. 2025, 9(6), 153; https://doi.org/10.3390/bdcc9060153 - 9 Jun 2025
Cited by 2 | Viewed by 1084
Abstract
Monitoring algal growth rates and estimating microalgae concentration in photobioreactor systems are critical for optimizing production efficiency. Traditional methods—such as microscopy, fluorescence, flow cytometry, spectroscopy, and macroscopic approaches—while accurate, are often costly, time-consuming, labor-intensive, and susceptible to contamination or production interference. To overcome these limitations, this study proposes an automated, real-time, and cost-effective solution by integrating machine learning with image-based analysis. We evaluated the performance of Decision Trees (DTS), Random Forests (RF), Gradient Boosting Machines (GBM), and K-Nearest Neighbors (k-NN) algorithms using RGB color histograms extracted from images of Scenedesmus dimorphus cultures. Ground truth data were obtained via manual cell enumeration under a microscope and dry biomass measurements. Among the models tested, DTS achieved the highest accuracy for cell count prediction (R2 = 0.77), while RF demonstrated superior performance for dry biomass estimation (R2 = 0.66). Compared to conventional methods, the proposed ML-based approach offers a low-cost, non-invasive, and scalable alternative that significantly reduces manual effort and response time. These findings highlight the potential of machine learning–driven imaging systems for continuous, real-time monitoring in industrial-scale microalgae cultivation.
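
The feature pipeline is simple enough to sketch: per-channel RGB histograms as the feature vector, regressed against microscope counts. The bin count and the synthetic images below are assumptions for illustration, not the paper's setup.

```python
# Sketch: RGB histogram features -> random forest regression on toy data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rgb_histogram(image, bins=8):
    """Concatenate per-channel histograms into one feature vector."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(feats).astype(float)

rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(50, 64, 64, 3))   # toy culture images
cell_counts = rng.uniform(1e5, 1e7, size=50)          # microscope ground truth

X = np.stack([rgb_histogram(img) for img in images])
model = RandomForestRegressor(random_state=0).fit(X, cell_counts)
print("R^2 (training data, for the sketch):", model.score(X, cell_counts))
```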

28 pages, 2486 KiB  
Article
A Framework for Rapidly Prototyping Data Mining Pipelines
by Flavio Corradini, Luca Mozzoni, Marco Piangerelli, Barbara Re and Lorenzo Rossi
Big Data Cogn. Comput. 2025, 9(6), 150; https://doi.org/10.3390/bdcc9060150 - 5 Jun 2025
Viewed by 968
Abstract
With the advent of Big Data, data mining techniques have become crucial for improving decision-making across diverse sectors, yet their employment demands significant resources and time. Time is critical in industrial contexts, as delays can lead to increased costs, missed opportunities, and reduced competitive advantage. To address this, systems for analyzing data can help prototype data mining pipelines, mitigating the risks of failure and resource wastage, especially when experimenting with novel techniques. Moreover, business experts often lack deep technical expertise and need robust support to validate their pipeline designs quickly. This paper presents Rainfall, a novel framework for rapidly prototyping data mining pipelines, developed through collaborative projects with industry. The framework’s requirements stem from a combination of literature review findings, iterative industry engagement, and analysis of existing tools. Rainfall enables the visual programming, execution, monitoring, and management of data mining pipelines, lowering the barrier for non-technical users. Pipelines are composed of configurable nodes that encapsulate functionalities from popular libraries or custom user-defined code, fostering experimentation. The framework is evaluated through a case study and SWOT analysis with INGKA, a large-scale industry partner, alongside usability testing with real users and validation against scenarios from the literature. The paper then underscores the value of industry–academia collaboration in bridging theoretical innovation with practical application.

34 pages, 20058 KiB  
Article
Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks
by Grant Wardle and Teo Sušnjak
Big Data Cogn. Comput. 2025, 9(6), 149; https://doi.org/10.3390/bdcc9060149 - 3 Jun 2025
Viewed by 1335
Abstract
Our study investigates how the sequencing of text and image inputs within multi-modal prompts affects the reasoning performance of Large Language Models (LLMs). Through empirical evaluations of three major commercial LLM vendors—OpenAI, Google, and Anthropic—alongside a user study on interaction strategies, we develop and validate practical heuristics for optimising multi-modal prompt design. Our findings reveal that modality sequencing is a critical factor influencing reasoning performance, particularly in tasks with varying cognitive load and structural complexity. For simpler tasks involving a single image, positioning the modalities directly impacts model accuracy, whereas in complex, multi-step reasoning scenarios, the sequence must align with the logical structure of inference, often outweighing the specific placement of individual modalities. Furthermore, we identify systematic challenges in multi-hop reasoning within transformer-based architectures, where models demonstrate strong early-stage inference but struggle with integrating prior contextual information in later reasoning steps. Building on these insights, we propose a set of validated, user-centred heuristics for designing effective multi-modal prompts, enhancing both reasoning accuracy and user interaction with AI systems. Our contributions inform the design and usability of interactive intelligent systems, with implications for applications in education, medical imaging, legal document analysis, and customer support. By bridging the gap between intelligent system behaviour and user interaction strategies, this study provides actionable guidance on how users can effectively structure prompts to optimise multi-modal LLM reasoning within real-world, high-stakes decision-making contexts.
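
Modality sequencing is easy to see in code: the same question and image, in two orders. This is a generic, illustrative payload builder; the content-part field names follow a common multimodal chat-message shape and are not any one vendor's exact schema.

```python
# Illustrative image-first vs. text-first prompt construction.
def build_prompt(text: str, image_url: str, image_first: bool):
    """Return a single-user-message payload with the chosen modality order."""
    image_part = {"type": "image_url", "image_url": {"url": image_url}}
    text_part = {"type": "text", "text": text}
    parts = [image_part, text_part] if image_first else [text_part, image_part]
    return [{"role": "user", "content": parts}]

question = "What does the flowchart imply about step 3?"
for image_first in (True, False):
    label = "image first" if image_first else "text first"
    print(label, build_prompt(question, "https://example.org/fig.png",
                              image_first))
```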

44 pages, 1434 KiB  
Review
The Importance of AI Data Governance in Large Language Models
by Saurabh Pahune, Zahid Akhtar, Venkatesh Mandapati and Kamran Siddique
Big Data Cogn. Comput. 2025, 9(6), 147; https://doi.org/10.3390/bdcc9060147 - 28 May 2025
Cited by 1 | Viewed by 4302
Abstract
AI data governance is a crucial framework for ensuring that data are used appropriately across the lifecycle of large language model (LLM) activity, from development to end-to-end testing, model validation, secure deployment, and operations. This requires the data to be managed responsibly, confidentially, securely, and ethically. The main objective is to implement a robust and intelligent data governance framework for LLMs, which affects data quality management, the fine-tuning of model performance, biases, data privacy laws, security protocols, ethical AI practices, and regulatory compliance processes. Effective data governance steps are important for minimizing data breaches, enhancing data security, ensuring regulatory compliance, mitigating bias, and establishing clear policies and guidelines. This paper covers the foundations of AI data governance, key components, types of data governance, best practices, case studies, challenges, and future directions of data governance in LLMs. Additionally, we conduct a comprehensive analysis of data governance and of how effectively it must be integrated for LLMs to earn end users’ trust. Finally, we provide deeper insights into the relevance of the data governance framework to the current landscape of LLMs in the healthcare, pharmaceutical, finance, supply chain management, and cybersecurity sectors, and we address the roles needed to take advantage of data governance frameworks, together with their effectiveness and limitations.

15 pages, 1196 KiB  
Article
Bone Segmentation in Low-Field Knee MRI Using a Three-Dimensional Convolutional Neural Network
by Ciro Listone, Diego Romano and Marco Lapegna
Big Data Cogn. Comput. 2025, 9(6), 146; https://doi.org/10.3390/bdcc9060146 - 28 May 2025
Viewed by 837
Abstract
Bone segmentation in magnetic resonance imaging (MRI) is crucial for clinical and research applications, including diagnosis, surgical planning, and treatment monitoring. However, it remains challenging due to anatomical variability and complex bone morphology. Manual segmentation is time-consuming and operator-dependent, fostering interest in automated methods. This study proposes an automated segmentation method based on a 3D U-Net convolutional neural network to segment the femur, tibia, and patella from low-field MRI scans. Low-field MRI offers advantages in cost, patient comfort, and accessibility but presents challenges related to lower signal quality. Our method achieved a Dice Similarity Coefficient (DSC) of 0.9838, Intersection over Union (IoU) of 0.9682, and Average Hausdorff Distance (AHD) of 0.0223, with an inference time of approximately 3.96 s per volume on a GPU. Although post-processing had minimal impact on metrics, it significantly enhanced the visual smoothness of bone surfaces, which is crucial for clinical use. The final segmentations enabled the creation of clean, 3D-printable bone models, beneficial for preoperative planning. These results demonstrate that the model achieves accurate segmentation with a high degree of overlap compared to manually segmented reference data. This accuracy results from meticulous fine-tuning of the network, along with the application of advanced data augmentation and post-processing techniques.
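
The overlap metrics the abstract reports (DSC and IoU) have short, standard definitions for binary volumes, sketched below; the random masks stand in for the network's output and the manual reference.

```python
# Dice and IoU for binary segmentation volumes, on toy masks.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    inter = np.logical_and(pred, ref).sum()
    return 2 * inter / (pred.sum() + ref.sum())

def iou(pred: np.ndarray, ref: np.ndarray) -> float:
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    return inter / union

rng = np.random.default_rng(2)
pred = rng.random((64, 64, 64)) > 0.5   # "predicted" bone voxels
ref = pred.copy()
ref[:2] ^= True                         # reference differing in a thin slab
print(f"DSC={dice(pred, ref):.4f}  IoU={iou(pred, ref):.4f}")
```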

18 pages, 597 KiB  
Article
No-Code Edge Artificial Intelligence Frameworks Comparison Using a Multi-Sensor Predictive Maintenance Dataset
by Juan M. Montes-Sánchez, Plácido Fernández-Cuevas, Francisco Luna-Perejón, Saturnino Vicente-Diaz and Ángel Jiménez-Fernández
Big Data Cogn. Comput. 2025, 9(6), 145; https://doi.org/10.3390/bdcc9060145 - 26 May 2025
Viewed by 1163
Abstract
Edge Computing (EC) is one of the proposed solutions to address the problems that the industry faces when implementing Predictive Maintenance (PdM) solutions that can benefit from Edge Artificial Intelligence (Edge AI) systems. In this work, we have compared six of the most popular no-code Edge AI frameworks on the market. The comparison considers economic cost, the number of features, usability, and performance. We used a combination of the analytic hierarchy process (AHP) and the technique for order preference by similarity to ideal solution (TOPSIS) to compare the frameworks. We consulted ten independent experts on Edge AI, four employed in industry and the other six in academia. These experts defined the importance of each criterion by deciding the weights of TOPSIS using AHP. We performed two different classification tests on each framework platform using data from a public dataset for PdM on biomedical equipment. Magnetometer data were used for test 1, and accelerometer data were used for test 2. We obtained the F1 score, flash memory, and latency metrics. There was a high level of consensus between the worlds of academia and industry when assigning the weights. Therefore, the overall comparison ranked the analyzed frameworks similarly. NanoEdgeAIStudio ranked first when considering all weights and industry-only weights, and Edge Impulse was the first option when using academia-only weights. In terms of performance, there is room for improvement in most frameworks, as they did not reach the metrics of the previously developed custom Edge AI solution. We identified some limitations that should be fixed to improve the comparison method in the future, like adding weights to the feature criteria or increasing the number and variety of performance tests.
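
TOPSIS itself is compact enough to show in full. In this sketch the decision matrix and the AHP-derived weights are invented purely to demonstrate the mechanics; the real criteria values come from the study's tests and expert elicitation.

```python
# Compact TOPSIS sketch with an invented decision matrix and weights.
import numpy as np

def topsis(matrix, weights, benefit):
    """Closeness scores; benefit[j] is True if criterion j is better-higher."""
    m = matrix / np.linalg.norm(matrix, axis=0)     # vector normalisation
    v = m * weights                                 # weighted matrix
    ideal = np.where(benefit, v.max(0), v.min(0))   # ideal solution
    anti = np.where(benefit, v.min(0), v.max(0))    # anti-ideal solution
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)                  # closeness in [0, 1]

# Criteria: cost (lower better), features, usability, F1 (higher better).
frameworks = np.array([[1200, 30, 7.5, 0.91],
                       [ 100, 22, 8.1, 0.88],
                       [ 600, 27, 6.9, 0.93]], dtype=float)
weights = np.array([0.30, 0.20, 0.25, 0.25])        # e.g. derived via AHP
scores = topsis(frameworks, weights,
                benefit=np.array([False, True, True, True]))
print("ranking (best first):", np.argsort(scores)[::-1])
```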

25 pages, 2733 KiB  
Article
Polarity of Yelp Reviews: A BERT–LSTM Comparative Study
by Rachid Belaroussi, Sié Cyriac Noufe, Francis Dupin and Pierre-Olivier Vandanjon
Big Data Cogn. Comput. 2025, 9(5), 140; https://doi.org/10.3390/bdcc9050140 - 21 May 2025
Viewed by 1326
Abstract
With the rapid growth in social network comments, the need for more effective methods to classify their polarity—negative, neutral, or positive—has become essential. Sentiment analysis, powered by natural language processing, has evolved significantly with the adoption of advanced deep learning techniques. Long Short-Term Memory networks capture long-range dependencies in text, while transformers, with their attention mechanisms, excel at preserving contextual meaning and handling high-dimensional, semantically complex data. This study compares the performance of sentiment analysis models based on LSTM and BERT architectures using key evaluation metrics. The dataset consists of business reviews from the Yelp Open Dataset. We tested LSTM-based methods against BERT and its variants—RoBERTa, BERTweet, and DistilBERT—leveraging popular pipelines from the Hugging Face Hub. A class-by-class performance analysis is presented, revealing that more complex BERT-based models do not always guarantee superior results in the classification of Yelp reviews. Additionally, the use of bidirectionality in LSTMs does not necessarily lead to better performance. However, across a diversity of test sets, transformer models outperform traditional RNN-based models, as their generalization capability is greater than that of a simple LSTM model.
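
The Hugging Face pipelines the study leverages can be exercised in a few lines. The checkpoint below is one public DistilBERT sentiment model, shown for illustration; it is not necessarily the variant benchmarked in the paper.

```python
# Sketch: off-the-shelf transformer sentiment pipeline on review text.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "The tacos were incredible and the staff was lovely.",
    "Waited 45 minutes and the food arrived cold.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(f"{result['label']:>8}  ({result['score']:.3f})  {review}")
```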
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

29 pages, 4204 KiB  
Article
A Comparative Study of Ensemble Machine Learning and Explainable AI for Predicting Harmful Algal Blooms
by Omer Mermer, Eddie Zhang and Ibrahim Demir
Big Data Cogn. Comput. 2025, 9(5), 138; https://doi.org/10.3390/bdcc9050138 - 20 May 2025
Viewed by 1276
Abstract
Harmful algal blooms (HABs), driven by environmental pollution, pose significant threats to water quality, public health, and aquatic ecosystems. This study enhances the prediction of HABs in Lake Erie, part of the Great Lakes system, by utilizing ensemble machine learning (ML) models coupled with explainable artificial intelligence (XAI) for interpretability. Using water quality data from 2013 to 2020, various physical, chemical, and biological parameters were analyzed to predict chlorophyll-a (Chl-a) concentrations, which are a commonly used indicator of phytoplankton biomass and a proxy for algal blooms. This study employed multiple ensemble ML models, including random forest (RF), deep forest (DF), gradient boosting (GB), and XGBoost, and compared their performance against individual models, such as support vector machine (SVM), decision tree (DT), and multi-layer perceptron (MLP). The findings revealed that the ensemble models, particularly XGBoost and deep forest (DF), achieved superior predictive accuracy, with R2 values of 0.8517 and 0.8544, respectively. The application of SHapley Additive exPlanations (SHAP) provided insights into the relative importance of the input features, identifying particulate organic nitrogen (PON), particulate organic carbon (POC), and total phosphorus (TP) as the critical factors influencing the Chl-a concentrations. This research demonstrates the effectiveness of ensemble ML models for achieving high predictive accuracy, while the integration of XAI enhances model interpretability. The results support the development of proactive water quality management strategies and highlight the potential of advanced ML techniques for environmental monitoring.
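
A schematic version of the ensemble-plus-SHAP workflow, with synthetic columns standing in for the Lake Erie water-quality features named in the abstract; hyperparameters and data are illustrative only.

```python
# Sketch: XGBoost regressor + SHAP feature attributions on toy data.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(3)
features = ["PON", "POC", "TP", "temperature", "turbidity"]
X = rng.random((500, len(features)))
# Toy Chl-a target dominated by the first three features.
chl_a = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, 500)

model = xgb.XGBRegressor(n_estimators=200).fit(X, chl_a)
explainer = shap.TreeExplainer(model)        # exact, fast SHAP for trees
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)  # global importance per feature
for name, value in sorted(zip(features, mean_abs), key=lambda t: -t[1]):
    print(f"{name:>12}: {value:.3f}")
```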
(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)

26 pages, 2125 KiB  
Article
Adaptive Augmented Reality Architecture for Optimising Assistance and Safety in Industry 4.0
by Ginés Morales Méndez and Francisco del Cerro Velázquez
Big Data Cogn. Comput. 2025, 9(5), 133; https://doi.org/10.3390/bdcc9050133 - 19 May 2025
Cited by 1 | Viewed by 947
Abstract
The present study proposes an adaptive augmented reality (AR) architecture specifically designed to enhance real-time operator assistance and occupational safety in industrial environments representative of Industry 4.0. The proposed system addresses key challenges in AR adoption, such as the need for dynamic personalisation of instructions based on operator profiles and the mitigation of technical and cognitive barriers. The architecture integrates theoretical modelling, modular design, and real-time adaptability to match instruction complexity with user expertise and environmental conditions. A working prototype was implemented using Microsoft HoloLens 2, Unity 3D, and Vuforia and validated in a controlled industrial scenario involving predictive maintenance and assembly tasks. The experimental results demonstrated statistically significant enhancements in task completion time, error rates, perceived cognitive load, operational efficiency, and safety indicators in comparison with conventional methods. The findings underscore the system’s capacity to enhance both performance and consistency while concomitantly bolstering risk mitigation in intricate operational settings. This study proposes a scalable and modular AR framework with built-in safety and adaptability mechanisms, demonstrating practical benefits for human–machine interaction in Industry 4.0. The present study is subject to certain limitations, including validation in a simulated environment, which limits the direct extrapolation of the results to real industrial scenarios; further evaluation in various operational contexts is required to verify the overall scalability and applicability of the proposed system. It is recommended that future research studies explore the long-term ergonomics, scalability, and integration of emerging technologies in decision support within adaptive AR systems.

19 pages, 5047 KiB  
Article
Robust Anomaly Detection of Multivariate Time Series Data via Adversarial Graph Attention BiGRU
by Yajing Xing, Jinbiao Tan, Rui Zhang and Jiafu Wan
Big Data Cogn. Comput. 2025, 9(5), 122; https://doi.org/10.3390/bdcc9050122 - 8 May 2025
Viewed by 841
Abstract
Multivariate time series data (MTSD) anomaly detection is challenging due to complex spatio-temporal dependencies among sensors and pervasive environmental noise. The existing methods struggle to balance anomaly detection accuracy with robustness against data contamination. Hence, this paper proposes a robust multivariate temporal data anomaly detection method based on graph attention with adversarially trained noise reconstruction (PGAT-BiGRU-NRA). Firstly, the parallel graph attention (PGAT) mechanism extracts the time-dependent and spatially related features of MTSD to realize MTSD fusion. Then, a bidirectional gated recurrent unit (BiGRU) is utilized to extract the contextual information of the data to avoid information loss. In addition, reconstructing the noise for adversarial training aims to achieve a more robust anomaly detection of MTSD. The experiments conducted on real industrial equipment datasets evaluate the effectiveness of the method in the task of MTSD anomaly detection, and the comparative experiments verify that the proposed method outperforms the mainstream baseline model. The proposed method achieves accurate anomaly detection and robust performance under noise interference, which provides feasible technical support for the stable operation of industrial equipment in complex environments.
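
A minimal PyTorch sketch of just the BiGRU reconstruction idea, scoring windows by reconstruction error; the graph attention and adversarial components of PGAT-BiGRU-NRA are not reproduced here, and all sizes are invented.

```python
# Sketch: BiGRU reconstruction-error anomaly scoring on toy sensor windows.
import torch
import torch.nn as nn

class BiGRUReconstructor(nn.Module):
    def __init__(self, n_sensors=8, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(n_sensors, hidden, batch_first=True,
                              bidirectional=True)   # context in both directions
        self.decoder = nn.Linear(2 * hidden, n_sensors)

    def forward(self, x):                  # x: (batch, time, sensors)
        h, _ = self.encoder(x)
        return self.decoder(h)             # reconstruct every timestep

model = BiGRUReconstructor()
window = torch.randn(4, 100, 8)            # toy multivariate windows
error = (model(window) - window).pow(2).mean(dim=(1, 2))
print("per-window anomaly scores:", error.detach().numpy().round(3))
```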

20 pages, 1750 KiB  
Article
Enhancing Recommendation Systems with Real-Time Adaptive Learning and Multi-Domain Knowledge Graphs
by Zeinab Shahbazi, Rezvan Jalali and Zahra Shahbazi
Big Data Cogn. Comput. 2025, 9(5), 124; https://doi.org/10.3390/bdcc9050124 - 8 May 2025
Cited by 1 | Viewed by 1488
Abstract
In the era of information explosion, recommendation systems play a crucial role in filtering vast amounts of content for users. Traditional recommendation models leverage knowledge graphs, sentiment analysis, social capital, and generative AI to enhance personalization. However, existing models still struggle to adapt dynamically to users’ evolving interests across multiple content domains in real-time. To address this gap, the cross-domain adaptive recommendation system (CDARS) is proposed, which integrates real-time behavioral tracking with multi-domain knowledge graphs to refine user preference modeling continuously. Unlike conventional methods that rely on static or historical data, CDARS dynamically adjusts its recommendation strategies based on contextual factors such as real-time engagement, sentiment fluctuations, and implicit preference drifts. Furthermore, a novel explainable adaptive learning (EAL) module was introduced, providing transparent insights into recommendations’ evolving nature, thereby improving user trust and system interpretability. To enable such real-time adaptability, CDARS incorporates multimodal sentiment analysis of user-generated content, behavioral pattern mining (e.g., click timing, revisit frequency), and learning trajectory modeling through time-aware embeddings and incremental updates of user representations. These dynamic signals are mapped into evolving knowledge graphs, forming continuously updated learning charts that drive more context-aware and emotionally intelligent recommendations. Our experimental results on datasets spanning social media, e-commerce, and entertainment domains demonstrate that CDARS significantly enhances recommendation relevance, achieving an average improvement of 7.8% in click-through rate (CTR) and 8.3% in user engagement compared to state-of-the-art models. This research presents a paradigm shift toward truly dynamic and explainable recommendation systems, paving the way for more personalized and user-centric experiences in the digital landscape.

47 pages, 29654 KiB  
Review
A Survey on Object-Oriented Assembly and Disassembly Operations in Nuclear Applications
by Wenxing Liu, Ipek Caliskanelli, Hanlin Niu, Kaiqiang Zhang and Robert Skilton
Big Data Cogn. Comput. 2025, 9(5), 118; https://doi.org/10.3390/bdcc9050118 - 28 Apr 2025
Viewed by 740
Abstract
Nuclear environments demand exceptional precision, reliability, and safety, given the high stakes involved in handling radioactive materials and maintaining reactor systems. Object-oriented assembly and disassembly operations in nuclear applications represent a cutting-edge approach to managing complex, high-stakes operations with enhanced precision and safety. This paper discusses the challenges associated with nuclear robotic remote operations, summarizes current methods for handling object-oriented assembly and disassembly operations, and explores potential future research directions in this field. Object-oriented assembly and disassembly operations are vital in nuclear applications due to their ability to manage complexity, ensure precision, and enhance safety and reliability, all of which are paramount in the demanding and high-risk environment of nuclear technology.
(This article belongs to the Special Issue Field Robotics and Artificial Intelligence (AI))

14 pages, 1934 KiB  
Article
Evaluating Deep Learning Architectures for Breast Tumor Classification and Ultrasound Image Detection Using Transfer Learning
by Christopher Kormpos, Fotios Zantalis, Stylianos Katsoulis and Grigorios Koulouras
Big Data Cogn. Comput. 2025, 9(5), 111; https://doi.org/10.3390/bdcc9050111 - 23 Apr 2025
Cited by 1 | Viewed by 1498
Abstract
The intersection of medical image classification and deep learning has garnered increasing research interest, particularly in the context of breast tumor detection using ultrasound images. Prior studies have predominantly focused on image classification, segmentation, and feature extraction, often assuming that the input images, whether sourced from healthcare professionals or individuals, are valid and relevant for analysis. To address this, we propose an initial binary classification filter to distinguish between relevant and irrelevant images, ensuring only meaningful data proceeds to subsequent analysis. However, the primary focus of this study lies in investigating the performance of a hierarchical two-tier classification architecture compared to a traditional flat three-class classification model, by employing a well-established breast ultrasound image dataset. Specifically, we explore whether sequentially breaking down the problem into binary classifications, first identifying normal versus tumorous tissue and then distinguishing benign from malignant tumors, yields better accuracy and robustness than directly classifying all three categories in a single step. Using a range of evaluation metrics, the hierarchical architecture demonstrates notable advantages in certain critical aspects of model performance. The findings of this study provide valuable guidance for selecting the optimal architecture for the final model, facilitating its seamless integration into a web application for deployment. These insights are further anticipated to advance future algorithm development and broaden the applicability of the research across diverse fields.
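
The two-tier decision logic compared here is simple to sketch; plain logistic regressions on random features stand in for the deep networks actually evaluated, purely to show how the binary stages compose.

```python
# Sketch: hierarchical two-tier classification vs. flat three-class.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.random((300, 20))               # toy image feature vectors
y = rng.integers(0, 3, 300)             # 0=normal, 1=benign, 2=malignant

# Tier 1: normal vs. tumorous; Tier 2 (tumorous only): benign vs. malignant.
tier1 = LogisticRegression(max_iter=1000).fit(X, (y > 0).astype(int))
tumor = y > 0
tier2 = LogisticRegression(max_iter=1000).fit(X[tumor],
                                              (y[tumor] == 2).astype(int))

def predict_hierarchical(x):
    x = x.reshape(1, -1)
    if tier1.predict(x)[0] == 0:        # first decide normal vs. tumorous
        return "normal"
    return "malignant" if tier2.predict(x)[0] else "benign"

print(predict_hierarchical(X[0]))
```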

21 pages, 541 KiB  
Article
Cognitive Computing with Large Language Models for Student Assessment Feedback
by Noorhan Abbas and Eric Atwell
Big Data Cogn. Comput. 2025, 9(5), 112; https://doi.org/10.3390/bdcc9050112 - 23 Apr 2025
Viewed by 1033
Abstract
Effective student feedback is fundamental to enhancing learning outcomes in higher education. While traditional assessment methods emphasise both achievements and development areas, the process remains time-intensive for educators. This research explores the application of cognitive computing, specifically open-source Large Language Models (LLMs) Mistral-7B and CodeLlama-7B, to streamline feedback generation for student reports containing both Python programming elements and English narrative content. The findings indicate that these models can provide contextually appropriate feedback on both technical Python coding and English specification and documentation. They effectively identified coding weaknesses and provided constructive suggestions for improvement, as well as insightful feedback on English language quality, structure, and clarity in report writing. These results contribute to the growing body of knowledge on automated assessment feedback in higher education, offering practical insights for institutions considering the implementation of open-source LLMs in their workflows. There are around 22 thousand assessment submissions per year in the School of Computer Science, which is one of eight schools in the Faculty of Engineering and Physical Sciences, which is one of seven faculties in the University of Leeds, which is one of one hundred and sixty-six universities in the UK, so there is clear potential for our methods to scale up to millions of assessment submissions. This study also examines the limitations of current approaches and proposes potential enhancements. The findings support a hybrid system where cognitive computing manages routine tasks and educators focus on complex, personalised evaluations, enhancing feedback quality, consistency, and efficiency in educational settings.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
21 pages, 1529 KiB  
Article
Semantic-Driven Approach for Validation of IoT Streaming Data in Trustable Smart City Decision-Making and Monitoring Systems
by Oluwaseun Bamgboye, Xiaodong Liu, Peter Cruickshank and Qi Liu
Big Data Cogn. Comput. 2025, 9(4), 108; https://doi.org/10.3390/bdcc9040108 - 21 Apr 2025
Viewed by 576
Abstract
Ensuring the trustworthiness of data used in real-time analytics remains a critical challenge in smart city monitoring and decision-making. This is because traditional data validation methods are insufficient for handling the dynamic and heterogeneous nature of Internet of Things (IoT) data streams. This paper describes a semantic IoT streaming data validation approach to provide a semantic IoT data model and process IoT streaming data with semantic stream processing systems to check the quality requirements of IoT streams. The proposed approach enhances the understanding of smart city data while supporting real-time, data-driven decision-making and monitoring processes. A publicly available sensor dataset collected from a busy road in the city of Milan is constructed, annotated, and semantically processed by the proposed approach and its architecture. The architecture, built on a robust semantic-based system, incorporates a reasoning technique based on forward rules, which is integrated within the semantic stream query processing system. It employs serialized Resource Description Framework (RDF) data formats to enhance stream expressiveness and enables the real-time validation of missing and inconsistent data streams within continuous sliding-window operations. The effectiveness of the approach is validated by deploying multiple RDF stream instances to the architecture before evaluating its accuracy and performance (in terms of reasoning time). The approach underscores the capability of semantic technology in sustaining the validation of IoT streaming data by accurately identifying up to 99% of inconsistent and incomplete streams in each streaming window. Also, it can maintain the performance of the semantic reasoning process in near real time. The approach provides an enhancement to data quality and credibility, capable of providing near-real-time decision support mechanisms for critical smart city applications, and facilitates accurate situational awareness across both the application and operational levels of the smart city.
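
A toy sketch of the completeness-checking idea on a single RDF window, using rdflib rather than the paper's semantic stream processing stack; the namespace, observation shape, and required property are invented for illustration.

```python
# Toy sketch: flag RDF stream observations missing a required property.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/traffic#")
window = """
@prefix ex: <http://example.org/traffic#> .
ex:obs1 ex:vehicleCount 42 ; ex:timestamp "t1" .
ex:obs2 ex:timestamp "t2" .
"""

g = Graph()
g.parse(data=window, format="turtle")

# Forward-rule-style check: every observation must carry a vehicleCount.
for obs in set(g.subjects(EX.timestamp, None)):
    if (obs, EX.vehicleCount, None) not in g:
        print(f"incomplete stream element: {obs}")
```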

23 pages, 2189 KiB  
Article
From Rating Predictions to Reliable Recommendations in Collaborative Filtering: The Concept of Recommendation Reliability Classes
by Dionisis Margaris, Costas Vassilakis and Dimitris Spiliotopoulos
Big Data Cogn. Comput. 2025, 9(4), 106; https://doi.org/10.3390/bdcc9040106 - 17 Apr 2025
Viewed by 586
Abstract
Recommender systems aspire to provide users with recommendations that have a high probability of being accepted. This is accomplished by producing rating predictions for products that the users have not evaluated, and, afterwards, the products with the highest prediction scores are recommended to them. Collaborative filtering is a popular recommender system technique which generates rating prediction scores by blending the ratings that users with similar preferences have previously given to these products. However, predictions may entail errors, which will either lead to recommending products that the users would not accept or failing to recommend products that the users would actually accept. The first case is considered much more critical, since the recommender system will lose a significant amount of reliability and consequently interest. In this paper, after performing a study on rating prediction confidence factors in collaborative filtering, (a) we introduce the concept of prediction reliability classes, (b) we rank these classes in relation to the utility of the rating predictions belonging to each class, and (c) we present a collaborative filtering recommendation algorithm which exploits these reliability classes for prediction formulation. The efficacy of the presented algorithm is evaluated through an extensive multi-parameter evaluation process, which demonstrates that it significantly enhances recommendation quality.
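
An illustrative user-based collaborative filtering prediction paired with a crude reliability label based on neighbour support; the matrix, similarity choice, and threshold are invented for the sketch and are not the paper's reliability classes.

```python
# Sketch: CF rating prediction with a toy reliability class.
import numpy as np

ratings = np.array([[5, 4, 0, 1],      # rows: users, cols: items, 0 = unrated
                    [4, 4, 0, 1],
                    [1, 2, 5, 4],
                    [0, 3, 4, 0]], dtype=float)

def predict(user, item, k=2):
    mask = ratings[:, item] > 0                       # users who rated item
    mask[user] = False
    sims = np.array([np.corrcoef(ratings[user], ratings[v])[0, 1]
                     if mask[v] else -np.inf
                     for v in range(len(ratings))])
    top = np.argsort(sims)[::-1][:k]                  # k nearest neighbours
    pred = np.average(ratings[top, item],
                      weights=np.clip(sims[top], 0.01, None))
    support = int(mask.sum())                         # how many raters exist
    reliability = "high" if support >= 2 else "low"   # toy reliability class
    return pred, reliability

print(predict(user=3, item=0))
```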

31 pages, 14157 KiB  
Article
Assessing the Impact of Temperature and Precipitation Trends of Climate Change on Agriculture Based on Multiple Global Circulation Model Projections in Malta
by Benjamin Mifsud Scicluna and Charles Galdies
Big Data Cogn. Comput. 2025, 9(4), 105; https://doi.org/10.3390/bdcc9040105 - 17 Apr 2025
Viewed by 1256
Abstract
The Maltese Islands, situated at the centre of the Mediterranean basin, are recognised as a climate change hotspot. This study utilises projected changes in temperature and precipitation derived from the World Climate Research Program (WCRP) and analyses outputs from six Coupled Model Intercomparison Project Phase 5 (CMIP5) models under two Representative Concentration Pathways (RCPs). Through statistical and spatial analysis, the study demonstrates that climate change will have significant adverse effects on Maltese agriculture. Regardless of the RCP scenario considered, projections indicate a substantial increase in temperature and a decline in precipitation, exacerbating aridity and intensifying heat stress. These changes are expected to reduce soil moisture availability and challenge traditional agricultural practices. The study identifies the Western District as a relatively more favourable area for crop cultivation due to its comparatively lower temperatures, whereas the Northern and South Eastern peripheries are projected to experience more severe heat stress. Adaptation strategies, including the selection of heat-tolerant crop varieties such as Tetyda and Finezja, optimised water management techniques, and intercropping practices, are proposed to enhance agricultural resilience. This study is among the few comprehensive assessments of bioclimatic and physical factors affecting Maltese agriculture and highlights the urgent need for targeted adaptation measures to safeguard food production in the region.

26 pages, 30835 KiB  
Article
Uncertainty-Aware δ-GLMB Filtering for Multi-Target Tracking
by M. Hadi Sepanj, Saed Moradi, Zohreh Azimifar and Paul Fieguth
Big Data Cogn. Comput. 2025, 9(4), 84; https://doi.org/10.3390/bdcc9040084 - 31 Mar 2025
Viewed by 658
Abstract
The δ-GLMB filter is an analytic solution to the multi-target Bayes recursion used in multi-target tracking. It extends the Generalised Labelled Multi-Bernoulli (GLMB) framework by providing an efficient and scalable implementation while preserving track identities, making it a widely used approach in the field. Theoretically, the δ-GLMB filter handles uncertainties in measurements in its filtering procedure. However, in practice, degeneration of the measurement quality affects the performance of this filter. In this paper, we discuss the effects of increasing measurement uncertainty on the δ-GLMB filter and also propose two heuristic methods to improve the performance of the filter in such conditions. The base idea of the proposed methods is to utilise the information stored in the history of the filtering procedure, which can be used to decrease the measurement uncertainty effects on the filter. Since GLMB filters have shown good results in the field of multi-target tracking, an uncertainty-immune δ-GLMB can serve as a strong tool in this area. In this study, the results indicate that the proposed heuristic ideas can improve the performance of filtering in the presence of uncertain observations. Experimental evaluations demonstrate that the proposed methods enhance track continuity and robustness, particularly in scenarios with low detection rates and high clutter, while maintaining computational feasibility.

38 pages, 9923 KiB  
Article
A Verifiable, Privacy-Preserving, and Poisoning Attack-Resilient Federated Learning Framework
by Washington Enyinna Mbonu, Carsten Maple, Gregory Epiphaniou and Christo Panchev
Big Data Cogn. Comput. 2025, 9(4), 85; https://doi.org/10.3390/bdcc9040085 - 31 Mar 2025
Viewed by 1091
Abstract
Federated learning is the on-device, collaborative training of a global model that can be utilized to support the privacy preservation of participants’ local data. In federated learning, there are challenges to model training regarding privacy preservation, security, resilience, and integrity. For example, a malicious server can indirectly obtain sensitive information through shared gradients. On the other hand, the correctness of the global model can be corrupted through poisoning attacks from malicious clients using carefully manipulated updates. Many related works on secure aggregation and poisoning attack detection have been proposed and applied in various scenarios to address these two issues. Nevertheless, existing works are based on the trust confidence that the server will return correctly aggregated results to the participants. However, a malicious server may return false aggregated results to participants. It is still an open problem to simultaneously preserve users’ privacy and defend against poisoning attacks while enabling participants to verify the correctness of aggregated results from the server. In this paper, we propose a privacy-preserving and poisoning attack-resilient federated learning framework that supports the verification of aggregated results from the server. Specifically, we design a zero-trust dual-server architectural framework instead of a traditional single-server scheme based on trust. We exploit additive secret sharing to eliminate the single point of exposure of the training data and implement a weight selection and filtering strategy to enhance robustness to poisoning attacks while supporting the verification of aggregated results from the servers. Theoretical analysis and extensive experiments conducted on real-world data demonstrate the practicability of our proposed framework. Full article
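
Additive secret sharing, the primitive used here to split updates between the two servers, is short enough to show in full. This is a minimal sketch over a prime field with toy-sized values; the framework's actual quantisation and protocol details are not reproduced.

```python
# Minimal additive secret sharing over a prime field.
import secrets

PRIME = 2**61 - 1  # field modulus

def share(value: int, n_servers: int = 2):
    """Split `value` into n random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

update = 123456789                       # e.g. one quantised gradient value
s1, s2 = share(update)
print(reconstruct([s1, s2]) == update)   # True; either share alone is uniform
# Each server can sum its shares across many clients locally, so only the
# aggregate (never an individual update) is ever revealed.
```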

20 pages, 496 KiB  
Article
GenAI Learning for Game Design: Both Prior Self-Transcendent Pursuit and Material Desire Contribute to a Positive Experience
by Dongpeng Huang and James E. Katz
Big Data Cogn. Comput. 2025, 9(4), 78; https://doi.org/10.3390/bdcc9040078 - 27 Mar 2025
Cited by 1 | Viewed by 736
Abstract
This study explores factors influencing positive experiences with generative AI (GenAI) in a learning game design context. Using a sample of 26 master’s-level students in a course on AI’s societal aspects, this study examines the impact of (1) prior knowledge and attitudes toward technology and learning, and (2) personal value orientations. Results indicated that both students’ self-transcendent goals and desire for material benefits have positive correlations with collaborative, cognitive, and affective outcomes. However, self-transcendent goals are a stronger predictor, as determined by stepwise regression analysis. Attitudes toward technology were positively associated with cognitive and affective outcomes during the first week, though this association did not persist into the second week. Most other attitudinal variables were not associated with collaborative or cognitive outcomes but were linked to negative affect. These findings suggest that students’ personal values correlate more strongly with the collaborative, cognitive, and affective aspects of using GenAI for educational game design than their attitudinal attributes. This result may indicate that the design experience neutralizes the effect of earlier attitudes towards technology, with major influences deriving from personal value orientations. If these findings are borne out, this study has implications for the utility of current educational efforts to change students’ attitudes towards technology, especially those that encourage more women to study STEM topics. Thus, it may be that, rather than pro-technology instruction, a focus on value orientations would be a more effective way to encourage diverse students to participate in STEM programs. Full article

21 pages, 2021 KiB  
Article
A Data Mining Approach to Identify NBA Player Quarter-by-Quarter Performance Patterns
by Dimitrios Iatropoulos, Vangelis Sarlis and Christos Tjortjis
Big Data Cogn. Comput. 2025, 9(4), 74; https://doi.org/10.3390/bdcc9040074 - 25 Mar 2025
Cited by 2 | Viewed by 3556
Abstract
Sports analytics is a fast-evolving domain using advanced data science methods to find useful insights. This study explores how NBA player performance metrics evolve from quarter to quarter and affect game outcomes. Using Association Rule Mining, we identify key offensive, defensive, and overall impact metrics that influence success in both regular-season and playoff contexts. Defensive metrics become more critical in late-game situations, while offensive efficiency is paramount in the playoffs. Ball handling peaks in the second quarter, affecting early momentum, while overall impact metrics, such as Net Rating and Player Impact Estimate, consistently correlate with winning. We preprocessed the collected dataset, applying advanced anomaly detection and discretization techniques. By segmenting performance into five categories—Offense, Defense, Ball Handling, Overall Impact, and Tempo—we uncovered strategic insights for teams, coaches, and analysts. The results emphasize the importance of managing player fatigue, optimizing lineups, and adjusting strategies based on quarter-specific trends. The analysis provides actionable recommendations for coaching decisions, roster management, and player evaluation. Future work can extend this approach to other leagues and incorporate additional contextual factors to refine evaluation and predictive models. Full article
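
A small sketch of the Association Rule Mining step using the mlxtend library's apriori/association_rules API, under the assumption of a one-hot table of discretized quarter-level performance flags; the column names and thresholds are invented for illustration.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot table: each row is a player-game, each column a
# discretized performance flag (e.g., "Q4_high_defense") or the outcome.
data = pd.DataFrame({
    "Q2_high_ball_handling": [1, 1, 0, 1, 0, 1],
    "Q4_high_defense":       [1, 0, 1, 1, 0, 1],
    "high_net_rating":       [1, 0, 1, 1, 0, 1],
    "win":                   [1, 0, 1, 1, 0, 1],
}).astype(bool)

frequent = apriori(data, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])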

30 pages, 2168 KiB  
Article
Generation Z’s Travel Behavior and Climate Change: A Comparative Study for Greece and the UK
by Athanasios Demiris, Grigorios Fountas, Achille Fonzone and Socrates Basbas
Big Data Cogn. Comput. 2025, 9(3), 70; https://doi.org/10.3390/bdcc9030070 - 17 Mar 2025
Cited by 3 | Viewed by 2626
Abstract
Climate change is one of the most pressing global threats, endangering the sustainability of the planet and quality of life, whilst urban mobility significantly contributes to exacerbating its effects. Recently, policies aimed at mitigating these effects have been implemented, emphasizing the promotion of sustainable travel culture. Prior research has indicated that both environmental awareness and regulatory efforts could encourage the shift towards greener mobility; however, factors that affect young people’s travel behavior remain understudied. This study examined whether and how climate change impacts travel behavior, particularly among Generation Z in Greece. A comprehensive online survey was conducted, from 31 March to 8 April 2024, within a Greek academic community, yielding 904 responses from Generation Z individuals. The design of the survey was informed by an adaptation of Triandis’ Theory of Interpersonal Behavior. The study also incorporated a comparative analysis using data from the UK’s National Travel Attitudes Survey (NTAS), offering insights from a different cultural and socio-economic context. By combining Exploratory Factor Analysis with latent-variable ordered probit and logit models, we identified the key determinants of both the willingness to reduce car use and the self-reported reduction in car use in response to climate change. The results indicate that emotional factors, social roles, and norms, along with socio-demographic characteristics, current behaviors, and local environmental concerns, significantly influence car-related travel choices among Generation Z. For instance, concerns about local air quality are consistently correlated with a higher likelihood of having already reduced car use due to climate change and a higher willingness to reduce car travel in the future. The NTAS data reveal that flexibility in travel habits and social norms are critical determinants of the willingness to reduce car usage. The findings of the study highlight the key role of policy interventions, such as the implementation of Low-Emission Zones, leveraging social media for environmental campaigns, and enhancing infrastructure for active travel and public transport, in fostering broader cultural shifts towards sustainable travel behavior among Generation Z. Full article
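
A hedged sketch of an ordered probit model of the kind described, fitted with statsmodels on synthetic data; the predictor names, coefficients, and response categories are hypothetical stand-ins for the survey constructs.

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "air_quality_concern": rng.normal(size=n),  # hypothetical latent construct
    "social_norms": rng.normal(size=n),
})
latent = 0.8 * X["air_quality_concern"] + 0.5 * X["social_norms"] + rng.normal(size=n)
y = pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf],
           labels=["unwilling", "neutral", "willing"], ordered=True)

model = OrderedModel(y, X, distr="probit")  # ordered probit link
result = model.fit(method="bfgs", disp=False)
print(result.summary())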

23 pages, 528 KiB  
Article
Defining, Detecting, and Characterizing Power Users in Threads
by Gianluca Bonifazi, Christopher Buratti, Enrico Corradini, Michele Marchetti, Federica Parlapiano, Domenico Ursino and Luca Virgili
Big Data Cogn. Comput. 2025, 9(3), 69; https://doi.org/10.3390/bdcc9030069 - 16 Mar 2025
Cited by 1 | Viewed by 637
Abstract
Threads is a new social network that was launched by Meta in July 2023 and conceived as a direct alternative to X. It is a unique case study in the social network landscape, as it is content-based like X, but has an Instagram-based growth model, which makes it significantly different from X. As it was launched recently, studies on Threads are still scarce. One of the most common investigations in social networks regards power users (also called influencers, lead users, influential users, etc.), i.e., those users who can significantly influence information dissemination, user behavior, and ultimately the current dynamics and future development of a social network. In this paper, we want to contribute to the knowledge of Threads by showing that there are indeed power users in this social network and then attempting to understand the main features that characterize them. The definition of power users that we adopt here is novel and leverages the four classical centrality measures of Social Network Analysis. This ensures that our study of power users can benefit from the enormous knowledge on centrality measures that has accumulated in the literature over the years. In order to conduct our analysis, we had to build a Threads dataset, as none existed in the literature that contained the information necessary for our studies. Once we built such a dataset, we decided to make it open and thus available to all researchers who want to perform analyses on Threads. This dataset, the new definition of power users, and the characterization of Threads power users are the main contributions of this paper. Full article
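
A minimal sketch of a centrality-based definition of this kind, assuming the Threads interaction graph is available as a networkx graph; the top-5% intersection rule is a hypothetical simplification of how the four classical measures might be combined.

import networkx as nx

G = nx.barabasi_albert_graph(200, 3, seed=42)  # stand-in for the Threads graph

centralities = {
    "degree":      nx.degree_centrality(G),
    "closeness":   nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

def top_k(scores, frac=0.05):
    # Nodes in the top `frac` fraction for a given centrality measure
    k = max(1, int(frac * len(scores)))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Hypothetical rule: a power user ranks in the top 5% on all four measures
power_users = set.intersection(*(top_k(c) for c in centralities.values()))
print(f"{len(power_users)} candidate power users: {sorted(power_users)[:10]}")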

18 pages, 879 KiB  
Article
A Comparative Analysis of Sentence Transformer Models for Automated Journal Recommendation Using PubMed Metadata
by Maria Teresa Colangelo, Marco Meleti, Stefano Guizzardi, Elena Calciolari and Carlo Galli
Big Data Cogn. Comput. 2025, 9(3), 67; https://doi.org/10.3390/bdcc9030067 - 13 Mar 2025
Cited by 2 | Viewed by 3749
Abstract
We present an automated journal recommendation pipeline designed to evaluate the performance of five Sentence Transformer models—all-mpnet-base-v2 (Mpnet), all-MiniLM-L6-v2 (Minilm-l6), all-MiniLM-L12-v2 (Minilm-l12), multi-qa-distilbert-cos-v1 (Multi-qa-distilbert), and all-distilroberta-v1 (roberta)—for recommending journals aligned with a manuscript’s thematic scope. The pipeline extracts domain-relevant keywords from a manuscript via KeyBERT, retrieves potentially related articles from PubMed, and encodes both the test manuscript and retrieved articles into high-dimensional embeddings. By computing cosine similarity, it ranks relevant journals based on thematic overlap. Evaluations on 50 test articles highlight mpnet’s strong performance (mean similarity score 0.71 ± 0.04), albeit with higher computational demands. Minilm-l12 and minilm-l6 offer comparable precision at lower cost, while multi-qa-distilbert and roberta yield broader recommendations better suited to interdisciplinary research. These findings underscore key trade-offs among embedding models and demonstrate how they can provide interpretable, data-driven insights to guide journal selection across varied research contexts. Full article
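
A condensed sketch of the embedding-and-ranking step with the sentence-transformers library (the KeyBERT keyword extraction and PubMed retrieval stages are omitted); the manuscript text and candidate journal abstracts are placeholders.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")  # strongest model in the study

manuscript = "Deep learning for automated caries detection in panoramic X-rays."
candidates = {  # hypothetical journals, each represented by a retrieved abstract
    "Journal of Dental Research": "We study caries progression in adults ...",
    "Computers in Biology and Medicine": "A CNN pipeline for medical imaging ...",
}

query = model.encode(manuscript, convert_to_tensor=True)
for journal, text in candidates.items():
    score = util.cos_sim(query, model.encode(text, convert_to_tensor=True))
    print(f"{journal}: {float(score):.3f}")  # rank journals by cosine similarity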

13 pages, 2003 KiB  
Article
An Expected Goals On Target (xGOT) Model: Accounting for Goalkeeper Performance in Football
by Blanca De-la-Cruz-Torres, Miguel Navarro-Castro and Anselmo Ruiz-de-Alarcón-Quintero
Big Data Cogn. Comput. 2025, 9(3), 64; https://doi.org/10.3390/bdcc9030064 - 10 Mar 2025
Cited by 2 | Viewed by 3607
Abstract
A key challenge in utilizing the expected goals on target (xGOT) metric is the limited public access to detailed football event and positional data, alongside other advanced metrics. This study aims to develop an xGOT model to evaluate goalkeeper (GK) performance based on the probability of successful actions, considering not only the outcomes (saves or goals conceded) but also the difficulty of each shot faced. Formal definitions were established for the following: (i) the initial distance between the ball and the GK at the moment of the shot, (ii) the distance between the ball and the GK over time post-shot, and (iii) the distance between the GK’s initial position and the goal, with respect to the y-coordinate. An xGOT model incorporating geometric parameters was designed to optimize performance based on the ball position, trajectory, and GK positioning. The model was tested using shots on target from the 2022 FIFA World Cup. Statistical evaluation using k-fold cross-validation yielded an AUC-ROC score of 0.67 and an 85% accuracy, confirming the model’s ability to differentiate successful GK performances. This approach enables a more precise evaluation of GK decision-making by analyzing a representative dataset of shots to estimate the probability of success. Full article
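
A hedged sketch of the evaluation mechanics, assuming hypothetical geometric shot features and synthetic labels: a probabilistic classifier is scored with k-fold cross-validated AUC-ROC via scikit-learn, mirroring the validation protocol described.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 500
# Hypothetical geometric features per shot on target (metres, m/s)
gk_ball_distance = rng.uniform(3, 30, n)   # ball-GK distance at the shot
gk_lateral_offset = rng.uniform(0, 3, n)   # GK y-offset from goal centre
shot_speed = rng.uniform(15, 35, n)
X = np.column_stack([gk_ball_distance, gk_lateral_offset, shot_speed])

# Synthetic outcome: faster shots from closer range are more often goals
p_goal = 1 / (1 + np.exp(-(0.1 * shot_speed - 0.08 * gk_ball_distance - 1.0)))
y = (rng.random(n) < p_goal).astype(int)   # 1 = goal conceded

model = LogisticRegression(max_iter=1000)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"5-fold AUC-ROC: {auc.mean():.2f} +/- {auc.std():.2f}")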

48 pages, 1680 KiB  
Article
Trustworthy AI for Whom? GenAI Detection Techniques of Trust Through Decentralized Web3 Ecosystems
by Igor Calzada, Géza Németh and Mohammed Salah Al-Radhi
Big Data Cogn. Comput. 2025, 9(3), 62; https://doi.org/10.3390/bdcc9030062 - 6 Mar 2025
Viewed by 3575
Abstract
As generative AI (GenAI) technologies proliferate, ensuring trust and transparency in digital ecosystems becomes increasingly critical, particularly within democratic frameworks. This article examines decentralized Web3 mechanisms—blockchain, decentralized autonomous organizations (DAOs), and data cooperatives—as foundational tools for enhancing trust in GenAI. These mechanisms are analyzed within the framework of the EU’s AI Act and the Draghi Report, focusing on their potential to support content authenticity, community-driven verification, and data sovereignty. Based on a systematic policy analysis, this article proposes a multi-layered framework to mitigate the risks of AI-generated misinformation. Specifically, as a result of this analysis, it identifies and evaluates seven detection techniques of trust stemming from the action research conducted in the Horizon Europe Lighthouse project called ENFIELD: (i) federated learning for decentralized AI detection, (ii) blockchain-based provenance tracking, (iii) zero-knowledge proofs for content authentication, (iv) DAOs for crowdsourced verification, (v) AI-powered digital watermarking, (vi) explainable AI (XAI) for content detection, and (vii) privacy-preserving machine learning (PPML). By leveraging these approaches, the framework strengthens AI governance through peer-to-peer (P2P) structures while addressing the socio-political challenges of AI-driven misinformation. Ultimately, this research contributes to the development of resilient democratic systems in an era of increasing technopolitical polarization. Full article

16 pages, 785 KiB  
Review
ChatGPT’s Impact Across Sectors: A Systematic Review of Key Themes and Challenges
by Hussam Hussein, Madelina Gordon, Cameron Hodgkinson, Robert Foreman and Sumaya Wagad
Big Data Cogn. Comput. 2025, 9(3), 56; https://doi.org/10.3390/bdcc9030056 - 28 Feb 2025
Cited by 2 | Viewed by 4556
Abstract
This paper critically examines the expanding body of literature on ChatGPT, a transformative AI tool with widespread global adoption. By categorising research into six key themes—sustainability, health, education, work, social media, and energy—it explores ChatGPT’s versatility, benefits, and challenges. The findings highlight its potential to enhance productivity, streamline workflows, and improve access to knowledge while also revealing critical limitations, including high energy consumption, informational inaccuracies, and ethical concerns. The paper underscores the need for robust regulatory frameworks, sustainable AI practices, and interdisciplinary collaboration to optimise benefits while mitigating risks. Future research should focus on improving ChatGPT’s reliability, inclusivity, and environmental sustainability to ensure its responsible integration across diverse sectors. Full article

58 pages, 4720 KiB  
Article
Exploring Predictive Modeling for Food Quality Enhancement: A Case Study on Wine
by Cemil Emre Yavas, Jongyeop Kim, Lei Chen, Christopher Kadlec and Yiming Ji
Big Data Cogn. Comput. 2025, 9(3), 55; https://doi.org/10.3390/bdcc9030055 - 26 Feb 2025
Cited by 3 | Viewed by 1825
Abstract
What makes a wine exceptional enough to score a perfect 10 from experts? This study explores a data-driven approach to identify the ideal physicochemical composition for wines that could achieve this highest possible rating. Using a dataset of 11 measurable attributes, including alcohol, sulfates, residual sugar, density, and citric acid, for wines rated up to a maximum quality score of 8 by expert tasters, we sought to predict compositions that might enhance wine quality beyond current observations. Our methodology applies a second-degree polynomial ridge regression model, optimized through an exhaustive evaluation of feature combinations. Furthermore, we propose a specific chemical and physical composition of wine that our model predicts could achieve a quality score of 10 from experts. While further validation with winemakers and industry experts is necessary, this study aims to contribute a practical tool for guiding quality exploration and advancing predictive modeling applications in food and beverage sciences. Full article
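
A minimal sketch of a second-degree polynomial ridge regression with scikit-learn, on synthetic stand-ins for the physicochemical attributes; the exhaustive feature-combination search the authors describe is omitted.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(size=(200, 4))  # e.g., alcohol, sulfates, sugar, citric acid
y = 5 + 3 * X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(0, 0.3, 200)  # toy scores

model = make_pipeline(StandardScaler(),
                      PolynomialFeatures(degree=2, include_bias=False),
                      Ridge(alpha=1.0))  # degree-2 polynomial ridge regression
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")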

27 pages, 1975 KiB  
Review
Cognitive Computing and Business Intelligence Applications in Accounting, Finance and Management
by Sio-Iong Ao, Marc Hurwitz and Vasile Palade
Big Data Cogn. Comput. 2025, 9(3), 54; https://doi.org/10.3390/bdcc9030054 - 26 Feb 2025
Cited by 3 | Viewed by 3931
Abstract
Cognitive computing encompasses computing tools and methods that simulate and mimic the process of human thinking, without human supervision. Deep neural network architectures, natural language processing, big data tools, and self-learning tools based on pattern recognition have been widely deployed to solve highly complex problems. Business intelligence enhances collaboration among different organizational departments with data-driven conversations and provides an organization with meaningful data interpretation for making strategic decisions on time. Since the introduction of ChatGPT in November 2022, the tremendous impacts of using Large Language Models have been rippling through cognitive computing, business intelligence, and their applications in accounting, finance, and management. Unlike other recent reviews in related areas, this review focuses precisely on the cognitive computing perspective, with frontier applications in accounting, finance, and management. Some current limitations and future directions of cognitive computing are also discussed. Full article

14 pages, 668 KiB  
Article
Fine-Grained Local and Global Semantic Fusion for Multimodal Image–Text Retrieval
by Shenao Peng, Zhongmei Wang, Jianhua Liu, Changfan Zhang and Lin Jia
Big Data Cogn. Comput. 2025, 9(3), 53; https://doi.org/10.3390/bdcc9030053 - 25 Feb 2025
Viewed by 924
Abstract
An image–text retrieval method that integrates intramodal fine-grained local semantic information and intermodal global semantic information is proposed to address the weak fine-grained discrimination of semantic features across the image and text modalities in cross-modal retrieval tasks. First, the original features of images and texts are extracted, and a graph attention network is employed for region relationship reasoning to obtain relation-enhanced local features. Then, an attention mechanism is used for different semantically interacting samples within the same modality, enabling comprehensive intramodal relationship learning and producing semantically enhanced image and text embeddings. Finally, a triplet loss function, enhanced with an angular constraint, is used to train the entire model. Through extensive comparative experiments conducted on the Flickr30K and MS-COCO benchmark datasets, the effectiveness and superiority of the proposed method were verified. It outperformed the best competing method by a relative 6.4% for image retrieval and 1.3% for caption retrieval on MS-COCO (Recall@1 on the 1K test set). Full article
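
A hedged sketch of the standard triplet loss on L2-normalized image and text embeddings in PyTorch (the paper's angular constraint is omitted); dimensions, batch size, and margin are arbitrary.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet loss on L2-normalized embeddings (cosine geometry):
    # push positive pairs closer than negative pairs by at least `margin`
    anchor, positive, negative = (F.normalize(t, dim=-1)
                                  for t in (anchor, positive, negative))
    pos_sim = (anchor * positive).sum(-1)
    neg_sim = (anchor * negative).sum(-1)
    return F.relu(neg_sim - pos_sim + margin).mean()

img = torch.randn(8, 512)      # image embeddings (batch of 8)
txt_pos = torch.randn(8, 512)  # matching captions
txt_neg = torch.randn(8, 512)  # non-matching captions
print(triplet_loss(img, txt_pos, txt_neg))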

16 pages, 1438 KiB  
Article
A Web-Based Platform for Hand Rehabilitation Assessment
by Dimitrios N. Soumis and Nikolaos D. Tselikas
Big Data Cogn. Comput. 2025, 9(3), 52; https://doi.org/10.3390/bdcc9030052 - 24 Feb 2025
Cited by 1 | Viewed by 827
Abstract
Hand impairment affects millions of people. There are multiple factors that cause deficits, ranging from physical injuries to neurological disorders. Upper-limb patients face significant difficulties in daily life. Rehabilitation aims to support them in regaining functionality and to increase their independence and quality of life. Assessment is key to therapy, as it offers an evaluation of the condition of patients, leading to suitable treatments. Unfortunately, rehabilitation relies on clinical resources, making it expensive and time-consuming. Digital technology can provide solutions that make treatments more flexible and affordable. Using computer vision, we created an online platform that includes several exercises and serious games, based on movements and gestures performed in real-world treatments. Difficulty levels vary, and therapists can monitor these procedures remotely, while performance can be stored and tracked over time to identify improvement. No special equipment is needed: the platform can be accessed like a common website, and all its applications require only a standard computer camera and a stable Internet connection. In this article, we present our research approach, analyze the development of the platform, and provide a brief demonstration of its use in practice. Furthermore, we address some technical challenges and share the results derived from preliminary test phases, concluding by outlining future plans. Full article

21 pages, 1760 KiB  
Article
On Continually Tracing Origins of LLM-Generated Text and Its Application in Detecting Cheating in Student Coursework
by Quan Wang and Haoran Li
Big Data Cogn. Comput. 2025, 9(3), 50; https://doi.org/10.3390/bdcc9030050 - 20 Feb 2025
Cited by 2 | Viewed by 1487
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in text generation, which also raise numerous concerns about their potential misuse, especially in educational exercises and academic writing. Accurately identifying and tracing the origins of LLM-generated content is crucial for accountability and transparency, ensuring the responsible use of LLMs in educational and academic environments. Previous methods utilize binary classifiers to discriminate whether a piece of text was written by a human or generated by a specific LLM or employ multi-class classifiers to trace the source LLM from a fixed set. These methods, however, are restricted to one or several pre-specified LLMs and cannot generalize to new LLMs, which are continually emerging. This study formulates source LLM tracing in a class-incremental learning (CIL) fashion, where new LLMs continually emerge, and a model incrementally learns to identify new LLMs without forgetting old ones. A training-free continual learning method is further devised for the task, the idea of which is to continually extract prototypes for emerging LLMs, using a frozen encoder, and then to perform origin tracing via prototype matching after a delicate decorrelation process. For evaluation, two datasets are constructed, one in English and one in Chinese. These datasets simulate a scenario where six LLMs emerge over time and are used to generate student essays, and an LLM detector has to incrementally expand its recognition scope as new LLMs appear. Experimental results show that the proposed method achieves an average accuracy of 97.04% on the English dataset and 91.23% on the Chinese dataset. These results validate the feasibility of continual origin tracing of LLM-generated text and verify its effectiveness in detecting cheating in student coursework. Full article
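
A minimal sketch of training-free prototype extraction and nearest-prototype origin tracing, assuming a frozen encoder has already produced embeddings (the paper's decorrelation step is omitted); LLM names and dimensions are hypothetical.

import numpy as np

def add_llm_prototype(prototypes, embeddings, llm_name):
    # Class-incremental step: the prototype for a new LLM is the mean of
    # frozen-encoder embeddings of texts it generated (no retraining needed)
    proto = embeddings.mean(axis=0)
    prototypes[llm_name] = proto / np.linalg.norm(proto)
    return prototypes

def trace_origin(prototypes, embedding):
    # Origin tracing by nearest-prototype (cosine) matching
    e = embedding / np.linalg.norm(embedding)
    return max(prototypes, key=lambda name: float(e @ prototypes[name]))

rng = np.random.default_rng(0)
protos = {}
for name, shift in [("llm_a", -1.0), ("llm_b", 1.0)]:  # LLMs emerging over time
    protos = add_llm_prototype(protos, rng.standard_normal((50, 384)) + shift, name)
print(trace_origin(protos, rng.standard_normal(384) + 1.0))  # likely "llm_b"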

21 pages, 3633 KiB  
Article
Reusing ML Models in Dynamic Data Environments: Data Similarity-Based Approach for Efficient MLOps
by Eduardo Peixoto, Diogo Torres, Davide Carneiro, Bruno Silva and Ruben Marques
Big Data Cogn. Comput. 2025, 9(2), 47; https://doi.org/10.3390/bdcc9020047 - 19 Feb 2025
Viewed by 938
Abstract
The rapid integration of Machine Learning (ML) in organizational practices has driven demand for substantial computational resources, incurring both high economic costs and environmental impact, particularly from energy consumption. This challenge is amplified in dynamic data environments, where ML models must be frequently retrained to adapt to evolving data patterns. To address this, more sustainable Machine Learning Operations (MLOps) pipelines are needed for reducing environmental impacts while maintaining model accuracy. In this paper, we propose a model reuse approach based on data similarity metrics, which allows organizations to leverage previously trained models where applicable. We introduce a tailored set of meta-features to characterize data windows, enabling efficient similarity assessment between historical and new data. The effectiveness of the proposed method is validated across multiple ML tasks using the cosine and Bray–Curtis distance functions, which evaluate both model reuse rates and the performance of reused models relative to newly trained alternatives. The results indicate that the proposed approach can reduce the frequency of model retraining by up to 70% to 90% while maintaining or even improving predictive performance, contributing to more resource-efficient and sustainable MLOps practices. Full article
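
A hedged sketch of a similarity-gated reuse decision, using SciPy's cosine and Bray–Curtis distances over meta-feature vectors of data windows; the meta-features and threshold below are illustrative, not the paper's tailored set.

import numpy as np
from scipy.spatial.distance import cosine, braycurtis

def should_reuse(meta_old, meta_new, metric=cosine, threshold=0.05):
    # Reuse the existing model if the meta-feature vectors of the historical
    # and new data windows are sufficiently similar (threshold is hypothetical)
    return metric(meta_old, meta_new) < threshold

meta_old = np.array([0.12, 3.4, 0.9, 17.0])  # e.g., window means/variances
meta_new = np.array([0.13, 3.3, 0.9, 16.5])
print(should_reuse(meta_old, meta_new))                     # cosine distance
print(should_reuse(meta_old, meta_new, metric=braycurtis))  # Bray–Curtis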

12 pages, 2665 KiB  
Article
Association Between Mastication Pattern, Periodontal Condition, and Cognitive Condition—Investigation Using Large Database of Japanese Universal Healthcare System
by Takahiko Shiba, Daisuke Sasaki, Juanna Xie, Chia-Yu Chen, Hiroyuki Tanaka and Shigemi Nagai
Big Data Cogn. Comput. 2025, 9(2), 43; https://doi.org/10.3390/bdcc9020043 - 17 Feb 2025
Viewed by 897
Abstract
The decline in oral health commonly occurs as a natural consequence of aging or due to various pathological factors. Tooth loss, which diminishes masticatory ability, has been associated with negative impacts on cognitive function. This observational study analyzed dental and medical records from Japan’s Universal Healthcare System (UHCS) national database to investigate the relationship between cognitive and oral disorders, focusing on periodontitis and decreased tooth-to-tooth contact between the maxillary and mandibular arches. A descriptive data analysis evaluated diagnostic codes for Alzheimer’s disease and cognitive impairment alongside dental treatment records from 2013 to 2018. The odds ratios for cognitive impairment in patients with partial loss of natural tooth contact were 1.6663 (p < 0.05) for early elderly individuals (aged 65–75) and 1.5003 (p < 0.0001) for advanced elderly individuals (over 75). Periodontally compromised patients had higher odds, with ratios of 1.3936 (p < 0.0001) for early elderly individuals and 1.1888 (p < 0.00001) for advanced elderly individuals, compared to their periodontally healthy counterparts. These findings suggest a potential link between cognitive health, natural tooth contact preservation, and periodontitis, with the loss of natural tooth contacts having the most significant impact on cognitive function. Full article
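
For readers who want to reproduce this style of analysis, a hedged sketch of computing an odds ratio from a 2x2 contingency table with SciPy; the counts below are invented and do not come from the UHCS database.

import numpy as np
from scipy.stats import fisher_exact

# Hypothetical 2x2 table for early-elderly patients:
# rows: lost natural tooth contact (yes/no)
# cols: cognitive impairment (yes/no)
table = np.array([[240, 760],
                  [160, 840]])
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.4f}, p = {p_value:.3g}")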

19 pages, 867 KiB  
Article
Exploring the Boundaries Between LLM Code Clone Detection and Code Similarity Assessment on Human and AI-Generated Code
by Zixian Zhang and Takfarinas Saber
Big Data Cogn. Comput. 2025, 9(2), 41; https://doi.org/10.3390/bdcc9020041 - 13 Feb 2025
Cited by 3 | Viewed by 2932
Abstract
As Large Language Models (LLMs) continue to advance, their capabilities in code clone detection have garnered significant attention. While much research has assessed LLM performance on human-generated code, the proliferation of LLM-generated code raises critical questions about their ability to detect clones across both human- and LLM-created codebases, as this capability remains largely unexplored. This paper addresses this gap by evaluating two versions of LLaMA3 on these distinct types of datasets. Additionally, we perform a deeper analysis beyond simple prompting, examining the nuanced relationship between code cloning and code similarity that LLMs infer. We further explore how fine-tuning impacts LLM performance in clone detection, offering new insights into the interplay between code clones and similarity in human versus AI-generated code. Our findings reveal that LLaMA models excel in detecting syntactic clones but face challenges with semantic clones. Notably, the models perform better on LLM-generated datasets for semantic clones, suggesting a potential bias. The fine-tuning technique enhances the ability of LLMs to comprehend code semantics, improving their performance in both code clone detection and code similarity assessment. Our results offer valuable insights into the effectiveness and characteristics of LLMs in clone detection and code similarity assessment, providing a foundation for future applications and guiding further research in this area. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

25 pages, 13698 KiB  
Article
Self-Supervised Foundation Model for Template Matching
by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova
Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025
Viewed by 1735
Abstract
Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or when big variations in textures, different modalities, and weak visual features exist in the images, limiting their application to real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning for template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. As one goes deeper into the layers of a convolutional neural network (CNN), the filters begin to react to more complex structures and their receptive fields increase, which leads to a loss of localization information relative to the early layers. Hierarchically propagating the last layers back to the first layer results in precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model. Full article
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

34 pages, 8053 KiB  
Article
Novel Extreme-Lightweight Fully Convolutional Network for Low Computational Cost in Microbiological and Cell Analysis: Detection, Quantification, and Segmentation
by Juan A. Ramirez-Quintana, Edgar A. Salazar-Gonzalez, Mario I. Chacon-Murguia and Carlos Arzate-Quintana
Big Data Cogn. Comput. 2025, 9(2), 36; https://doi.org/10.3390/bdcc9020036 - 9 Feb 2025
Cited by 1 | Viewed by 936
Abstract
Integrating deep learning into microbiological and cell analysis from microscopic image samples has gained significant attention in recent years, driven by the rise of novel medical technologies and pressing global health challenges. Numerous methods for segmentation and classification in microscopic images have emerged in the literature. However, key challenges persist due to the limited development of specialized deep learning models to accurately detect and quantify microorganisms and cells from microscopic samples. In response to this gap, this paper introduces MBnet, an Extreme-Lightweight Neural Network for Microbiological and Cell Analysis. MBnet is a binary segmentation method based on a Fully Convolutional Network designed to detect and quantify microorganisms and cells, featuring a low computational cost architecture with only 575 parameters. Its innovative design includes a foreground module and an encoder–decoder structure composed of traditional, depthwise, and separable convolution layers. These layers integrate color, orientation, and morphological features to generate an understanding of different contexts in microscopic sample images for binary segmentation. Experiments were conducted using datasets containing bacteria, yeast, and blood cells. The results suggest that MBnet outperforms other popular networks in the literature in counting, detecting, and segmenting cells and unicellular microorganisms. These findings underscore the potential of MBnet as a highly efficient solution for real-world applications in health monitoring and bioinformatics. Full article
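
A minimal sketch of the depthwise separable convolution that underpins such extreme-lightweight designs, in PyTorch; layer sizes are arbitrary and this is not MBnet's actual architecture.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Depthwise + pointwise convolution: the kind of factorized layer that
    # keeps parameter counts extremely low in lightweight segmentation nets
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(8, 4)
print(sum(p.numel() for p in block.parameters()))  # 8*9 + 8*4 = 104 parameters
print(block(torch.randn(1, 8, 32, 32)).shape)      # torch.Size([1, 4, 32, 32])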

25 pages, 10920 KiB  
Article
Lightweight GAN-Assisted Class Imbalance Mitigation for Apple Flower Bud Detection
by Wenan Yuan and Peng Li
Big Data Cogn. Comput. 2025, 9(2), 28; https://doi.org/10.3390/bdcc9020028 - 29 Jan 2025
Cited by 2 | Viewed by 1351
Abstract
Multi-class object detectors often suffer from the class imbalance issue, where substantial model performance discrepancies exist between classes. Generative adversarial networks (GANs), an emerging deep learning research topic, are able to learn from existing data distributions and generate similar synthetic data, which might serve as valid training data for improving object detectors. The current study investigated the utility of lightweight unconditional GAN in addressing weak object detector class performance by incorporating synthetic data into real data for model retraining, under an agricultural context. AriAplBud, a multi-growth stage aerial apple flower bud dataset was deployed in the study. A baseline YOLO11n detector was first developed based on training, validation, and test datasets derived from AriAplBud. Six FastGAN models were developed based on dedicated subsets of the same YOLO training and validation datasets for different apple flower bud growth stages. Positive sample rates and average instance number per image of synthetic data generated by each of the FastGAN models were investigated based on 1000 synthetic images and the baseline detector at various confidence thresholds. In total, 13 new YOLO11n detectors were retrained specifically for the two weak growth stages, tip and half-inch green, by including synthetic data in training datasets to increase total instance number to 1000, 2000, 4000, and 8000, respectively, pseudo-labeled by the baseline detector. FastGAN showed its resilience in successfully generating positive samples, despite apple flower bud instances being generally small and randomly distributed in the images. Positive sample rates of the synthetic datasets were negatively correlated with the detector confidence thresholds as expected, which ranged from 0 to 1. Higher overall positive sample rates were observed for the growth stages with higher detector performance. The synthetic images generally contained fewer detector-detectable instances per image than the corresponding real training images. The best achieved YOLO11n AP improvements in the retrained detectors for tip and half-inch green were 30.13% and 14.02% respectively, while the best achieved YOLO11n mAP improvement was 2.83%. However, the relationship between synthetic training instance quantity and detector class performances had yet to be determined. GAN was concluded to be beneficial in retraining object detectors and improving their performances. Further studies are still in need to investigate the influence of synthetic training data quantity and quality on retrained object detector performance. Full article

17 pages, 4219 KiB  
Article
Optimizing Convolutional Neural Network Architectures with Optimal Activation Functions for Pediatric Pneumonia Diagnosis Using Chest X-Rays
by Petra Radočaj, Dorijan Radočaj and Goran Martinović
Big Data Cogn. Comput. 2025, 9(2), 25; https://doi.org/10.3390/bdcc9020025 - 27 Jan 2025
Cited by 4 | Viewed by 1716
Abstract
Pneumonia remains a significant cause of morbidity and mortality among pediatric patients worldwide. Accurate and timely diagnosis is crucial for effective treatment and improved patient outcomes. Traditionally, pneumonia diagnosis has relied on a combination of clinical evaluation and radiologists’ interpretation of chest X-rays. However, this process is time-consuming and prone to inconsistencies in diagnosis. The integration of advanced technologies such as Convolutional Neural Networks (CNNs) into medical diagnostics offers the potential to enhance accuracy and efficiency. In this study, we conduct a comprehensive evaluation of various activation functions within CNNs for pediatric pneumonia classification using a dataset of 5856 chest X-ray images. The novel Mish activation function was compared with Swish and ReLU and demonstrated superior performance in terms of accuracy, precision, recall, and F1-score in all cases. Notably, InceptionResNetV2 combined with the Mish activation function achieved the highest overall performance, with an accuracy of 97.61%. Although the dataset used may not fully represent the diversity of real-world clinical cases, this research provides valuable insights into the influence of activation functions on CNN performance in medical image analysis, laying a foundation for future automated pneumonia diagnostic systems. Full article
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
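
Since the study centers on activation choice, a minimal PyTorch sketch of Mish, defined as x * tanh(softplus(x)), checked against the built-in implementation:

import torch

def mish(x):
    # Mish: x * tanh(softplus(x)); smooth and non-monotonic, unlike ReLU
    return x * torch.tanh(torch.nn.functional.softplus(x))

x = torch.linspace(-3.0, 3.0, 7)
print(mish(x))
print(torch.nn.Mish()(x))  # PyTorch's built-in Mish should agree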

30 pages, 882 KiB  
Article
Improving Synthetic Data Generation Through Federated Learning in Scarce and Heterogeneous Data Scenarios
by Patricia A. Apellániz, Juan Parras and Santiago Zazo
Big Data Cogn. Comput. 2025, 9(2), 18; https://doi.org/10.3390/bdcc9020018 - 21 Jan 2025
Cited by 4 | Viewed by 2698
Abstract
Synthetic Data Generation (SDG) is a promising solution for healthcare, offering the potential to generate synthetic patient data closely resembling real-world data while preserving privacy. However, data scarcity and heterogeneity, particularly in under-resourced regions, challenge the effective implementation of SDG. This paper addresses these challenges using Federated Learning (FL) for SDG, focusing on sharing synthetic patients across nodes. By leveraging collective knowledge and diverse data distributions, we hypothesize that sharing synthetic data can significantly enhance the quality and representativeness of generated data, particularly for institutions with limited or biased datasets. This approach aligns with meta-learning concepts, like Domain Randomized Search. We compare two FL techniques, FedAvg and Synthetic Data Sharing (SDS), the latter being our proposed contribution. Both approaches are evaluated using variational autoencoders with Bayesian Gaussian mixture models across diverse medical datasets. Our results demonstrate that while both methods improve SDG, SDS consistently outperforms FedAvg, producing higher-quality, more representative synthetic data. Non-IID scenarios reveal that while FedAvg achieves improvements of 13–27% in reducing divergence compared to isolated training, SDS achieves reductions exceeding 50% in the worst-performing nodes. These findings underscore the potential of synthetic data sharing to reduce disparities between data-rich and data-poor institutions, fostering more equitable healthcare research and innovation. Full article
(This article belongs to the Special Issue Research on Privacy and Data Security)
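
A minimal sketch of the FedAvg baseline the paper compares against: a size-weighted average of client parameters; the arrays and dataset sizes are toy values.

import numpy as np

def fedavg(client_params, client_sizes):
    # FedAvg: average client parameters weighted by local dataset size
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 50, 50]  # toy local dataset sizes
print(fedavg(clients, sizes))  # -> [2.5 3.5]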

20 pages, 17747 KiB  
Article
A Secure Learned Image Codec for Authenticity Verification via Self-Destructive Compression
by Chen-Hsiu Huang and Ja-Ling Wu
Big Data Cogn. Comput. 2025, 9(1), 14; https://doi.org/10.3390/bdcc9010014 - 15 Jan 2025
Viewed by 1136
Abstract
In the era of deepfakes and AI-generated content, digital image manipulation poses significant challenges to image authenticity, creating doubts about the credibility of images. Traditional image forensics techniques often struggle to detect sophisticated tampering, and passive detection approaches are reactive, verifying authenticity only after counterfeiting occurs. In this paper, we propose a novel full-resolution secure learned image codec (SLIC) designed to proactively prevent image manipulation by creating self-destructive artifacts upon re-compression. Once a sensitive image is encoded using SLIC, any subsequent re-compression or editing attempts will result in visually severe distortions, making the image’s tampering immediately evident. Because the content of an SLIC image is either original or visually damaged after tampering, images encoded with this secure codec hold greater credibility. SLIC leverages adversarial training to fine-tune a learned image codec that introduces out-of-distribution perturbations, ensuring that the first compressed image retains high quality while subsequent re-compressions degrade drastically. We analyze and compare the adversarial effects of various perceptual quality metrics combined with different learned codecs. Our experiments demonstrate that SLIC holds significant promise as a proactive defense strategy against image manipulation, offering a new approach to enhancing image credibility and authenticity in a media landscape increasingly dominated by AI-driven forgeries. Full article

22 pages, 12031 KiB  
Article
Quantum-Cognitive Neural Networks: Assessing Confidence and Uncertainty with Human Decision-Making Simulations
by Milan Maksimovic and Ivan S. Maksymov
Big Data Cogn. Comput. 2025, 9(1), 12; https://doi.org/10.3390/bdcc9010012 - 14 Jan 2025
Cited by 1 | Viewed by 2580
Abstract
Contemporary machine learning (ML) systems excel in recognising and classifying images with remarkable accuracy. However, like many computer software systems, they can fail by generating confusing or erroneous outputs or by deferring to human operators to interpret the results and make final decisions. In this paper, we employ the recently proposed quantum tunnelling neural networks (QT-NNs) inspired by human brain processes alongside quantum cognition theory to classify image datasets while emulating human perception and judgment. Our findings suggest that the QT-NN model provides compelling evidence of its potential to replicate human-like decision-making. We also reveal that the QT-NN model can be trained up to 50 times faster than its classical counterpart. Full article

18 pages, 2535 KiB  
Article
A Recursive Attribute Reduction Algorithm and Its Application in Predicting the Hot Metal Silicon Content in Blast Furnaces
by Zhanqi Li, Pan Cheng, Linzi Yin and Yuyin Guan
Big Data Cogn. Comput. 2025, 9(1), 6; https://doi.org/10.3390/bdcc9010006 - 3 Jan 2025
Viewed by 830
Abstract
For many complex industrial applications, traditional attribute reduction algorithms are often inefficient in obtaining optimal reducts that align with mechanistic analyses and practical production requirements. To solve this problem, we propose a recursive attribute reduction algorithm that calculates the optimal reduct. First, we present the notion of priority sequence to describe the background meaning of attributes and evaluate the optimal reduct. Next, we define a necessary element set to identify the “individually necessary” characteristics of the attributes. On this basis, a recursive algorithm is proposed to calculate the optimal reduct. Its boundary logic is guided by the conflict between the necessary element set and the core attribute set. The experiments demonstrate the proposed algorithm’s uniqueness and its ability to enhance the prediction accuracy of the hot metal silicon content in blast furnaces. Full article

26 pages, 1002 KiB  
Article
Training Neural Networks with a Procedure Guided by BNF Grammars
by Ioannis G. Tsoulos and Vasileios Charilogis
Big Data Cogn. Comput. 2025, 9(1), 5; https://doi.org/10.3390/bdcc9010005 - 2 Jan 2025
Viewed by 892
Abstract
Artificial neural networks are parametric machine learning models that have been applied successfully to an extended series of classification and regression problems found in the recent literature. For the effective identification of the parameters of artificial neural networks, a series of optimization techniques have been proposed in the relevant literature. Although these techniques present good results in many cases, either the optimization method used is inefficient and the training error of the network becomes trapped in sub-optimal values, or the neural network overfits, yielding poor results when applied to data not seen during training. This paper proposes an innovative technique for constructing the weights of artificial neural networks based on appropriate BNF grammars, used in the evolutionary process of Grammatical Evolution. The new procedure locates an interval of values for the parameters of the artificial neural network, and the optimization method then effectively locates the network parameters within this interval. The new technique was applied to a wide range of data classification and regression problems covering a number of scientific areas, and the experimental results were more than promising. Full article
