Big Data Cogn. Comput., Volume 9, Issue 4 (April 2025) – 37 articles

Cover Story (view full-size image): This work applies Association Rule Mining to NBA data collected over two decades to analyze how basketball player performance varies across game quarters. By segmenting advanced metrics into five categories—offense, defense, ball handling, overall impact, and tempo—we identify patterns linked to game outcomes in the regular season and during playoffs. The findings reveal that defensive and overall impact metrics gain importance in late-game and overtime scenarios, while ball handling is most influential in early quarters. Our approach provides actionable insights for coaching, player evaluation, and strategic decision-making. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
21 pages, 5123 KiB  
Article
Neural Network Ensemble Method for Deepfake Classification Using Golden Frame Selection
by Khrystyna Lipianina-Honcharenko, Nazar Melnyk, Andriy Ivasechko, Mykola Telka and Oleg Illiashenko
Big Data Cogn. Comput. 2025, 9(4), 109; https://doi.org/10.3390/bdcc9040109 - 21 Apr 2025
Viewed by 264
Abstract
Deepfake technology poses significant threats in various domains, including politics, cybersecurity, and social media. This study presents a neural network ensemble method for deepfake classification based on the golden frame selection technique. The proposed approach optimizes computational resources by extracting the most informative video frames, improving detection accuracy. We integrate multiple deep learning models, including ResNet50, EfficientNetB0, Xception, InceptionV3, and Facenet, with an XGBoost meta-model for enhanced classification performance. Experimental results demonstrate a 91% accuracy rate, outperforming traditional deepfake detection models. Additionally, feature importance analysis using Grad-CAM highlights how different architectures focus on distinct facial regions, enhancing overall model interpretability. The findings contribute to the development of robust and efficient deepfake detection techniques, with potential applications in digital forensics, media verification, and cybersecurity. Full article
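As a rough, hypothetical sketch of the stacking idea this abstract describes (several backbones feeding an XGBoost meta-model), the snippet below combines per-backbone predictions over synthetic "golden frame" embeddings; the embedding dimensions, labels, and level-0 classifiers are placeholders rather than the authors' pipeline.

```python
# Hypothetical stacking sketch: base-model probabilities -> XGBoost meta-model.
import numpy as np
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder "golden frame" embeddings from several backbones (e.g., ResNet50, Xception).
# In practice these would come from pretrained networks applied to the selected frames.
n_videos, n_backbones, emb_dim = 200, 3, 128
embeddings = rng.normal(size=(n_videos, n_backbones, emb_dim))
labels = rng.integers(0, 2, size=n_videos)          # 0 = real, 1 = deepfake (synthetic labels)

# Level-0: one lightweight classifier per backbone produces a probability per video.
base_probs = []
for b in range(n_backbones):
    clf = LogisticRegression(max_iter=1000).fit(embeddings[:, b, :], labels)
    base_probs.append(clf.predict_proba(embeddings[:, b, :])[:, 1])
meta_features = np.column_stack(base_probs)

# Level-1: the XGBoost meta-model combines the base predictions.
meta = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
meta.fit(meta_features, labels)
print("ensemble accuracy on training data:", meta.score(meta_features, labels))
```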
21 pages, 1529 KiB  
Article
Semantic-Driven Approach for Validation of IoT Streaming Data in Trustable Smart City Decision-Making and Monitoring Systems
by Oluwaseun Bamgboye, Xiaodong Liu, Peter Cruickshank and Qi Liu
Big Data Cogn. Comput. 2025, 9(4), 108; https://doi.org/10.3390/bdcc9040108 - 21 Apr 2025
Viewed by 109
Abstract
Ensuring the trustworthiness of data used in real-time analytics remains a critical challenge in smart city monitoring and decision-making. This is because traditional data validation methods are insufficient for handling the dynamic and heterogeneous nature of Internet of Things (IoT) data streams. This paper describes a semantic IoT streaming data validation approach that provides a semantic IoT data model and processes IoT streaming data with semantic stream processing systems to check the quality requirements of IoT streams. The proposed approach enhances the understanding of smart city data while supporting real-time, data-driven decision-making and monitoring processes. A publicly available sensor dataset collected from a busy road in the city of Milan is constructed, annotated, and semantically processed by the proposed approach and its architecture. The architecture, built on a robust semantic-based system, incorporates a reasoning technique based on forward rules, which is integrated within the semantic stream query processing system. It employs serialized Resource Description Framework (RDF) data formats to enhance stream expressiveness and enables the real-time validation of missing and inconsistent data streams within continuous sliding-window operations. The effectiveness of the approach is validated by deploying multiple RDF stream instances to the architecture before evaluating its accuracy and performance (in terms of reasoning time). The approach underscores the capability of semantic technology in sustaining the validation of IoT streaming data by accurately identifying up to 99% of inconsistent and incomplete streams in each streaming window. Also, it can maintain the performance of the semantic reasoning process in near real time. The approach provides an enhancement to data quality and credibility, capable of providing near-real-time decision support mechanisms for critical smart city applications, and facilitates accurate situational awareness across both the application and operational levels of the smart city. Full article
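A minimal, non-semantic stand-in for the windowed validation step described above: it flags missing and out-of-range readings inside a sliding window, assuming plain Python rather than the RDF stream processing and forward-rule reasoning used in the paper; the stream values and validity bounds are invented.

```python
# Simplified sliding-window validation of a sensor stream (illustrative only).
from collections import deque

WINDOW = 5
VALID_RANGE = (-30.0, 60.0)                      # plausible air-temperature bounds (assumed)

def validate_stream(readings):
    window = deque(maxlen=WINDOW)
    for i, value in enumerate(readings):
        window.append(value)
        missing = sum(1 for v in window if v is None)
        inconsistent = sum(1 for v in window
                           if v is not None and not (VALID_RANGE[0] <= v <= VALID_RANGE[1]))
        yield i, missing, inconsistent

stream = [21.5, 22.0, None, 23.1, 999.0, 22.8, None, 21.9]   # synthetic readings
for idx, n_missing, n_bad in validate_stream(stream):
    print(f"window ending at {idx}: missing={n_missing}, inconsistent={n_bad}")
```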
14 pages, 2950 KiB  
Article
3D Urban Digital Twinning on the Web with Low-Cost Technology: 3D Geospatial Data and IoT Integration for Wellness Monitoring
by Marcello La Guardia
Big Data Cogn. Comput. 2025, 9(4), 107; https://doi.org/10.3390/bdcc9040107 - 21 Apr 2025
Viewed by 276
Abstract
Recent advances in computer science and geomatics have enabled the digitalization of complex two-dimensional and three-dimensional spatial environments and the sharing of geospatial data on the web. Simultaneously, the widespread adoption of Internet of Things (IoT) technology has facilitated the rapid deployment of low-cost sensor networks in various scientific applications. The integration of real-time IoT data acquisition in 3D urban environments lays the foundation for the development of Urban Digital Twins. This work proposes a possible low-cost structure for 3D digital twinning on the web, presented through a case study on weather monitoring analysis. Specifically, an indoor–outdoor environmental conditions monitoring system integrated with 3D geospatial data on a 3D WebGIS platform was developed. This solution can be considered a first step for monitoring human and environmental wellness within a geospatial analysis system that integrates several open-source modules that provide different kinds of information (geospatial data, 3D models, and IoT acquisition). The structure of this system can be valuable for municipalities and private stakeholders seeking to conduct environmental geospatial analysis using cost-effective solutions. Full article
(This article belongs to the Special Issue Application of Cloud Computing in Industrial Internet of Things)
23 pages, 2189 KiB  
Article
From Rating Predictions to Reliable Recommendations in Collaborative Filtering: The Concept of Recommendation Reliability Classes
by Dionisis Margaris, Costas Vassilakis and Dimitris Spiliotopoulos
Big Data Cogn. Comput. 2025, 9(4), 106; https://doi.org/10.3390/bdcc9040106 - 17 Apr 2025
Viewed by 225
Abstract
Recommender systems aspire to provide users with recommendations that have a high probability of being accepted. This is accomplished by producing rating predictions for products that the users have not evaluated, and, afterwards, the products with the highest prediction scores are recommended to them. Collaborative filtering is a popular recommender system technique which generates rating prediction scores by blending the ratings that users with similar preferences have previously given to these products. However, predictions may entail errors, which will either lead to recommending products that the users would not accept or failing to recommend products that the users would actually accept. The first case is considered much more critical, since the recommender system will lose a significant amount of reliability and consequently interest. In this paper, after performing a study on rating prediction confidence factors in collaborative filtering, (a) we introduce the concept of prediction reliability classes, (b) we rank these classes in relation to the utility of the rating predictions belonging to each class, and (c) we present a collaborative filtering recommendation algorithm which exploits these reliability classes for prediction formulation. The efficacy of the presented algorithm is evaluated through an extensive multi-parameter evaluation process, which demonstrates that it significantly enhances recommendation quality. Full article
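An illustrative sketch, not the authors' algorithm: a weighted-average collaborative filtering prediction is tagged with a coarse reliability class derived from simple confidence factors (neighbour count and mean similarity); the thresholds and ratings below are invented.

```python
# Toy rating prediction plus a reliability-class label based on confidence factors.
import numpy as np

def predict_with_reliability(target_mean, neighbour_ratings, neighbour_sims,
                             min_neighbours=5, high_sim=0.5):
    """Weighted-average prediction and a coarse reliability class."""
    sims = np.asarray(neighbour_sims, dtype=float)
    devs = np.asarray(neighbour_ratings, dtype=float) - target_mean
    prediction = target_mean + np.sum(sims * devs) / (np.sum(np.abs(sims)) + 1e-9)

    # Confidence factors: how many neighbours contributed and how similar they were.
    if len(sims) >= min_neighbours and sims.mean() >= high_sim:
        reliability = "high"
    elif len(sims) >= min_neighbours or sims.mean() >= high_sim:
        reliability = "medium"
    else:
        reliability = "low"
    return prediction, reliability

print(predict_with_reliability(3.5, [4, 5, 4, 3, 5, 4], [0.8, 0.7, 0.6, 0.9, 0.5, 0.7]))
```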
31 pages, 14157 KiB  
Article
Assessing the Impact of Temperature and Precipitation Trends of Climate Change on Agriculture Based on Multiple Global Circulation Model Projections in Malta
by Benjamin Mifsud Scicluna and Charles Galdies
Big Data Cogn. Comput. 2025, 9(4), 105; https://doi.org/10.3390/bdcc9040105 - 17 Apr 2025
Viewed by 275
Abstract
The Maltese Islands, situated at the centre of the Mediterranean basin, are recognised as a climate change hotspot. This study utilises projected changes in temperature and precipitation derived from the World Climate Research Program (WCRP) and analyses outputs from six Coupled Model Intercomparison Project Phase 5 (CMIP5) models under two Representative Concentration Pathways (RCPs). Through statistical and spatial analysis, the study demonstrates that climate change will have significant adverse effects on Maltese agriculture. Regardless of the RCP scenario considered, projections indicate a substantial increase in temperature and a decline in precipitation, exacerbating aridity and intensifying heat stress. These changes are expected to reduce soil moisture availability and challenge traditional agricultural practices. The study identifies the Western District as a relatively more favourable area for crop cultivation due to its comparatively lower temperatures, whereas the Northern and South Eastern peripheries are projected to experience more severe heat stress. Adaptation strategies, including the selection of heat-tolerant crop varieties such as Tetyda and Finezja, optimised water management techniques, and intercropping practices, are proposed to enhance agricultural resilience. This study is among the few comprehensive assessments of bioclimatic and physical factors affecting Maltese agriculture and highlights the urgent need for targeted adaptation measures to safeguard food production in the region. Full article
16 pages, 252 KiB  
Review
Reimagining Robots: The Future of Cybernetic Organisms with Energy-Efficient Designs
by Stefan Stavrev
Big Data Cogn. Comput. 2025, 9(4), 104; https://doi.org/10.3390/bdcc9040104 - 17 Apr 2025
Viewed by 220
Abstract
The development of cybernetic organisms—autonomous systems capable of self-regulation and dynamic environmental interaction—requires innovations in both energy efficiency and computational adaptability. This study explores the integration of bio-inspired liquid flow batteries and neuromorphic computing architectures to enable real-time learning and power optimization in autonomous robotic systems. Liquid-based energy storage systems, modeled after vascular networks, offer distributed energy management, reducing power bottlenecks and improving resilience in long-duration operations. Complementing this, neuromorphic computing architectures, including memristor-based processors and spiking neural networks (SNNs), enhance computational efficiency while minimizing energy consumption. By integrating these adaptive energy and computing systems, robots can dynamically allocate power and processing resources based on real-time demands, bridging the gap between biological and artificial intelligence. This study evaluates the feasibility of integrating these technologies into robotic platforms, assessing power demands, storage capacity, and operational scalability. While flow batteries and neuromorphic computing show promise in reducing latency and energy constraints, challenges remain in electrolyte stability, computational framework standardization, and real-world implementation. Future research must focus on hybrid computing architectures, self-regulating energy distribution, and material optimizations to enhance the adaptability of cybernetic organisms. By addressing these challenges, this study outlines a roadmap for reimagining robotics through cybernetic principles, paving the way for applications in healthcare, industrial automation, space exploration, and adaptive autonomous systems in dynamic environments. Full article
17 pages, 546 KiB  
Article
Advanced Word Game Design Based on Statistics: A Cross-Linguistic Study with Extended Experiments
by Jamolbek Mattiev, Ulugbek Salaev and Branko Kavšek
Big Data Cogn. Comput. 2025, 9(4), 103; https://doi.org/10.3390/bdcc9040103 - 17 Apr 2025
Viewed by 165
Abstract
Word games are of great importance in the acquisition of vocabulary and letter recognition among children, usually between the ages of 3 and 13, boosting their memory, word retention, spelling, and cognition. Despite the importance of these games, little attention has been paid to the development of word games for low-resource or highly morphologically constructed languages. This study develops an Advanced Cubic-oriented Game (ACG) model by using a character-level N-gram technique and statistics, commonly known as the matching letter game, wherein a player forms words using a given number of cubes with letters on each of its sides. The main objective of this study is to find out the optimal number of letter cubes while maintaining the overall coverage. Comprehensive experiments on 12 datasets (from low-resource and high-resource languages) incorporating morphological features were conducted to form 3–5-letter words using 7–8 cubes and a special case of forming 6–7-letter words using 8–9 cubes. Experimental evaluations show that the ACG model achieved reasonably high results in terms of average total coverage, with 89.5% for 3–5-letter words using eight cubes and 79.7% for 6–7-letter words using nine cubes over 12 datasets. The ACG model obtained over 90% coverage for Uzbek, Turkish, English, Slovenian, Spanish, French, and Malaysian when constructing 3–5-letter words using eight cubes. Full article
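A toy sketch of the coverage check underlying such a letter-cube game: each cube contributes at most one face to a word, and coverage is the share of words that can be formed. The cube faces and word list below are made up, not the ACG model's statistically chosen assignments.

```python
# Toy coverage check for a matching-letter cube game.
from itertools import permutations

cubes = ["abcdef", "ghijkl", "mnopqr", "stuvwx", "yzaeio", "rstlne",
         "cdmnpa", "bfghot"]                      # 8 hypothetical cubes (6 faces each)

def can_form(word, cubes):
    """True if `word` can be spelled using distinct cubes, one face per cube."""
    if len(word) > len(cubes):
        return False
    # Try every assignment of cubes to letter positions (fine for short words).
    for chosen in permutations(range(len(cubes)), len(word)):
        if all(word[i] in cubes[c] for i, c in enumerate(chosen)):
            return True
    return False

words = ["cat", "dog", "train", "house", "queen"]   # stand-in word list
coverage = sum(can_form(w, cubes) for w in words) / len(words)
print(f"coverage: {coverage:.0%}")
```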
21 pages, 4512 KiB  
Article
Efficient Trajectory Prediction Using Check-In Patterns in Location-Based Social Network
by Eman M. Bahgat, Alshaimaa Abo-alian, Sherine Rady and Tarek F. Gharib
Big Data Cogn. Comput. 2025, 9(4), 102; https://doi.org/10.3390/bdcc9040102 - 17 Apr 2025
Viewed by 211
Abstract
Location-based social networks (LBSNs) leverage geo-location technologies to connect users with places, events, and other users nearby. Using GPS data, platforms like Foursquare enable users to check into locations, share their locations, and receive location-based recommendations. A significant research gap in LBSNs lies in the limited exploration of users’ tendencies to withhold certain location data. While existing studies primarily focus on the locations users choose to disclose and the activities they attend, there is a lack of research on the hidden or intentionally omitted locations. Understanding these concealed patterns and integrating them into predictive models could enhance the accuracy and depth of location prediction, offering a more comprehensive view of user mobility behavior. This paper solves this gap by proposing an Associative Hidden Location Trajectory Prediction model (AHLTP) that leverages user trajectories to infer unchecked locations. The FP-growth mining technique is used in AHLTP to extract frequent patterns of check-in locations, combined with machine-learning methods such as K-nearest-neighbor, gradient-boosted-trees, and deep learning to classify hidden locations. Moreover, AHLTP uses association rule mining to derive the frequency of successive check-in pairs for the purpose of hidden location prediction. The proposed AHLTP integrated with the machine-learning models classifies the data effectively, with the KNN attaining the highest accuracy at 98%, followed by gradient-boosted trees at 96% and deep learning at 92%. Comparative study using a real-world dataset demonstrates the model’s superior accuracy compared to state-of-the-art approaches. Full article
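A rough sketch of ingredients named in the abstract, frequent check-in patterns via FP-growth, successive check-in pair counts, and a k-NN classifier, assuming the mlxtend and scikit-learn packages; the check-in data and features are invented, and this is not the AHLTP model itself.

```python
# Illustrative FP-growth mining plus a k-NN classifier on toy trajectory features.
from collections import Counter
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical check-in transactions (one list of visited venues per user-day).
checkins = [["cafe", "gym", "office"], ["cafe", "office"], ["gym", "park"],
            ["cafe", "gym", "office"], ["park", "office"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(checkins).transform(checkins), columns=te.columns_)
print(fpgrowth(onehot, min_support=0.4, use_colnames=True))   # frequent location sets

# Frequencies of successive check-in pairs, as used for hidden-location prediction.
pairs = Counter((a, b) for traj in checkins for a, b in zip(traj, traj[1:]))
print(pairs.most_common(3))

# A k-NN classifier then predicts a withheld location from simple trajectory features.
X = [[3, 1], [2, 0], [0, 2], [3, 2], [1, 1]]      # e.g., counts of cafe/gym check-ins
y = ["office", "office", "park", "office", "park"]
print(KNeighborsClassifier(n_neighbors=3).fit(X, y).predict([[2, 1]]))
```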
21 pages, 7637 KiB  
Article
Analysis of China’s High-Speed Railway Network Using Complex Network Theory and Graph Convolutional Networks
by Zhenguo Xu, Jun Li, Irene Moulitsas and Fangqu Niu
Big Data Cogn. Comput. 2025, 9(4), 101; https://doi.org/10.3390/bdcc9040101 - 16 Apr 2025
Viewed by 285
Abstract
This study investigated the characteristics and functionalities of China’s High-Speed Railway (HSR) network based on Complex Network Theory (CNT) and Graph Convolutional Networks (GCN). First, complex network analysis was applied to provide insights into the network’s fundamental characteristics, such as small-world properties, efficiency, and robustness. Then, this research developed three novel GCN models to identify key nodes, detect community structures, and predict new links. Findings from the complex network analysis revealed that China’s HSR network exhibits a typical small-world property, with a degree distribution that follows a log-normal pattern rather than a power law. The global efficiency indicator suggested that stations are typically connected through direct routes, while the local efficiency indicator showed that the network performs effectively within local areas. The robustness study indicated that the network can quickly lose connectivity if key nodes fail, though it showed an ability to initially self-regulate and partially restore its structure after disruption. The GCN model for key node identification revealed that the key nodes in the network were predominantly located in economically significant and densely populated cities, positively contributing to the network’s overall efficiency and robustness. The community structures identified by the integrated GCN model highlight the economic and social connections between official urban clusters and the communities. Results from the link prediction model suggest the necessity of improving the long-distance connectivity across regions. Future work will explore the network’s socio-economic dynamics and refine and generalise the GCN models. Full article
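A minimal numpy sketch of the basic graph-convolution propagation rule behind such GCN models, H' = ReLU(A_norm · H · W) with a symmetrically normalised, self-looped adjacency; the four-node graph, features, and weights are made up, and this is not the paper's key-node, community, or link-prediction model.

```python
# One graph-convolution propagation step on a toy four-station graph.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)        # toy adjacency (4 stations)
A_hat = A + np.eye(len(A))                       # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt         # symmetric normalisation

H = np.random.default_rng(0).normal(size=(4, 3)) # node features (e.g., degree, traffic)
W = np.random.default_rng(1).normal(size=(3, 2)) # learnable weights (random here)

H_next = np.maximum(A_norm @ H @ W, 0.0)         # propagation with ReLU
print(H_next)
```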
15 pages, 3852 KiB  
Article
Subjective Assessment of a Built Environment by ChatGPT, Gemini and Grok: Comparison with Architecture, Engineering and Construction Expert Perception
by Rachid Belaroussi
Big Data Cogn. Comput. 2025, 9(4), 100; https://doi.org/10.3390/bdcc9040100 - 14 Apr 2025
Viewed by 302
Abstract
The emergence of Multimodal Large Language Models (MLLMs) has made methods of artificial intelligence accessible to the general public in a conversational way. It offers tools for the automated visual assessment of the quality of a built environment for professionals of urban planning without requiring specific technical knowledge on computing. We investigated the capability of MLLMs to perceive urban environments based on images and textual prompts. We compared the outputs of several popular models—ChatGPT, Gemini and Grok—to the visual assessment of experts in Architecture, Engineering and Construction (AEC) in the context of a real estate construction project. Our analysis was based on subjective attributes proposed to characterize various aspects of a built environment. Four urban identities served as case studies, set in a virtual environment designed using professional 3D models. We found that there can be an alignment between human and AI evaluation on some aspects such as space and scale and architectural style, and more general accordance in environments with vegetation. However, there were noticeable differences in response patterns between the AIs and AEC experts, particularly concerning subjective aspects such as the general emotional resonance of specific urban identities. It raises questions regarding the hallucinations of generative AI where the AI invents information and behaves creatively but its outputs are not accurate. Full article
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
18 pages, 3526 KiB  
Article
Predicting College Enrollment for Low-Socioeconomic-Status Students Using Machine Learning Approaches
by Surina He, Mehrdad Yousefpoori-Naeim, Ying Cui and Maria Cutumisu
Big Data Cogn. Comput. 2025, 9(4), 99; https://doi.org/10.3390/bdcc9040099 - 12 Apr 2025
Viewed by 256
Abstract
College enrollment has long been recognized as a critical pathway to better employment prospects and improved economic outcomes. However, the overall enrollment rates have declined in recent years, and students with a lower socioeconomic status (SES) or those from disadvantaged backgrounds remain significantly underrepresented in higher education. To investigate the factors influencing college enrollment among low-SES high school students, this study analyzed data from the High School Longitudinal Study of 2009 (HSLS:09) using five widely used machine learning algorithms. The sample included 5223 ninth-grade students from lower socioeconomic backgrounds (51% female; Mage = 14.59) whose biological parents or stepparents completed a parental questionnaire. The results showed that, among all five classifiers, the random forest algorithm achieved the highest classification accuracy at 67.73%. Additionally, the top three predictors of enrollment in 2-year or 4-year colleges were students’ overall high school GPA, parental educational expectations, and the number of close friends planning to attend a 4-year college. Conversely, the most important predictors of non-enrollment were high school GPA, parental educational expectations, and the number of close friends who had dropped out of high school. These findings advance our understanding of the factors shaping college enrollment for low-SES students and highlight two important factors for intervention: improving students’ academic performance and fostering future-oriented goals among their peers and parents. Full article
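A hedged sketch of the kind of pipeline the abstract describes: a random forest classifier on tabular survey features with feature importances, using scikit-learn; the features, labels, and effect sizes below are synthetic and purely illustrative of HSLS:09-style predictors.

```python
# Illustrative random-forest classification of a binary enrollment outcome.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.normal(2.8, 0.6, n),        # high school GPA
    rng.integers(1, 6, n),          # parental educational expectations (ordinal)
    rng.integers(0, 10, n),         # close friends planning to attend college
])
# Synthetic outcome loosely tied to GPA and expectations, for illustration only.
y = (0.8 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 1, n) > 3.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("accuracy:", rf.score(X_te, y_te))
print("importances (GPA, expectations, friends):", rf.feature_importances_)
```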
20 pages, 4839 KiB  
Article
An Enhanced Genetic Algorithm for Optimized Educational Assessment Test Generation Through Population Variation
by Doru-Anastasiu Popescu
Big Data Cogn. Comput. 2025, 9(4), 98; https://doi.org/10.3390/bdcc9040098 - 11 Apr 2025
Viewed by 158
Abstract
The quality of a genetic algorithm (GA) is judged by the solution it finds, which should be optimal or close to optimal with respect to the defined performance criteria, usually the fitness value. This study addresses the automated generation of assessment tests in education, covering a series of courses taught over a period of time, and presents the design of a GA-based test generation model. The algorithm introduces an initial population variation: a large fixed number of individuals is selected from several populations and ordered by fitness value using merge sort, chosen because of the high number of individuals. This variation increases the diversity, size, and quality of the initial population, which improves the algorithm’s overall performance. The novelty of this paper lies in its application to a specific problem (educational assessment test generation) and in the methodology used for population variation, which can be applied to large sets of individuals, where the variety and number of generated candidates raise the odds of better performance. Experimental results demonstrate that the proposed method outperforms traditional GA implementations in terms of solution quality and convergence speed, showing its effectiveness for large-scale test generation tasks. Full article
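A small sketch of the initial-population-variation idea as described: candidates are drawn from several random populations, merge-sorted by fitness, and the best are kept to seed the GA. The fitness function and gene encoding below are placeholders, not the paper's test-generation model.

```python
# Seeding a GA with the best individuals drawn from several random populations.
import random

def fitness(individual):
    """Toy fitness: how close each gene is to a target value of 0.5 (higher is better)."""
    return -sum(abs(q - 0.5) for q in individual)

def merge_sort_by_fitness(pop):
    """Merge sort by descending fitness (chosen for large candidate pools)."""
    if len(pop) <= 1:
        return pop
    mid = len(pop) // 2
    left, right = merge_sort_by_fitness(pop[:mid]), merge_sort_by_fitness(pop[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if fitness(left[0]) >= fitness(right[0]) else right.pop(0))
    return merged + left + right

def varied_initial_population(n_populations=5, pop_size=200, keep=50, genes=10):
    pool = [[random.random() for _ in range(genes)]
            for _ in range(n_populations * pop_size)]
    return merge_sort_by_fitness(pool)[:keep]      # best individuals seed the GA

seed_population = varied_initial_population()
print("best seed fitness:", fitness(seed_population[0]))
```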
15 pages, 3257 KiB  
Article
Deep Learning for Early Skin Cancer Detection: Combining Segmentation, Augmentation, and Transfer Learning
by Ravi Karki, Shishant G C, Javad Rezazadeh and Ammara Khan
Big Data Cogn. Comput. 2025, 9(4), 97; https://doi.org/10.3390/bdcc9040097 - 11 Apr 2025
Viewed by 298
Abstract
Skin cancer, particularly melanoma, is one of the leading causes of cancer-related deaths. It is essential to detect and start the treatment in the early stages for it to be effective and to improve survival rates. This study developed and evaluated a deep learning-based classification model to classify the skin lesion images as benign (non-cancerous) and malignant (cancerous). In this study, we used the ISIC 2016 dataset to train the segmentation model and the Kaggle dataset of 10,000 images to train the classification model. We applied different data pre-processing techniques to enhance the robustness of our model. We used the segmentation model to generate a binary segmentation mask and used it with the corresponding pre-processed image by overlaying its edges to highlight the lesion region, before feeding it to the classification model. We used transfer learning, using ResNet-50 as a backbone model for a feedforward network. We achieved an accuracy of 92.80%, a precision of 98.64%, and a recall of 86.80%. From our study, we have found that integrating deep learning techniques with proper data pre-processing improves the model’s performance. Future work will focus on expanding the datasets and testing more architectures to improve the performance metrics of the model. Full article
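A hedged sketch of transfer learning with a frozen ResNet-50 backbone for binary lesion classification, assuming TensorFlow/Keras; the input size, head layers, and hyper-parameters are illustrative, pretrained ImageNet weights are downloaded on first use, and the training datasets are assumed to exist elsewhere.

```python
# Illustrative ResNet-50 transfer-learning head for benign vs. malignant classification.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

backbone = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False                       # freeze pretrained features

model = models.Sequential([
    backbone,
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),       # benign vs. malignant
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets assumed to exist
```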
16 pages, 7105 KiB  
Article
A Self-Attention CycleGAN for Unsupervised Image Hazing
by Hongyin Ni and Wanshan Su
Big Data Cogn. Comput. 2025, 9(4), 96; https://doi.org/10.3390/bdcc9040096 - 11 Apr 2025
Viewed by 259
Abstract
The high cost and difficulty of collecting real-world foggy scene images mean that automatic driving datasets produce limited images in bad weather and lead to deficient training in automatic driving systems, causing unsafe judgments and leading to traffic accidents. Therefore, to effectively promote the safety and robustness of an autonomous driving system, we improved the CycleGAN model to achieve dataset augmentation of foggy images. Firstly, by combining the self-attention mechanism and the residual network architecture, the sense of hierarchy of the fog effect in the synthesized image was significantly refined. Then, LPIPS was employed to adjust the calculation method for cycle consistency loss to make the synthetic picture more similar to the original one in terms of perception. The experimental results showed that the FID index of the foggy image generated by the improved CycleGAN network was reduced by 3.34, the IS index increased by 15.8%, and the SSIM index increased by 0.1%. The modified method enhances the generation of foggy images, while retaining more details of the original image and reducing content distortion. Full article
20 pages, 3258 KiB  
Article
Bayesian Deep Neural Networks with Agnostophilic Approaches
by Sarah McDougall, Sarah Rauchas and Vahid Rafe
Big Data Cogn. Comput. 2025, 9(4), 95; https://doi.org/10.3390/bdcc9040095 - 9 Apr 2025
Viewed by 550
Abstract
A vital area of AI is the ability of a model to recognise the limits of its knowledge and flag when presented with something unclassifiable instead of making incorrect predictions. It has often been claimed that probabilistic networks, particularly Bayesian neural networks, are unsuited to this problem due to unknown data, meaning that the denominator in Bayes’ equation would be incalculable. This study challenges this view, approaching the task as a blended problem, by considering unknowns to be highly corrupted data, and creating adequate working spaces and generalizations. The core of this method lies in structuring the network in such a manner as to target the high and low confidence levels of the predictions. Instead of simply adjusting for low confidence, the method develops a consistent gap in class-prediction confidence between known image types and unseen, unclassifiable data, so that new datapoints can be accurately identified and unknown inputs flagged accordingly through averaged thresholding. In this way, the model is also self-reflecting, using the uncertainties for all data rather than just the unknown subsections in order to determine the limits of its knowledge. The results show that these models are capable of strong performance on a variety of image datasets, with levels of accuracy, recall, and prediction gap consistency across a range of openness levels similar to those achieved using traditional methods. Full article
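An illustrative take on the "averaged thresholding" idea: average the top-class probability over repeated stochastic forward passes and flag inputs whose averaged confidence falls below a gap-based threshold. The probability samples below are synthetic stand-ins for network outputs, not the study's Bayesian model.

```python
# Flag "unknown" inputs whose averaged top-class confidence falls below a threshold.
import numpy as np

rng = np.random.default_rng(1)

def averaged_confidence(prob_samples):
    """Mean top-class probability over T stochastic passes (shape: T x classes)."""
    return prob_samples.max(axis=1).mean()

# Synthetic confidences: known inputs cluster high, corrupted/unknown inputs low.
known = [rng.dirichlet([8, 1, 1], size=20) for _ in range(50)]
unknown = [rng.dirichlet([1.2, 1.0, 1.1], size=20) for _ in range(50)]

known_conf = np.array([averaged_confidence(p) for p in known])
unknown_conf = np.array([averaged_confidence(p) for p in unknown])

threshold = (known_conf.mean() + unknown_conf.mean()) / 2   # midpoint of the confidence gap
flagged = (unknown_conf < threshold).mean()
print(f"threshold={threshold:.2f}, unknowns flagged: {flagged:.0%}")
```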
26 pages, 6942 KiB  
Article
AI-Powered Trade Forecasting: A Data-Driven Approach to Saudi Arabia’s Non-Oil Exports
by Musab Aloudah, Mahdi Alajmi, Alaa Sagheer, Abdulelah Algosaibi, Badr Almarri and Eid Albelwi
Big Data Cogn. Comput. 2025, 9(4), 94; https://doi.org/10.3390/bdcc9040094 - 9 Apr 2025
Viewed by 309
Abstract
This paper investigates the application of artificial intelligence (AI) in forecasting Saudi Arabia’s non-oil export trajectories, contributing to the Kingdom’s Vision 2030 objectives for economic diversification. A suite of machine learning models, including LSTM, Transformer variants, Ensemble Stacking, XGBRegressor, and Random Forest, was applied to historical export and GDP data. Among them, the Advanced Transformer model, configured with an increased attention head size, achieved the highest accuracy (MAPE: 0.73%), effectively capturing complex temporal dependencies. The Non-Linear Blending Ensemble, integrating Random Forest, XGBRegressor, and AdaBoost, also performed robustly (MAPE: 1.23%), demonstrating the benefit of leveraging heterogeneous learners. While the Temporal Fusion Transformer (TFT) provided a useful macroeconomic context through GDP integration, its relatively higher error (MAPE: 5.48%) highlighted the challenges of incorporating aggregate indicators into forecasting pipelines. Explainable AI tools, including SHAP analysis and Partial Dependence Plots (PDPs), revealed that recent export lags (lag1, lag2, lag3, and lag10) were the most influential features, offering critical transparency into model behavior. These findings reinforce the promise of interpretable AI-powered forecasting frameworks in delivering actionable, data-informed insights to support strategic economic planning. Full article
(This article belongs to the Special Issue Industrial Data Mining and Machine Learning Applications)
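A minimal sketch of lag-feature forecasting in the spirit of the lags highlighted above (lag1, lag2, lag3, lag10), assuming the pandas and XGBoost packages; the export series is synthetic, and this simple regressor stands in for, rather than reproduces, the paper's Transformer and ensemble models.

```python
# Lagged-feature forecasting of a synthetic monthly export series with XGBoost.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
series = pd.Series(100 + np.cumsum(rng.normal(0.5, 2.0, 120)))   # synthetic monthly exports

lags = [1, 2, 3, 10]
df = pd.DataFrame({f"lag{k}": series.shift(k) for k in lags})
df["y"] = series
df = df.dropna()

features = [f"lag{k}" for k in lags]
train, test = df.iloc[:-12], df.iloc[-12:]       # hold out the last 12 months
model = XGBRegressor(n_estimators=300, max_depth=3).fit(train[features], train["y"])
pred = model.predict(test[features])
mape = np.mean(np.abs((test["y"].values - pred) / test["y"].values)) * 100
print(f"MAPE: {mape:.2f}%")
```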
21 pages, 322 KiB  
Review
A Comparison of Data Quality Frameworks: A Review
by Russell Miller, Sai Hin Matthew Chan, Harvey Whelan and João Gregório
Big Data Cogn. Comput. 2025, 9(4), 93; https://doi.org/10.3390/bdcc9040093 - 9 Apr 2025
Viewed by 575
Abstract
This study reviews various data quality frameworks that have some form of regulatory backing. The aim is to identify how these frameworks define, measure, and apply data quality dimensions. This review identified generalisable frameworks, such as TDQM, ISO 8000, and ISO 25012, and specialised frameworks, such as IMF’s DQAF, BCBS 239, WHO’s DQA, and ALCOA+. A standardised data quality model was employed to map the dimensions of the data from each framework to a common vocabulary. This mapping enabled a gap analysis that highlights the presence or absence of specific data quality dimensions across the examined frameworks. The analysis revealed that core data quality dimensions such as “accuracy”, “completeness”, “consistency”, and “timeliness” are equally and well represented across all frameworks. In contrast, dimensions such as “semantics” and “quantity” were found to be overlooked by most frameworks, despite their growing impact for data practitioners as tools such as knowledge graphs become more common. Frameworks tailored to specific domains were also found to include fewer overall data quality dimensions but contained dimensions that were absent from more general frameworks, highlighting the need for a standardised approach that incorporates both established and emerging data quality dimensions. This work condenses information on commonly used and regulation-backed data quality frameworks, allowing practitioners to develop tools and applications to apply these frameworks that are compliant with standards and regulations. The bibliometric analysis from this review emphasises the importance of adopting a comprehensive quality framework to enhance governance, ensure regulatory compliance, and improve decision-making processes in data-rich environments. Full article
32 pages, 45289 KiB  
Article
CME-YOLO: A Cross-Modal Enhanced YOLO Algorithm for Adverse Weather Object Detection in Autonomous Driving
by Yifei Yuan, Yingmei Wei, Yanming Guo, Jiangming Chen and Tingshuai Jiang
Big Data Cogn. Comput. 2025, 9(4), 92; https://doi.org/10.3390/bdcc9040092 - 9 Apr 2025
Viewed by 367
Abstract
In open and dynamic environments, object detection is affected by rain, fog, snow, and complex lighting conditions, leading to decreased accuracy and posing a threat to driving safety. Infrared images can provide clear images at nighttime or in adverse weather conditions. Combined with the mature development of existing cross-modality object detection technologies, both of them offer support for addressing object detection issues in adverse weather scenarios. This paper establishes a novel dataset named Adverse Weather and Illumination Dataset (AWID) to simulate intricate real-world scenarios and proposes a cross-modal object detection algorithm for adverse weather scenarios in autonomous driving, named CME-YOLO, which is based on RGB and infrared images. It integrates the Cross-Perception Transformer Fusion algorithm, CPTFusion, and the Adaptive upsampling technique, AdSample, to enhance the extraction of detailed information and supplement effective information. CPTFusion fuses features from different modalities through multi-scale feature extraction and optimal fusion strategy computation. AdSample adaptively improves the utilization of key features and the quality of the resulting feature tensor. Experiments on two public datasets and AWID show that CME-YOLO performs optimally, with an mAP50 value on the FLIR dataset 6.8% higher than the state-of-the-art MPFT algorithm, verifying its excellent performance in autonomous driving object detection tasks. Full article
16 pages, 736 KiB  
Article
Examining the User Engagement on Mind-Sport Online Games: A Social Cognitive Theory and Word-of-Mouth Based Model Proposal
by Manuela Linares, M. Dolores Gallego and Salvador Bueno
Big Data Cogn. Comput. 2025, 9(4), 91; https://doi.org/10.3390/bdcc9040091 - 9 Apr 2025
Viewed by 271
Abstract
Online gamers have increased exponentially in the last few years in all types of online games, including mind-sport games. These games, like Bridge or Chess, have been traditionally played face-to-face. Nowadays more and more players prefer to use online platforms to play mind-sport games. Previous studies have investigated different aspects of online games and even a few on mind-sport games. However, the frameworks WOM (Word-of-Mouth) and SCT (Social Cognitive Theory) have been sparsely used in this context. In this manner, the present article proposes two objectives: (1) using the SCT in order to analyse the impact of the sociological factor on user engagement in mind-sport online games and (2) analysing how the WOM affects user engagement in mind-sport online games. Specifically, the proposed PLS-SEM model is defined by combining five constructs from these frameworks: (1) health consciousness, (2) WOM and emotional behaviour, (3) self-efficacy, (4) cognitive engagement, and (5) behavioural intention. The findings reveal that health consciousness affects WOM and emotional behaviour in a positive way as players desire well-being. Also, WOM and emotional behaviour affect cognitive engagement, as positive comments encourage high-skill gamers in mind sports. Finally, this study shows how the environmental factor of SCT is represented by WOM and emotional behaviour in an indirect way and the personal factor represented by self-efficacy in a direct way to positively influence behaviour intention. Full article
20 pages, 3787 KiB  
Article
Joint Optimization of Route and Speed for Methanol Dual-Fuel Powered Ships Based on Improved Genetic Algorithm
by Zhao Li, Hao Zhang, Jinfeng Zhang and Bo Wu
Big Data Cogn. Comput. 2025, 9(4), 90; https://doi.org/10.3390/bdcc9040090 - 8 Apr 2025
Viewed by 280
Abstract
Effective route and speed decision-making can significantly reduce vessel operating costs and emissions. However, existing optimization methods developed for conventional fuel-powered vessels are inadequate for application to methanol dual-fuel ships, which represent a new energy vessel type. To address this gap, this study investigates the operational characteristics of methanol dual-fuel liners and develops a mixed-integer nonlinear programming (MINLP) model aimed at minimizing operating costs. Furthermore, an improved genetic algorithm (GA) integrated with the Nonlinear Programming Branch-and-Bound (NLP-BB) method is proposed to solve the model. The case study results demonstrate that the proposed approach can reduce operating costs by more than 15% compared to conventional route and speed strategies while also effectively decreasing emissions of CO2, NOx, SOx, PM, and CO. Additionally, comparative experiments reveal that the designed algorithm outperforms both the GA and the Linear Interactive and General Optimizer (LINGO) solver for identifying optimal route and speed solutions. This study provides critical insights into the operational dynamics of methanol dual-fuel vessels and the optimization of their voyage decision-making, demonstrating that traditional route and speed optimization strategies designed for conventional fuel vessels are not directly applicable. It further elucidates the impact of methanol fuel tank capacity on voyage planning, revealing that larger tank capacities offer greater operational flexibility and improved economic performance. These findings provide valuable guidance for shipping companies in strategically planning methanol dual-fuel operations, enhancing economic efficiency while reducing vessel emissions. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Traffic Management)
20 pages, 3921 KiB  
Article
Quinary Classification of Human Gait Phases Using Machine Learning: Investigating the Potential of Different Training Methods and Scaling Techniques
by Amal Mekni, Jyotindra Narayan and Hassène Gritli
Big Data Cogn. Comput. 2025, 9(4), 89; https://doi.org/10.3390/bdcc9040089 - 7 Apr 2025
Viewed by 172
Abstract
Walking is a fundamental human activity, and analyzing its complexities is essential for understanding gait abnormalities and musculoskeletal disorders. This article delves into the classification of gait phases using advanced machine learning techniques, specifically focusing on dividing these phases into five distinct subphases. The study utilizes data from 100 individuals obtained from an open-access platform and employs two distinct training methodologies. The first approach adopts stratified random sampling, where 80% of the data from each subphase are allocated for training and 20% for testing. The second approach involves participant-based splitting, training on data from 80% of the individuals and testing on the remaining 20%. Preprocessing methods such as Min–Max Scaling (MMS), Standard Scaling (SS), and Principal Component Analysis (PCA) were applied to the dataset to ensure optimal performance of the machine learning models. Several algorithms were implemented, including k-Nearest Neighbors (k-NNs), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (Gaussian, Bernoulli, and Multinomial) (NB), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). The models were rigorously evaluated using performance metrics like cross-validation score, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), accuracy, and R2 score, offering a comprehensive assessment of their effectiveness in classifying gait phases. In the five subphases analysis, RF again performed strongly with a 94.95% accuracy, an RMSE of 0.4461, and an R2 score of 90.09%, demonstrating robust performance across all scaling methods. Full article
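A brief sketch of the preprocessing-plus-classifier setup described: Min–Max scaling, PCA, and a random forest evaluated with cross-validation via scikit-learn; the gait features and subphase labels below are synthetic placeholders rather than the open-access dataset used in the study.

```python
# Illustrative scaling -> PCA -> random forest pipeline for five gait subphases.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                   # e.g., joint-angle / force features
y = rng.integers(0, 5, size=500)                 # five gait subphases (synthetic)

pipe = make_pipeline(MinMaxScaler(), PCA(n_components=6),
                     RandomForestClassifier(n_estimators=200, random_state=0))
scores = cross_val_score(pipe, X, y, cv=5)
print("cross-validation accuracy:", scores.mean())
```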
16 pages, 8075 KiB  
Article
Harnessing the Power of Multi-Source Media Platforms for Public Perception Analysis: Insights from the Ohio Train Derailment
by Tao Hu, Xiao Huang, Yun Li and Xiaokang Fu
Big Data Cogn. Comput. 2025, 9(4), 88; https://doi.org/10.3390/bdcc9040088 - 5 Apr 2025
Viewed by 210
Abstract
Media platforms provide an effective way to gauge public perceptions, especially during mass disruption events. This research explores public responses to the 2023 Ohio train derailment event through Twitter, currently known as X, and Google Trends. It aims to unveil public sentiments and attitudes by employing sentiment analysis using the Valence Aware Dictionary and Sentiment Reasoner (VADER) and topic modeling using Latent Dirichlet Allocation (LDA) on geotagged tweets across three phases of the event: impact and immediate response, investigation, and recovery. Additionally, the Self-Organizing Map (SOM) model is employed to conduct time-series clustering analysis of Google search patterns, offering a deeper understanding into the event’s spatial and temporal impact on society. The results reveal that public perceptions related to pollution in communities exhibited an inverted U-shaped curve during the initial two phases on both the Twitter and Google Search platforms. However, in the third phase, the trends diverged. While public awareness declined on Google Search, it experienced an uptick on Twitter, a shift that can be attributed to governmental responses. Furthermore, the topics of Twitter discussions underwent a transition across three phases, changing from a focus on the causes of fires and evacuation strategies in Phase 1, to river pollution and trusteeship issues in Phase 2, and finally converging on government actions and community safety in Phase 3. Overall, this study advances a multi-platform and multi-method framework to uncover the spatiotemporal dynamics of public perception during disasters, offering actionable insights for real-time, region-specific crisis management. Full article
(This article belongs to the Special Issue Machine Learning Applications and Big Data Challenges)
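A minimal sketch of the two analysis steps named above, VADER sentiment scoring and LDA topic modelling, assuming the vaderSentiment and scikit-learn packages; the example tweets are invented placeholders, not data from the study.

```python
# Illustrative VADER sentiment scores and LDA topic proportions for a few toy tweets.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = ["The smoke near the derailment site is terrifying",
          "Evacuation orders issued for nearby residents",
          "Officials say the river water is being tested",
          "Cleanup crews are working around the clock"]

analyzer = SentimentIntensityAnalyzer()
for t in tweets:
    print(analyzer.polarity_scores(t)["compound"], t)   # compound sentiment per tweet

counts = CountVectorizer(stop_words="english").fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))                     # per-tweet topic proportions
```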
20 pages, 4739 KiB  
Perspective
LLM Fine-Tuning: Concepts, Opportunities, and Challenges
by Xiao-Kun Wu, Min Chen, Wanyi Li, Rui Wang, Limeng Lu, Jia Liu, Kai Hwang, Yixue Hao, Yanru Pan, Qingguo Meng, Kaibin Huang, Long Hu, Mohsen Guizani, Naipeng Chao, Giancarlo Fortino, Fei Lin, Yonglin Tian, Dusit Niyato and Fei-Yue Wang
Big Data Cogn. Comput. 2025, 9(4), 87; https://doi.org/10.3390/bdcc9040087 - 2 Apr 2025
Viewed by 1664
Abstract
As a foundation of large language models, fine-tuning drives rapid progress, broad applicability, and profound impacts on human–AI collaboration, surpassing earlier technological advancements. This paper provides a comprehensive overview of large language model (LLM) fine-tuning by integrating hermeneutic theories of human comprehension, with a focus on the essential cognitive conditions that underpin this process. Drawing on Gadamer’s concepts of Vorverständnis, Distanciation, and the Hermeneutic Circle, the paper explores how LLM fine-tuning evolves from initial learning to deeper comprehension, ultimately advancing toward self-awareness. It examines the core principles, development, and applications of fine-tuning techniques, emphasizing its growing significance across diverse fields and industries. The paper introduces a new term, “Tutorial Fine-Tuning (TFT)”, which annotates a process of intensive tuition given by a “tutor” to a small number of “students”, to define the latest round of LLM fine-tuning advancements. By addressing key challenges associated with fine-tuning, including ensuring adaptability, precision, credibility and reliability, this paper explores potential future directions for the co-evolution of humans and AI. By bridging theoretical perspectives with practical implications, this work provides valuable insights into the ongoing development of LLMs, emphasizing their potential to achieve higher levels of cognitive and operational intelligence. Full article
19 pages, 999 KiB  
Article
Development of a Predictive Model for the Biological Activity of Food and Microbial Metabolites Toward Estrogen Receptor Alpha (ERα) Using Machine Learning
by Maksim Kuznetsov, Olga Chernyavskaya, Mikhail Kutuzov, Daria Vilkova, Olga Novichenko, Alla Stolyarova, Dmitry Mashin and Igor Nikitin
Big Data Cogn. Comput. 2025, 9(4), 86; https://doi.org/10.3390/bdcc9040086 - 1 Apr 2025
Viewed by 277
Abstract
The interaction of estrogen receptor alpha (ERα) with various metabolites—both endogenous and exogenous, such as those present in food products, as well as gut microbiota-derived metabolites—plays a critical role in modulating the hormonal balance in the human body. In this study, we evaluated a suite of 27 machine learning models and, following systematic optimization and rigorous performance comparison, identified linear discriminant analysis (LDA) as the most effective predictive approach. A meticulously curated dataset comprising 75 molecular descriptors derived from compounds with known ERα activity was assembled, enabling the model to achieve an accuracy of 89.4% and an F1 score of 0.93, thereby demonstrating high predictive efficacy. Feature importance analysis revealed that both topological and physicochemical descriptors—most notably FractionCSP3 and AromaticProportion—play pivotal roles in the potential binding to ERα. Subsequently, the model was applied to chemicals commonly encountered in food products, such as indole and various phenolic compounds, indicating that approximately 70% of these substances exhibit activity toward ERα. Moreover, our findings suggest that food processing conditions, including fermentation, thermal treatment, and storage parameters, can significantly influence the formation of these active metabolites. These results underscore the promising potential of integrating predictive modeling into food technology and highlight the need for further experimental validation and model refinement to support innovative strategies for developing healthier and more sustainable food products. Full article
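A hedged sketch of descriptor-based activity prediction: a couple of RDKit descriptors (FractionCSP3 and a simple aromatic-atom proportion, echoing the descriptors highlighted above) feed a linear discriminant analysis classifier; the molecules and activity labels are illustrative, not the study's curated dataset of 75 descriptors.

```python
# Illustrative RDKit descriptors + LDA classifier for toy activity labels.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def featurize(smiles):
    mol = Chem.MolFromSmiles(smiles)
    aromatic = sum(a.GetIsAromatic() for a in mol.GetAtoms()) / mol.GetNumAtoms()
    return [Descriptors.FractionCSP3(mol), aromatic]

smiles = ["c1ccc2[nH]ccc2c1",                  # indole
          "Oc1ccccc1",                         # phenol
          "CCO",                               # ethanol
          "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"]     # ibuprofen
labels = [1, 1, 0, 0]                          # toy activity labels, not experimental data

X = [featurize(s) for s in smiles]
lda = LinearDiscriminantAnalysis().fit(X, labels)
print(lda.predict(X))
```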
38 pages, 9923 KiB  
Article
A Verifiable, Privacy-Preserving, and Poisoning Attack-Resilient Federated Learning Framework
by Washington Enyinna Mbonu, Carsten Maple, Gregory Epiphaniou and Christo Panchev
Big Data Cogn. Comput. 2025, 9(4), 85; https://doi.org/10.3390/bdcc9040085 - 31 Mar 2025
Viewed by 240
Abstract
Federated learning is the on-device, collaborative training of a global model that can be utilized to support the privacy preservation of participants’ local data. In federated learning, there are challenges to model training regarding privacy preservation, security, resilience, and integrity. For example, a malicious server can indirectly obtain sensitive information through shared gradients. On the other hand, the correctness of the global model can be corrupted through poisoning attacks from malicious clients using carefully manipulated updates. Many related works on secure aggregation and poisoning attack detection have been proposed and applied in various scenarios to address these two issues. Nevertheless, existing works are based on the trust confidence that the server will return correctly aggregated results to the participants. However, a malicious server may return false aggregated results to participants. It is still an open problem to simultaneously preserve users’ privacy and defend against poisoning attacks while enabling participants to verify the correctness of aggregated results from the server. In this paper, we propose a privacy-preserving and poisoning attack-resilient federated learning framework that supports the verification of aggregated results from the server. Specifically, we design a zero-trust dual-server architectural framework instead of a traditional single-server scheme based on trust. We exploit additive secret sharing to eliminate the single point of exposure of the training data and implement a weight selection and filtering strategy to enhance robustness to poisoning attacks while supporting the verification of aggregated results from the servers. Theoretical analysis and extensive experiments conducted on real-world data demonstrate the practicability of our proposed framework. Full article
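A toy sketch of additive secret sharing across two non-colluding servers, the building block named above: each client splits its update into two random shares whose sum is the true update, so neither server alone learns anything about it; the client updates are synthetic, and the verification and poisoning-defence layers of the framework are not shown.

```python
# Additive secret sharing of client updates across two aggregation servers.
import numpy as np

rng = np.random.default_rng(0)

def share(update):
    """Split a gradient/weight vector into two additive shares."""
    mask = rng.normal(size=update.shape)
    return update - mask, mask                   # share_A + share_B == update

clients = [rng.normal(size=4) for _ in range(3)] # three clients' model updates (synthetic)
shares_a, shares_b = zip(*(share(u) for u in clients))

# Each server aggregates only the shares it received.
agg_a, agg_b = np.sum(shares_a, axis=0), np.sum(shares_b, axis=0)

# Recombining the two aggregates yields the true sum of updates.
print(np.allclose(agg_a + agg_b, np.sum(clients, axis=0)))
```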
26 pages, 30835 KiB  
Article
Uncertainty-Aware δ-GLMB Filtering for Multi-Target Tracking
by M. Hadi Sepanj, Saed Moradi, Zohreh Azimifar and Paul Fieguth
Big Data Cogn. Comput. 2025, 9(4), 84; https://doi.org/10.3390/bdcc9040084 - 31 Mar 2025
Viewed by 201
Abstract
The δ-GLMB filter is an analytic solution to the multi-target Bayes recursion used in multi-target tracking. It extends the Generalised Labelled Multi-Bernoulli (GLMB) framework by providing an efficient and scalable implementation while preserving track identities, making it a widely used approach in the field. Theoretically, the δ-GLMB filter accounts for measurement uncertainty within its filtering procedure; in practice, however, degradation of measurement quality harms its performance. In this paper, we analyse the effects of increasing measurement uncertainty on the δ-GLMB filter and propose two heuristic methods to improve its performance under such conditions. The core idea of both methods is to exploit the information accumulated over the history of the filtering procedure to mitigate the effect of measurement uncertainty on the filter. Since GLMB filters have shown strong results in multi-target tracking, a δ-GLMB filter that is robust to measurement uncertainty is a valuable tool in this area. Our results indicate that the proposed heuristics improve filtering performance in the presence of uncertain observations. Experimental evaluations demonstrate that the proposed methods enhance track continuity and robustness, particularly in scenarios with low detection rates and high clutter, while maintaining computational feasibility. Full article
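The abstract does not detail the two heuristics, but the general idea of reusing filtering history can be illustrated with a toy sketch: smoothing a track's detection confidence over past time steps so that a single poor measurement does not immediately degrade the track. This is a hypothetical illustration only, not the δ-GLMB heuristics proposed in the paper.

```python
# Toy illustration of using filtering history to damp measurement uncertainty:
# an exponential moving average of past detection confidences keeps a track
# alive through a short run of poor measurements. Purely hypothetical; this is
# not the δ-GLMB heuristic proposed in the paper.
def smoothed_confidence(history, alpha=0.3):
    """history: per-step detection confidences in [0, 1], newest last."""
    conf = history[0]
    for c in history[1:]:
        conf = alpha * c + (1 - alpha) * conf  # recent evidence tempered by the past
    return conf

# A track with one missed detection (0.05) among otherwise solid detections.
print(smoothed_confidence([0.9, 0.85, 0.05, 0.9]))  # ~0.71, well above a typical pruning threshold
```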
20 pages, 4445 KiB  
Article
COVID-19 Severity Classification Using Hybrid Feature Extraction: Integrating Persistent Homology, Convolutional Neural Networks and Vision Transformers
by Redet Assefa, Adane Mamuye and Marco Piangerelli
Big Data Cogn. Comput. 2025, 9(4), 83; https://doi.org/10.3390/bdcc9040083 - 31 Mar 2025
Viewed by 394
Abstract
This paper introduces a model that automates the diagnosis of a patient’s condition, reducing reliance on highly trained professionals, particularly in resource-constrained settings. To ensure data consistency, the dataset was preprocessed for uniformity in size, format, and color channels. Image quality was further enhanced using histogram equalization to improve the dynamic range. Lung regions were isolated using segmentation techniques, which also eliminated extraneous areas from the images. A modified segmentation-based cropping technique was employed to define an optimal cropping rectangle. Feature extraction was performed using persistent homology, deep learning, and hybrid methodologies. Persistent homology captured topological features across multiple scales, while the deep learning model leveraged the translation equivariance of convolutions, input-adaptive weighting, and the global receptive field provided by Vision Transformers. By integrating features from both methods, the classification model effectively predicted severity levels (mild, moderate, severe). The segmentation-based cropping method showed a modest improvement, achieving 80% accuracy, while stand-alone persistent homology features reached 66% accuracy. Notably, the hybrid model outperformed existing approaches, including SVM, ResNet50, and VGG16, achieving an accuracy of 82%. Full article
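A schematic sketch of the hybrid step described above: topological features and deep embeddings are concatenated into a single feature vector before severity classification. The two extractor functions are hypothetical placeholders standing in for the persistent-homology and CNN/Vision Transformer branches, and the data are synthetic.

```python
# Schematic sketch of hybrid feature fusion: concatenate topological and deep
# features before a severity classifier. extract_topological_features and
# extract_deep_features are hypothetical placeholders, not the paper's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_topological_features(image):
    # Placeholder: simple intensity statistics stand in for persistence-diagram
    # summaries computed across multiple scales.
    return np.array([image.mean(), image.std(), np.percentile(image, 90)])

def extract_deep_features(image):
    # Placeholder: a fixed random projection scaled by mean intensity stands in
    # for a CNN/ViT embedding.
    rng = np.random.default_rng(0)
    return rng.normal(size=8) * image.mean()

def hybrid_features(image):
    return np.concatenate([extract_topological_features(image), extract_deep_features(image)])

# Synthetic "CT slices" with severity labels 0 (mild), 1 (moderate), 2 (severe).
images = [np.random.default_rng(i).random((64, 64)) for i in range(6)]
labels = [0, 1, 2, 0, 1, 2]
X = np.stack([hybrid_features(img) for img in images])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```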
22 pages, 1100 KiB  
Article
Reinforced Residual Encoder–Decoder Network for Image Denoising via Deeper Encoding and Balanced Skip Connections
by Ismail Boucherit and Hamza Kheddar
Big Data Cogn. Comput. 2025, 9(4), 82; https://doi.org/10.3390/bdcc9040082 - 31 Mar 2025
Viewed by 318
Abstract
Traditional image denoising algorithms often struggle with real-world complexities such as spatially correlated noise, varying illumination conditions, sensor-specific noise patterns, motion blur, and structural distortions. This paper presents an enhanced residual denoising network, R-REDNet, which stands for Reinforced Residual Encoder–Decoder Network. The proposed architecture incorporates deeper convolutional layers in the encoder and replaces additive skip connections with averaging operations to improve feature extraction and noise suppression. Additionally, the method leverages an iterative refinement approach, further enhancing its denoising performance. Experiments conducted on two real-world noisy image datasets demonstrate that R-REDNet outperforms current state-of-the-art approaches. Specifically, it attained a peak signal-to-noise ratio of 44.01 dB and a structural similarity index of 0.9931 on Dataset 1, and a peak signal-to-noise ratio of 46.15 dB with a structural similarity index of 0.9955 on Dataset 2. These findings confirm the effectiveness of our method in delivering high-quality image restoration while preserving fine details. Full article
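The skip-connection change described in the abstract, averaging encoder and decoder features instead of adding them, can be sketched in a few lines of PyTorch. The tiny two-stage network below is illustrative only and is not the R-REDNet architecture itself.

```python
# Minimal sketch of the skip-connection change described in the abstract:
# averaging encoder and decoder features instead of adding them.
# Layer shapes and the two-stage encoder/decoder are illustrative only.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, channels=16, average_skips=True):
        super().__init__()
        self.average_skips = average_skips
        self.enc = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x):
        skip = self.enc(x)
        dec = self.dec(skip)
        if self.average_skips:
            fused = (skip + dec) / 2   # averaging skip connection (R-REDNet-style)
        else:
            fused = skip + dec         # conventional additive skip connection
        return self.out(fused)

noisy = torch.randn(1, 1, 32, 32)
denoised = TinyEncoderDecoder()(noisy)
print(denoised.shape)  # torch.Size([1, 1, 32, 32])
```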
17 pages, 840 KiB  
Article
Enhancing Green Practice Detection in Social Media with Paraphrasing-Based Data Augmentation
by Anna Glazkova and Olga Zakharova
Big Data Cogn. Comput. 2025, 9(4), 81; https://doi.org/10.3390/bdcc9040081 - 31 Mar 2025
Viewed by 265
Abstract
Detecting mentions of green waste practices on social networks is a crucial tool for environmental monitoring and sustainability analytics. Social media serve as a valuable source of ecological information, enabling researchers to track trends, assess public engagement, and predict the spread of sustainable behaviors. Automatic extraction of mentions of green waste practices facilitates large-scale analysis, but the uneven distribution of such mentions presents a challenge for effective detection. To address this, data augmentation plays a key role in balancing class distribution in green practice detection tasks. In this study, we compared existing data augmentation techniques based on the paraphrasing of original texts. We evaluated the effectiveness of additional explanations in prompts, Chain-of-Thought prompting, synonym substitution, and text expansion. Experiments were conducted on the GreenRu dataset, which focuses on detecting mentions of green waste practices in Russian social media. Our results, obtained using two instruction-based large language models, demonstrated the effectiveness of Chain-of-Thought prompting for text augmentation. These findings contribute to advancing sustainability analytics by improving automated detection and analysis of environmental discussions. Furthermore, the results of this study can be applied to other tasks that require augmentation of text data in the context of ecological research and beyond. Full article
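For illustration, a Chain-of-Thought paraphrasing prompt for an instruction-tuned model might look like the sketch below; the prompt wording and the generate callable are hypothetical and are not the prompts or models used with GreenRu.

```python
# Hypothetical sketch of a Chain-of-Thought paraphrasing prompt for augmenting
# minority-class posts. The prompt text and the `generate` callable are
# illustrative placeholders, not the exact prompts or models used in the study.
COT_PARAPHRASE_PROMPT = """You will paraphrase a social media post about a green waste practice.
Think step by step:
1. Identify the green practice mentioned (e.g., waste sorting, reuse, composting).
2. List the key facts that must be preserved.
3. Write a paraphrase in a natural social media style that keeps those facts.

Post: {post}
Paraphrase:"""

def augment(posts, generate):
    """`generate` is any callable mapping a prompt string to the model's completion."""
    return [generate(COT_PARAPHRASE_PROMPT.format(post=p)) for p in posts]

# Usage with a stub model, just to show the flow.
print(augment(["We started sorting plastic and paper at our office."],
              generate=lambda prompt: "<paraphrase>"))
```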
23 pages, 5219 KiB  
Article
Optimized Resource Allocation Algorithm for a Deadline-Aware IoT Healthcare Model
by Amal EL-Natat, Nirmeen A. El-Bahnasawy, Ayman El-Sayed and Sahar Elkazzaz
Big Data Cogn. Comput. 2025, 9(4), 80; https://doi.org/10.3390/bdcc9040080 - 30 Mar 2025
Viewed by 184
Abstract
In recent years, the healthcare market has grown rapidly and must cope with a massive increase in data. Healthcare applications are time-sensitive and require fast responses with minimal delay, and Fog Computing (FC) was introduced to meet this need; it is applicable to domains such as healthcare and smart environments. In healthcare applications, some tasks are critical and must be processed first, while others are time-sensitive and must be completed before their deadline. In this paper, we propose a Task Classification algorithm based on Deadline and Criticality (TCDC) for serving healthcare applications in a fog environment. It classifies tasks by criticality level so that critical tasks are processed first, and it accounts for each task’s deadline, an essential parameter in real-time applications. The performance of TCDC was compared with existing algorithms from the literature. The simulation results showed that the proposed algorithm improves overall performance on QoS parameters such as makespan, with improvement ratios of 60% to 70%, as well as resource utilization. Full article
(This article belongs to the Special Issue Application of Cloud Computing in Industrial Internet of Things)
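The abstract's idea of serving critical tasks first and respecting deadlines can be sketched as a simple two-level ordering: criticality class first, earliest deadline within each class. The task fields and ordering rule below are assumptions for illustration, not the exact TCDC algorithm.

```python
# Illustrative sketch of deadline- and criticality-aware task ordering:
# critical tasks are served first, and within each class tasks are ordered by
# earliest deadline. Field names and the two-level rule are assumptions made
# for illustration, not the exact TCDC algorithm.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    critical: bool      # e.g., an abnormal-vital-signs alert
    deadline_ms: float  # latency budget before the result becomes useless

def order_tasks(tasks):
    # Sort key: critical before non-critical, then earliest deadline first.
    return sorted(tasks, key=lambda t: (not t.critical, t.deadline_ms))

queue = [
    Task("ecg_batch_upload", critical=False, deadline_ms=5000),
    Task("fall_detection_alert", critical=True, deadline_ms=200),
    Task("glucose_trend_report", critical=False, deadline_ms=2000),
    Task("arrhythmia_alert", critical=True, deadline_ms=150),
]
for t in order_tasks(queue):
    print(t.name)
# arrhythmia_alert, fall_detection_alert, glucose_trend_report, ecg_batch_upload
```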