Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 25.3 days after submission; acceptance to publication is undertaken in 5.6 days (median values for papers published in this journal in the second half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2023)
Latest Articles
Cognitive Computing for Understanding and Restoring Color in Renaissance Art
Big Data Cogn. Comput. 2025, 9(5), 113; https://doi.org/10.3390/bdcc9050113 - 23 Apr 2025
Abstract
In this article, for the first time on this topic, we analyze the historical color palettes of Renaissance oil paintings by using machine-learning methods and digital images. Our work has two main parts: we collect data on their historical color palettes and then use machine learning to predict the original colors of paintings. This model studies color ratios, enhancement levels, symbolic meanings, and historical records. It looks at key colors, measures their relationships, and learns how they have changed. The main contributions of this work are as follows: (i) we develop a model that predicts a painting’s original color palette based on multiple factors, such as the color ratios and symbolic meanings, and (ii) we propose a framework for using cognitive computing tools to recover the original colors of historical artworks. This helps us to rediscover lost emotional and cultural details.
Full article
Open Access Article
Cognitive Computing with Large Language Models for Student Assessment Feedback
by Noorhan Abbas and Eric Atwell
Big Data Cogn. Comput. 2025, 9(5), 112; https://doi.org/10.3390/bdcc9050112 - 23 Apr 2025
Abstract
Effective student feedback is fundamental to enhancing learning outcomes in higher education. While traditional assessment methods emphasise both achievements and development areas, the process remains time-intensive for educators. This research explores the application of cognitive computing, specifically open-source Large Language Models (LLMs) Mistral-7B and CodeLlama-7B, to streamline feedback generation for student reports containing both Python programming elements and English narrative content. The findings indicate that these models can provide contextually appropriate feedback on both technical Python coding and English specification and documentation. They effectively identified coding weaknesses and provided constructive suggestions for improvement, as well as insightful feedback on English language quality, structure, and clarity in report writing. These results contribute to the growing body of knowledge on automated assessment feedback in higher education, offering practical insights for institutions considering the implementation of open-source LLMs in their workflows. There are around 22 thousand assessment submissions per year in the School of Computer Science, which is one of eight schools in the Faculty of Engineering and Physical Sciences, which is one of seven faculties in the University of Leeds, which is one of one hundred and sixty-six universities in the UK, so there is clear potential for our methods to scale up to millions of assessment submissions. This study also examines the limitations of current approaches and proposes potential enhancements. The findings support a hybrid system where cognitive computing manages routine tasks and educators focus on complex, personalised evaluations, enhancing feedback quality, consistency, and efficiency in educational settings.
Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
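A minimal illustrative sketch of how an open-source instruction-tuned LLM such as Mistral-7B could be prompted for feedback on a student report via the Hugging Face transformers library. The checkpoint name, prompt wording, and generation settings are assumptions for illustration, not the authors' exact pipeline.

```python
# Illustrative sketch only: prompting an open-source LLM for assessment feedback.
# The checkpoint and prompt are assumptions; the paper's pipeline may differ.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed (gated) checkpoint
    device_map="auto",
)

report_excerpt = "def mean(xs): return sum(xs) / len(xs)  # student code plus narrative ..."

prompt = (
    "[INST] You are a teaching assistant. Give concise, constructive feedback on the "
    "Python code quality and on the clarity of the English documentation in this "
    f"student report:\n\n{report_excerpt} [/INST]"
)

feedback = generator(prompt, max_new_tokens=300, do_sample=False)[0]["generated_text"]
print(feedback)
```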
Open Access Article
Evaluating Deep Learning Architectures for Breast Tumor Classification and Ultrasound Image Detection Using Transfer Learning
by Christopher Kormpos, Fotios Zantalis, Stylianos Katsoulis and Grigorios Koulouras
Big Data Cogn. Comput. 2025, 9(5), 111; https://doi.org/10.3390/bdcc9050111 - 23 Apr 2025
Abstract
The intersection of medical image classification and deep learning has garnered increasing research interest, particularly in the context of breast tumor detection using ultrasound images. Prior studies have predominantly focused on image classification, segmentation, and feature extraction, often assuming that the input images, whether sourced from healthcare professionals or individuals, are valid and relevant for analysis. To address this, we propose an initial binary classification filter to distinguish between relevant and irrelevant images, ensuring only meaningful data proceeds to subsequent analysis. However, the primary focus of this study lies in investigating the performance of a hierarchical two-tier classification architecture compared to a traditional flat three-class classification model, by employing a well-established breast ultrasound image dataset. Specifically, we explore whether sequentially breaking down the problem into binary classifications, first identifying normal versus tumorous tissue and then distinguishing benign from malignant tumors, yields better accuracy and robustness than directly classifying all three categories in a single step. Using a range of evaluation metrics, the hierarchical architecture demonstrates notable advantages in certain critical aspects of model performance. The findings of this study provide valuable guidance for selecting the optimal architecture for the final model, facilitating its seamless integration into a web application for deployment. These insights are further anticipated to advance future algorithm development and broaden the potential applicability of the research across diverse fields.
Full article
(This article belongs to the Special Issue Beyond Diagnosis: Machine Learning in Prognosis, Prevention, Healthcare, Neurosciences, and Precision Medicine)
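A sketch of the hierarchical two-tier idea described in the abstract (tier 1: normal vs. tumorous; tier 2: benign vs. malignant), using generic scikit-learn classifiers on precomputed image features as stand-ins for the study's deep transfer-learning models; class encoding and classifiers are assumptions.

```python
# Sketch of the two-tier cascade on precomputed image features (stand-in classifiers;
# the study uses deep transfer-learning models rather than logistic regression).
import numpy as np
from sklearn.linear_model import LogisticRegression

class TwoTierClassifier:
    """Tier 1: normal vs. tumorous. Tier 2: benign vs. malignant (tumorous cases only)."""

    def __init__(self):
        self.tier1 = LogisticRegression(max_iter=1000)
        self.tier2 = LogisticRegression(max_iter=1000)

    def fit(self, X, y):
        # Assumed label encoding: 0 = normal, 1 = benign, 2 = malignant
        self.tier1.fit(X, (y > 0).astype(int))
        mask = y > 0
        self.tier2.fit(X[mask], (y[mask] == 2).astype(int))
        return self

    def predict(self, X):
        out = np.zeros(len(X), dtype=int)
        tumorous = self.tier1.predict(X).astype(bool)
        if tumorous.any():
            out[tumorous] = np.where(self.tier2.predict(X[tumorous]) == 1, 2, 1)
        return out
```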
Open Access Article
A Formal Model of Trajectories for the Aggregation of Semantic Attributes
by Francisco Javier Moreno Arboleda, Georgia Garani and Natalia Andrea Álvarez Hoyos
Big Data Cogn. Comput. 2025, 9(5), 110; https://doi.org/10.3390/bdcc9050110 - 22 Apr 2025
Abstract
A trajectory is a set of time-stamped locations of a moving object, usually recorded by GPS sensors. Today, an abundance of these data is available. These large quantities of data need to be analyzed to determine patterns and associations of interest to business analysts. In this paper, a formal model of trajectories is proposed, which focuses on the aggregation of semantic attributes. These attributes can be associated by the analyst with different structural elements of a trajectory (the points, the edges, or the entire trajectory). The model allows the analyst to specify not only these semantic attributes, but also, for each semantic attribute, the set of aggregation operators (SUM, AVG, MAX, MIN, etc.) that the analyst considers appropriate to apply to the attribute in question. The concept of PAV (package of aggregate values) is also introduced and formalized. PAVs can help identify patterns in traffic, tourism, and migration, among other fields. Experiments with real data about trajectories of people revealed interesting findings about the way people move and showed the expediency and usefulness of the proposal. The contributions in this work provide a foundation for future research in developing trajectory applications, including the analysis and visualization of aggregated trajectory data on formal grounds.
Full article
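A minimal sketch of the aggregation idea: an analyst-chosen semantic attribute attached to trajectory points is reduced with a chosen set of operators into a package of aggregate values. All names, attributes, and the trajectory itself are illustrative assumptions, not the paper's formal model.

```python
# Minimal sketch: aggregating a point-level semantic attribute over a trajectory into
# a "package of aggregate values" (names and structure are illustrative).
from dataclasses import dataclass

@dataclass
class Point:
    t: float          # timestamp
    lat: float
    lon: float
    speed_kmh: float  # example semantic attribute attached to points

OPERATORS = {
    "SUM": sum,
    "AVG": lambda xs: sum(xs) / len(xs),
    "MAX": max,
    "MIN": min,
}

def aggregate(trajectory, attribute, operators):
    values = [getattr(p, attribute) for p in trajectory]
    return {op: OPERATORS[op](values) for op in operators}

traj = [Point(0, 45.46, 9.18, 32.0), Point(60, 45.47, 9.19, 48.5), Point(120, 45.48, 9.20, 12.0)]
pav = aggregate(traj, "speed_kmh", ["AVG", "MAX", "MIN"])
print(pav)  # {'AVG': 30.83..., 'MAX': 48.5, 'MIN': 12.0}
```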

Open Access Article
Neural Network Ensemble Method for Deepfake Classification Using Golden Frame Selection
by Khrystyna Lipianina-Honcharenko, Nazar Melnyk, Andriy Ivasechko, Mykola Telka and Oleg Illiashenko
Big Data Cogn. Comput. 2025, 9(4), 109; https://doi.org/10.3390/bdcc9040109 - 21 Apr 2025
Abstract
Deepfake technology poses significant threats in various domains, including politics, cybersecurity, and social media. This study presents a neural network ensemble method for deepfake classification that uses the golden frame selection technique. The proposed approach optimizes computational resources by extracting the most informative video frames, improving detection accuracy. We integrate multiple deep learning models, including ResNet50, EfficientNetB0, Xception, InceptionV3, and Facenet, with an XGBoost meta-model for enhanced classification performance. Experimental results demonstrate a 91% accuracy rate, outperforming traditional deepfake detection models. Additionally, feature importance analysis using Grad-CAM highlights how different architectures focus on distinct facial regions, enhancing overall model interpretability. The findings contribute to the development of robust and efficient deepfake detection techniques, with potential applications in digital forensics, media verification, and cybersecurity.
Full article
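A sketch of the stacking idea named in the abstract: per-video scores from several CNN backbones are combined as features for an XGBoost meta-model. Backbone inference is assumed to have been run already, and the array shapes, random scores, and labels are placeholders, not the paper's data.

```python
# Sketch: per-video fake-probabilities from several backbones are stacked as
# features for an XGBoost meta-model (scores here are random placeholders).
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n_videos = 200

# Columns: scores from ResNet50, EfficientNetB0, Xception, InceptionV3, Facenet
backbone_scores = rng.random((n_videos, 5))
labels = rng.integers(0, 2, n_videos)  # 0 = real, 1 = deepfake

meta_model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
meta_model.fit(backbone_scores, labels)

new_video_scores = rng.random((1, 5))
print("deepfake probability:", meta_model.predict_proba(new_video_scores)[0, 1])
```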

Open Access Article
Semantic-Driven Approach for Validation of IoT Streaming Data in Trustable Smart City Decision-Making and Monitoring Systems
by Oluwaseun Bamgboye, Xiaodong Liu, Peter Cruickshank and Qi Liu
Big Data Cogn. Comput. 2025, 9(4), 108; https://doi.org/10.3390/bdcc9040108 - 21 Apr 2025
Abstract
Ensuring the trustworthiness of data used in real-time analytics remains a critical challenge in smart city monitoring and decision-making. This is because the traditional data validation methods are insufficient for handling the dynamic and heterogeneous nature of Internet of Things (IoT) data streams. This paper describes a semantic approach to IoT streaming data validation that provides a semantic IoT data model and processes IoT streaming data with semantic stream processing systems to check the quality requirements of IoT streams. The proposed approach enhances the understanding of smart city data while supporting real-time, data-driven decision-making and monitoring processes. A publicly available sensor dataset collected from a busy road in the city of Milan is constructed, annotated, and semantically processed by the proposed approach and its architecture. The architecture, built on a robust semantic-based system, incorporates a reasoning technique based on forward rules, which is integrated within the semantic stream query processing system. It employs serialized Resource Description Framework (RDF) data formats to enhance stream expressiveness and enables the real-time validation of missing and inconsistent data streams within continuous sliding-window operations. The effectiveness of the approach is validated by deploying multiple RDF stream instances to the architecture before evaluating its accuracy and performance (in terms of reasoning time). The approach underscores the capability of semantic technology in sustaining the validation of IoT streaming data by accurately identifying up to 99% of inconsistent and incomplete streams in each streaming window. Also, it can maintain the performance of the semantic reasoning process in near real time. The approach provides an enhancement to data quality and credibility, capable of providing near-real-time decision support mechanisms for critical smart city applications, and facilitates accurate situational awareness across both the application and operational levels of the smart city.
Full article
(This article belongs to the Special Issue Industrial Applications of IoT and Blockchain for Sustainable Environment)
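A deliberately simplified sketch of the sliding-window checks mentioned in the abstract (missing and inconsistent readings). The paper operates on serialized RDF streams with rule-based semantic reasoning; this plain-Python version, with an assumed window size and an assumed consistency rule, only illustrates the windowed validation step.

```python
# Simplified sketch of per-window stream validation (the paper uses RDF streams and
# forward-rule reasoning; window size and the consistency rule below are assumptions).
from collections import deque

WINDOW_SIZE = 10  # readings per sliding window (assumed)

def is_inconsistent(reading):
    # Example consistency rule: traffic speed must exist and lie in a plausible range.
    speed = reading.get("speed_kmh")
    return speed is None or not (0 <= speed <= 130)

def validate_stream(readings):
    window = deque(maxlen=WINDOW_SIZE)
    for reading in readings:
        window.append(reading)
        if len(window) == WINDOW_SIZE:
            bad = sum(1 for r in window if is_inconsistent(r))
            yield {"window_end": reading["ts"], "invalid_ratio": bad / WINDOW_SIZE}

stream = [{"ts": i, "speed_kmh": (None if i % 7 == 0 else 40 + i % 30)} for i in range(30)]
for report in validate_stream(stream):
    print(report)
```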
Open Access Article
3D Urban Digital Twinning on the Web with Low-Cost Technology: 3D Geospatial Data and IoT Integration for Wellness Monitoring
by Marcello La Guardia
Big Data Cogn. Comput. 2025, 9(4), 107; https://doi.org/10.3390/bdcc9040107 - 21 Apr 2025
Abstract
Recent advances in computer science and geomatics have enabled the digitalization of complex two-dimensional and three-dimensional spatial environments and the sharing of geospatial data on the web. Simultaneously, the widespread adoption of Internet of Things (IoT) technology has facilitated the rapid deployment of low-cost sensor networks in various scientific applications. The integration of real-time IoT data acquisition in 3D urban environments lays the foundation for the development of Urban Digital Twins. This work proposes a possible low-cost solution as a sample structure for 3D digital twinning on the web, presenting a case study related to weather monitoring analysis. Specifically, an indoor-outdoor environmental conditions monitoring system integrated with 3D geospatial data on a 3D WebGIS platform was developed. This solution can be considered a first step for monitoring human and environmental wellness within a geospatial analysis system that integrates several open-source modules providing different kinds of information (geospatial data, 3D models, and IoT acquisition). The structure of this system can be valuable for municipalities and private stakeholders seeking to conduct environmental geospatial analysis using cost-effective solutions.
Full article
(This article belongs to the Special Issue Application of Cloud Computing in Industrial Internet of Things)
Open Access Article
From Rating Predictions to Reliable Recommendations in Collaborative Filtering: The Concept of Recommendation Reliability Classes
by Dionisis Margaris, Costas Vassilakis and Dimitris Spiliotopoulos
Big Data Cogn. Comput. 2025, 9(4), 106; https://doi.org/10.3390/bdcc9040106 - 17 Apr 2025
Abstract
Recommender systems aspire to provide users with recommendations that have a high probability of being accepted. This is accomplished by producing rating predictions for products that the users have not evaluated, and, afterwards, the products with the highest prediction scores are recommended to them. Collaborative filtering is a popular recommender system technique which generates rating prediction scores by blending the ratings that users with similar preferences have previously given to these products. However, predictions may entail errors, which will either lead to recommending products that the users would not accept or failing to recommend products that the users would actually accept. The first case is considered much more critical, since the recommender system will lose a significant amount of reliability and consequently interest. In this paper, after performing a study on rating prediction confidence factors in collaborative filtering, (a) we introduce the concept of prediction reliability classes, (b) we rank these classes in relation to the utility of the rating predictions belonging to each class, and (c) we present a collaborative filtering recommendation algorithm which exploits these reliability classes for prediction formulation. The efficacy of the presented algorithm is evaluated through an extensive multi-parameter evaluation process, which demonstrates that it significantly enhances recommendation quality.
Full article
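A rough sketch of the idea behind reliability classes: attach a confidence factor to each user-based collaborative-filtering prediction and bucket predictions accordingly. Here the confidence factor is simply the number of contributing neighbours and the class boundaries are invented; the paper studies several confidence factors and ranks the classes empirically.

```python
# Sketch only: a user-based CF prediction plus a crude confidence factor (number of
# contributing neighbours), bucketed into assumed reliability classes.
import numpy as np

def predict_with_reliability(ratings, sims, user, item, k=20):
    """ratings: users x items matrix with np.nan for missing; sims: user-user similarities."""
    neighbours = [u for u in np.argsort(-sims[user])
                  if u != user and not np.isnan(ratings[u, item])][:k]
    if not neighbours:
        return None, "low"
    w = sims[user, neighbours]
    prediction = float(np.dot(w, ratings[neighbours, item]) / (np.abs(w).sum() + 1e-9))
    # Reliability class from how many close neighbours actually rated the item.
    if len(neighbours) >= 15:
        reliability = "high"
    elif len(neighbours) >= 5:
        reliability = "medium"
    else:
        reliability = "low"
    return prediction, reliability
```

A recommender built on top of this could then prefer items whose predictions fall in the higher reliability classes, which is the spirit of the exploitation step described in the abstract.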

Open Access Article
Assessing the Impact of Temperature and Precipitation Trends of Climate Change on Agriculture Based on Multiple Global Circulation Model Projections in Malta
by Benjamin Mifsud Scicluna and Charles Galdies
Big Data Cogn. Comput. 2025, 9(4), 105; https://doi.org/10.3390/bdcc9040105 - 17 Apr 2025
Abstract
The Maltese Islands, situated at the centre of the Mediterranean basin, are recognised as a climate change hotspot. This study utilises projected changes in temperature and precipitation derived from the World Climate Research Program (WCRP) and analyses outputs from six Coupled Model Intercomparison Project Phase 5 (CMIP5) models under two Representative Concentration Pathways (RCPs). Through statistical and spatial analysis, the study demonstrates that climate change will have significant adverse effects on Maltese agriculture. Regardless of the RCP scenario considered, projections indicate a substantial increase in temperature and a decline in precipitation, exacerbating aridity and intensifying heat stress. These changes are expected to reduce soil moisture availability and challenge traditional agricultural practices. The study identifies the Western District as a relatively more favourable area for crop cultivation due to its comparatively lower temperatures, whereas the Northern and South Eastern peripheries are projected to experience more severe heat stress. Adaptation strategies, including the selection of heat-tolerant crop varieties such as Tetyda and Finezja, optimised water management techniques, and intercropping practices, are proposed to enhance agricultural resilience. This study is among the few comprehensive assessments of bioclimatic and physical factors affecting Maltese agriculture and highlights the urgent need for targeted adaptation measures to safeguard food production in the region.
Full article

Open Access Review
Reimagining Robots: The Future of Cybernetic Organisms with Energy-Efficient Designs
by Stefan Stavrev
Big Data Cogn. Comput. 2025, 9(4), 104; https://doi.org/10.3390/bdcc9040104 - 17 Apr 2025
Abstract
The development of cybernetic organisms—autonomous systems capable of self-regulation and dynamic environmental interaction—requires innovations in both energy efficiency and computational adaptability. This study explores the integration of bio-inspired liquid flow batteries and neuromorphic computing architectures to enable real-time learning and power optimization in autonomous robotic systems. Liquid-based energy storage systems, modeled after vascular networks, offer distributed energy management, reducing power bottlenecks and improving resilience in long-duration operations. Complementing this, neuromorphic computing architectures, including memristor-based processors and spiking neural networks (SNNs), enhance computational efficiency while minimizing energy consumption. By integrating these adaptive energy and computing systems, robots can dynamically allocate power and processing resources based on real-time demands, bridging the gap between biological and artificial intelligence. This study evaluates the feasibility of integrating these technologies into robotic platforms, assessing power demands, storage capacity, and operational scalability. While flow batteries and neuromorphic computing show promise in reducing latency and energy constraints, challenges remain in electrolyte stability, computational framework standardization, and real-world implementation. Future research must focus on hybrid computing architectures, self-regulating energy distribution, and material optimizations to enhance the adaptability of cybernetic organisms. By addressing these challenges, this study outlines a roadmap for reimagining robotics through cybernetic principles, paving the way for applications in healthcare, industrial automation, space exploration, and adaptive autonomous systems in dynamic environments.
Full article
Open Access Article
Advanced Word Game Design Based on Statistics: A Cross-Linguistic Study with Extended Experiments
by Jamolbek Mattiev, Ulugbek Salaev and Branko Kavšek
Big Data Cogn. Comput. 2025, 9(4), 103; https://doi.org/10.3390/bdcc9040103 - 17 Apr 2025
Abstract
Word games are of great importance in the acquisition of vocabulary and letter recognition among children, usually between the ages of 3 and 13, boosting their memory, word retention, spelling, and cognition. Despite the importance of these games, little attention has been paid to the development of word games for low-resource or highly morphologically constructed languages. This study develops an Advanced Cubic-oriented Game (ACG) model, commonly known as the matching letter game, by using a character-level N-gram technique and statistics; in this game, a player forms words using a given number of cubes with letters on each of their sides. The main objective of this study is to determine the optimal number of letter cubes while maintaining overall coverage. Comprehensive experiments on 12 datasets (from low-resource and high-resource languages) incorporating morphological features were conducted to form 3–5-letter words using 7–8 cubes and a special case of forming 6–7-letter words using 8–9 cubes. Experimental evaluations show that the ACG model achieved reasonably high results in terms of average total coverage, with 89.5% for 3–5-letter words using eight cubes and 79.7% for 6–7-letter words using nine cubes over 12 datasets. The ACG model obtained over 90% coverage for Uzbek, Turkish, English, Slovenian, Spanish, French, and Malaysian when constructing 3–5-letter words using eight cubes.
Full article
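A small sketch of the core check behind a matching-letter game: whether a word can be spelled when each cube contributes at most one of its faces, plus the resulting coverage over a word list. The cube faces and word list below are invented placeholders, not the letter assignments optimised in the paper.

```python
# Sketch: can a word be spelled with a set of letter cubes, each contributing at most
# one face? Spelling is a small backtracking/matching problem; cubes are placeholders.
def can_spell(word, cubes, used=None):
    used = set() if used is None else used
    if not word:
        return True
    for i, cube in enumerate(cubes):
        if i not in used and word[0] in cube:
            if can_spell(word[1:], cubes, used | {i}):
                return True
    return False

def coverage(words, cubes):
    return sum(1 for w in words if can_spell(w, cubes)) / len(words)

cubes = ["abcdef", "ghijkl", "mnopqr", "stuvwx", "yzabno", "eirstl", "aeoudh", "cnmplt"]
print(coverage(["cat", "dog", "stone", "plant"], cubes))
```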

Open Access Article
Efficient Trajectory Prediction Using Check-In Patterns in Location-Based Social Network
by Eman M. Bahgat, Alshaimaa Abo-alian, Sherine Rady and Tarek F. Gharib
Big Data Cogn. Comput. 2025, 9(4), 102; https://doi.org/10.3390/bdcc9040102 - 17 Apr 2025
Abstract
Location-based social networks (LBSNs) leverage geo-location technologies to connect users with places, events, and other users nearby. Using GPS data, platforms like Foursquare enable users to check into locations, share their locations, and receive location-based recommendations. A significant research gap in LBSNs lies in the limited exploration of users’ tendencies to withhold certain location data. While existing studies primarily focus on the locations users choose to disclose and the activities they attend, there is a lack of research on the hidden or intentionally omitted locations. Understanding these concealed patterns and integrating them into predictive models could enhance the accuracy and depth of location prediction, offering a more comprehensive view of user mobility behavior. This paper addresses this gap by proposing an Associative Hidden Location Trajectory Prediction model (AHLTP) that leverages user trajectories to infer unchecked locations. The FP-growth mining technique is used in AHLTP to extract frequent patterns of check-in locations, combined with machine-learning methods such as K-nearest-neighbor, gradient-boosted-trees, and deep learning to classify hidden locations. Moreover, AHLTP uses association rule mining to derive the frequency of successive check-in pairs for the purpose of hidden location prediction. The proposed AHLTP integrated with the machine-learning models classifies the data effectively, with KNN attaining the highest accuracy at 98%, followed by gradient-boosted trees at 96% and deep learning at 92%. A comparative study using a real-world dataset demonstrates the model’s superior accuracy compared to state-of-the-art approaches.
Full article
(This article belongs to the Special Issue Research Progress in Artificial Intelligence and Social Network Analysis)
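A sketch of the two ingredients named in the abstract, frequent-pattern mining over check-in transactions and a KNN classifier, using the mlxtend and scikit-learn libraries. The transactions, support threshold, and toy features are assumptions for illustration and not the AHLTP pipeline itself.

```python
# Sketch: FP-growth over check-in transactions (mlxtend) plus a KNN classifier on
# pattern-derived features. Data, thresholds, and features are placeholders.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth
from sklearn.neighbors import KNeighborsClassifier

checkins = [["cafe", "gym", "office"], ["cafe", "office"], ["gym", "park"], ["cafe", "gym"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(checkins), columns=te.columns_)

frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
print(frequent)  # frequent check-in itemsets; successive-pair rules would be derived from these

# Downstream, features derived from matched patterns could feed a classifier for
# hidden locations (toy placeholder features and labels below).
X = [[2, 1], [0, 3], [1, 1], [3, 0]]   # e.g., counts of matched frequent patterns
y = [1, 0, 1, 0]                       # 1 = hidden location visited, 0 = not
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 0]]))
```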
Open Access Article
Analysis of China’s High-Speed Railway Network Using Complex Network Theory and Graph Convolutional Networks
by Zhenguo Xu, Jun Li, Irene Moulitsas and Fangqu Niu
Big Data Cogn. Comput. 2025, 9(4), 101; https://doi.org/10.3390/bdcc9040101 - 16 Apr 2025
Abstract
This study investigated the characteristics and functionalities of China’s High-Speed Railway (HSR) network based on Complex Network Theory (CNT) and Graph Convolutional Networks (GCN). First, complex network analysis was applied to provide insights into the network’s fundamental characteristics, such as small-world properties, efficiency, and robustness. Then, this research developed three novel GCN models to identify key nodes, detect community structures, and predict new links. Findings from the complex network analysis revealed that China’s HSR network exhibits a typical small-world property, with a degree distribution that follows a log-normal pattern rather than a power law. The global efficiency indicator suggested that stations are typically connected through direct routes, while the local efficiency indicator showed that the network performs effectively within local areas. The robustness study indicated that the network can quickly lose connectivity if key nodes fail, though it showed an initial ability to self-regulate and partially restore its structure after disruption. The GCN model for key node identification revealed that the key nodes in the network were predominantly located in economically significant and densely populated cities, positively contributing to the network’s overall efficiency and robustness. The community structures identified by the integrated GCN model highlight the economic and social connections between official urban clusters and the communities. Results from the link prediction model suggest the necessity of improving the long-distance connectivity across regions. Future work will explore the network’s socio-economic dynamics and refine and generalise the GCN models.
Full article
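A sketch of the complex-network indicators mentioned in the abstract (clustering, path length, global/local efficiency, and a simple robustness probe), computed with networkx on a toy small-world graph standing in for the HSR station network; the GCN models are not reproduced here.

```python
# Sketch: small-world and efficiency indicators on a toy graph (networkx). The real
# analysis uses the HSR station network; the GCN models are not reproduced here.
import networkx as nx

G = nx.connected_watts_strogatz_graph(n=100, k=4, p=0.1, seed=42)  # toy stand-in

print("average clustering:", nx.average_clustering(G))
print("average shortest path:", nx.average_shortest_path_length(G))
print("global efficiency:", nx.global_efficiency(G))
print("local efficiency:", nx.local_efficiency(G))

# Robustness probe: remove the highest-degree node and re-check connectivity.
hub = max(G.degree, key=lambda kv: kv[1])[0]
G.remove_node(hub)
print("still connected after hub removal:", nx.is_connected(G))
```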

Open Access Article
Subjective Assessment of a Built Environment by ChatGPT, Gemini and Grok: Comparison with Architecture, Engineering and Construction Expert Perception
by Rachid Belaroussi
Big Data Cogn. Comput. 2025, 9(4), 100; https://doi.org/10.3390/bdcc9040100 - 14 Apr 2025
Abstract
The emergence of Multimodal Large Language Models (MLLMs) has made methods of artificial intelligence accessible to the general public in a conversational way. It offers professionals in urban planning tools for the automated visual assessment of the quality of a built environment without requiring specific technical knowledge of computing. We investigated the capability of MLLMs to perceive urban environments based on images and textual prompts. We compared the outputs of several popular models (ChatGPT, Gemini and Grok) to the visual assessment of experts in Architecture, Engineering and Construction (AEC) in the context of a real estate construction project. Our analysis was based on subjective attributes proposed to characterize various aspects of a built environment. Four urban identities served as case studies, set in a virtual environment designed using professional 3D models. We found that there can be an alignment between human and AI evaluation on some aspects, such as space and scale and architectural style, and more general accordance in environments with vegetation. However, there were noticeable differences in response patterns between the AIs and AEC experts, particularly concerning subjective aspects such as the general emotional resonance of specific urban identities. This raises questions regarding the hallucinations of generative AI, where the AI invents information and behaves creatively but its outputs are not accurate.
Full article
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
Open Access Article
Predicting College Enrollment for Low-Socioeconomic-Status Students Using Machine Learning Approaches
by Surina He, Mehrdad Yousefpoori-Naeim, Ying Cui and Maria Cutumisu
Big Data Cogn. Comput. 2025, 9(4), 99; https://doi.org/10.3390/bdcc9040099 - 12 Apr 2025
Abstract
College enrollment has long been recognized as a critical pathway to better employment prospects and improved economic outcomes. However, the overall enrollment rates have declined in recent years, and students with a lower socioeconomic status (SES) or those from disadvantaged backgrounds remain significantly underrepresented in higher education. To investigate the factors influencing college enrollment among low-SES high school students, this study analyzed data from the High School Longitudinal Study of 2009 (HSLS:09) using five widely used machine learning algorithms. The sample included 5223 ninth-grade students from lower socioeconomic backgrounds (51% female; mean age = 14.59) whose biological parents or stepparents completed a parental questionnaire. The results showed that, among all five classifiers, the random forest algorithm achieved the highest classification accuracy at 67.73%. Additionally, the top three predictors of enrollment in 2-year or 4-year colleges were students’ overall high school GPA, parental educational expectations, and the number of close friends planning to attend a 4-year college. Conversely, the most important predictors of non-enrollment were high school GPA, parental educational expectations, and the number of close friends who had dropped out of high school. These findings advance our understanding of the factors shaping college enrollment for low-SES students and highlight two important factors for intervention: improving students’ academic performance and fostering future-oriented goals among their peers and parents.
Full article

Open Access Article
An Enhanced Genetic Algorithm for Optimized Educational Assessment Test Generation Through Population Variation
by Doru-Anastasiu Popescu
Big Data Cogn. Comput. 2025, 9(4), 98; https://doi.org/10.3390/bdcc9040098 - 11 Apr 2025
Abstract
The most important aspect of a genetic algorithm (GA) lies in the optimal solution found. The result obtained by a genetic algorithm can be evaluated according to the quality of this solution. It is important that this solution is optimal or close to optimal in relation to the defined performance criteria, usually the fitness value. This study addresses the problem of automated generation of assessment tests in education. In this paper, we present the design of a model for generating assessment tests in education using genetic algorithms. The assessment covers a series of courses taught over a period of time. The genetic algorithm presents an enhancement, the initial population variation, obtained by selecting a large fixed number of individuals from various populations and ordering them by fitness value using merge sort, chosen because of the high number of individuals. The initial population variation can be seen as a specific way of increasing the diversity and number of the initial population of a genetic algorithm, which influences the algorithm performance. This process increases the diversity and quality of the initial population, improving the algorithm’s overall performance. The novelty brought about by this paper is related to its application to a specific issue (educational assessment test generation) and the specific methodology used for population variation. This development can be applied to large sets of individuals, with the variety and the large number of generated individuals leading to higher odds of increasing the performance of the algorithm. Experimental results demonstrate that the proposed method outperforms traditional GA implementations in terms of solution quality and convergence speed, showing its effectiveness for large-scale test generation tasks.
Full article
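A minimal sketch of the initial-population-variation step as described: generate a large pool of random candidate tests, order them by fitness, and keep the best N as the GA's initial population. The question bank, fitness function, and sizes are placeholders, and Python's built-in sort stands in for the explicit merge sort mentioned in the abstract.

```python
# Sketch of initial population variation: build a large candidate pool, order it by
# fitness, keep the best N. Fitness and sizes are placeholders; sorted() stands in
# for the paper's merge sort.
import random

QUESTION_BANK = list(range(100))   # question ids (illustrative)
TEST_LENGTH = 10
POOL_SIZE = 5000                   # "large fixed number of individuals"
POPULATION_SIZE = 100

def random_individual():
    return random.sample(QUESTION_BANK, TEST_LENGTH)

def fitness(individual):
    # Placeholder: reward coverage of distinct course sections (id // 10).
    return len({q // 10 for q in individual})

pool = [random_individual() for _ in range(POOL_SIZE)]
pool.sort(key=fitness, reverse=True)
initial_population = pool[:POPULATION_SIZE]
print("best initial fitness:", fitness(initial_population[0]))
```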

Open Access Article
Deep Learning for Early Skin Cancer Detection: Combining Segmentation, Augmentation, and Transfer Learning
by Ravi Karki, Shishant G C, Javad Rezazadeh and Ammara Khan
Big Data Cogn. Comput. 2025, 9(4), 97; https://doi.org/10.3390/bdcc9040097 - 11 Apr 2025
Abstract
Skin cancer, particularly melanoma, is one of the leading causes of cancer-related deaths. It is essential to detect it and start treatment in the early stages for the treatment to be effective and to improve survival rates. This study developed and evaluated a deep learning-based classification model to classify skin lesion images as benign (non-cancerous) or malignant (cancerous). In this study, we used the ISIC 2016 dataset to train the segmentation model and the Kaggle dataset of 10,000 images to train the classification model. We applied different data pre-processing techniques to enhance the robustness of our model. We used the segmentation model to generate a binary segmentation mask and used it with the corresponding pre-processed image by overlaying its edges to highlight the lesion region, before feeding it to the classification model. We applied transfer learning, using ResNet-50 as the backbone model for a feedforward network. We achieved an accuracy of 92.80%, a precision of 98.64%, and a recall of 86.80%. From our study, we have found that integrating deep learning techniques with proper data pre-processing improves the model’s performance. Future work will focus on expanding the datasets and testing more architectures to improve the performance metrics of the model.
Full article
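A sketch of a ResNet-50 transfer-learning head in Keras for a binary benign/malignant classifier. The input size, head layers, optimizer, and freezing strategy are assumptions, and the mask-overlay preprocessing described in the abstract is omitted.

```python
# Sketch of a ResNet-50 transfer-learning head (Keras). Layer sizes and training
# settings are assumptions; the study's mask-overlay preprocessing is omitted.
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone for initial training

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # benign vs. malignant
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.summary()
```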

Open Access Article
A Self-Attention CycleGAN for Unsupervised Image Hazing
by Hongyin Ni and Wanshan Su
Big Data Cogn. Comput. 2025, 9(4), 96; https://doi.org/10.3390/bdcc9040096 - 11 Apr 2025
Abstract
The high cost and difficulty of collecting real-world foggy scene images mean that automatic driving datasets contain few bad-weather images, leading to deficient training of automatic driving systems, unsafe judgments, and traffic accidents. Therefore, to effectively promote the safety and robustness of an autonomous driving system, we improved the CycleGAN model to achieve dataset augmentation of foggy images. Firstly, by combining the self-attention mechanism and the residual network architecture, the sense of hierarchy of the fog effect in the synthesized image was significantly refined. Then, LPIPS was employed to adjust the calculation of the cycle-consistency loss, making the synthetic picture perceptually more similar to the original one. The experimental results showed that the FID index of the foggy image generated by the improved CycleGAN network was reduced by 3.34, the IS index increased by 15.8%, and the SSIM index increased by 0.1%. The modified method enhances the generation of foggy images while retaining more details of the original image and reducing content distortion.
Full article
(This article belongs to the Special Issue Recent Advances in Machine Learning Methods for Imperfect Large-Scale Data)
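A sketch of the LPIPS-based cycle-consistency term described in the abstract, i.e., a perceptual replacement for CycleGAN's usual L1 cycle loss, using the lpips package. The generators, the self-attention blocks, and the loss weight are assumed to be defined elsewhere; only the loss computation is shown.

```python
# Sketch: LPIPS-based cycle-consistency loss for a CycleGAN-style setup. Generators
# G (clear -> foggy) and F (foggy -> clear) and the weight lam are assumptions.
import torch
import lpips

lpips_fn = lpips.LPIPS(net="alex")  # perceptual distance; expects images in [-1, 1]

def cycle_consistency_loss(real_clear, real_foggy, G, F, lam=10.0):
    rec_clear = F(G(real_clear))   # clear -> foggy -> clear
    rec_foggy = G(F(real_foggy))   # foggy -> clear -> foggy
    loss = lpips_fn(rec_clear, real_clear).mean() + lpips_fn(rec_foggy, real_foggy).mean()
    return lam * loss
```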
Open Access Article
Bayesian Deep Neural Networks with Agnostophilic Approaches
by Sarah McDougall, Sarah Rauchas and Vahid Rafe
Big Data Cogn. Comput. 2025, 9(4), 95; https://doi.org/10.3390/bdcc9040095 - 9 Apr 2025
Abstract
A vital area of AI is the ability of a model to recognise the limits of its knowledge and flag when presented with something unclassifiable instead of making incorrect predictions. It has often been claimed that probabilistic networks, particularly Bayesian neural networks, are unsuited to this problem due to unknown data, meaning that the denominator in Bayes’ equation would be incalculable. This study challenges this view, approaching the task as a blended problem, by considering unknowns to be highly corrupted data, and creating adequate working spaces and generalizations. The core of this method lies in structuring the network in such a manner as to target the high and low confidence levels of the predictions. Instead of simply adjusting for low confidence, the method develops a consistent gap in class-prediction confidence between known image types and unseen, unclassifiable data, so that new datapoints can be accurately identified and unknown inputs flagged accordingly through averaged thresholding. In this way, the model is also self-reflecting, using the uncertainties for all data rather than just the unknown subsections in order to determine the limits of its knowledge. The results show that these models are capable of strong performance on a variety of image datasets, with levels of accuracy, recall, and prediction gap consistency across a range of openness levels similar to those achieved using traditional methods.
Full article
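A simplified sketch of the averaged-thresholding idea: average the model's top-class confidence over known validation data, then flag inputs whose confidence falls well below that average as unclassifiable. The margin, the confidence measure, and the synthetic probability vectors are assumptions, not the paper's calibrated procedure.

```python
# Sketch of averaged thresholding for flagging unknowns: compare each input's
# top-class confidence against the average confidence on known validation data.
import numpy as np

def fit_threshold(known_val_probs, margin=0.15):
    """known_val_probs: (n, classes) predictive probabilities on known data."""
    return known_val_probs.max(axis=1).mean() - margin

def flag_unknowns(probs, threshold):
    confidence = probs.max(axis=1)
    return confidence < threshold  # True = treat as unclassifiable

rng = np.random.default_rng(1)
known = rng.dirichlet([8, 1, 1], size=500)          # confident predictions on known classes
threshold = fit_threshold(known)
incoming = rng.dirichlet([1.2, 1.0, 1.1], size=5)   # diffuse, unknown-like predictions
print(flag_unknowns(incoming, threshold))
```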

Open Access Article
AI-Powered Trade Forecasting: A Data-Driven Approach to Saudi Arabia’s Non-Oil Exports
by Musab Aloudah, Mahdi Alajmi, Alaa Sagheer, Abdulelah Algosaibi, Badr Almarri and Eid Albelwi
Big Data Cogn. Comput. 2025, 9(4), 94; https://doi.org/10.3390/bdcc9040094 - 9 Apr 2025
Abstract
This paper investigates the application of artificial intelligence (AI) in forecasting Saudi Arabia’s non-oil export trajectories, contributing to the Kingdom’s Vision 2030 objectives for economic diversification. A suite of machine learning models, including LSTM, Transformer variants, Ensemble Stacking, XGBRegressor, and Random Forest, was applied to historical export and GDP data. Among them, the Advanced Transformer model, configured with an increased attention head size, achieved the highest accuracy (MAPE: 0.73%), effectively capturing complex temporal dependencies. The Non-Linear Blending Ensemble, integrating Random Forest, XGBRegressor, and AdaBoost, also performed robustly (MAPE: 1.23%), demonstrating the benefit of leveraging heterogeneous learners. While the Temporal Fusion Transformer (TFT) provided a useful macroeconomic context through GDP integration, its relatively higher error (MAPE: 5.48%) highlighted the challenges of incorporating aggregate indicators into forecasting pipelines. Explainable AI tools, including SHAP analysis and Partial Dependence Plots (PDPs), revealed that recent export lags (lag1, lag2, lag3, and lag10) were the most influential features, offering critical transparency into model behavior. These findings reinforce the promise of interpretable AI-powered forecasting frameworks in delivering actionable, data-informed insights to support strategic economic planning.
Full article
(This article belongs to the Special Issue Industrial Data Mining and Machine Learning Applications)
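A sketch of a blending ensemble over lag features in the spirit of the abstract's Random Forest/XGBRegressor/AdaBoost combination, using scikit-learn's StackingRegressor with a non-linear meta-learner as a stand-in for the paper's blending scheme. The synthetic series, lag construction, and hyperparameters are placeholders.

```python
# Sketch: non-linear blending of RF, XGBoost, and AdaBoost over lag features.
# Data are synthetic; StackingRegressor stands in for the paper's blending scheme.
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(1.0, 0.5, 200))                          # synthetic export series
lags = np.column_stack([series[3 - k:-k or None] for k in (1, 2, 3)])  # lag1..lag3 features
target = series[3:]

ensemble = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("xgb", XGBRegressor(n_estimators=200, max_depth=3)),
        ("ada", AdaBoostRegressor(n_estimators=100, random_state=0)),
    ],
    final_estimator=GradientBoostingRegressor(random_state=0),  # non-linear blender
)
ensemble.fit(lags[:-20], target[:-20])
pred = ensemble.predict(lags[-20:])
mape = np.mean(np.abs((target[-20:] - pred) / target[-20:])) * 100
print(f"holdout MAPE: {mape:.2f}%")
```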
Topics
Topic in Applied Sciences, BDCC, Future Internet, Information, Sci
Social Computing and Social Network Analysis
Topic Editors: Carson K. Leung, Fei Hao, Giancarlo Fortino, Xiaokang Zhou
Deadline: 30 June 2025
Topic in AI, BDCC, Fire, GeoHazards, Remote Sensing
AI for Natural Disasters Detection, Prediction and Modeling
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 25 July 2025
Topic in Algorithms, BDCC, BioMedInformatics, Information, Mathematics
Machine Learning Empowered Drug Screen
Topic Editors: Teng Zhou, Jiaqi Wang, Youyi Song
Deadline: 31 August 2025
Topic in IJERPH, JPM, Healthcare, BDCC, Applied Sciences, Sensors
eHealth and mHealth: Challenges and Prospects, 2nd Edition
Topic Editors: Antonis Billis, Manuel Dominguez-Morales, Anton Civit
Deadline: 31 October 2025

Special Issues
Special Issue in BDCC
Perception and Detection of Intelligent Vision
Guest Editors: Hongshan Yu, Zhengeng Yang, Mingtao Feng, Qieshi Zhang
Deadline: 30 April 2025
Special Issue in BDCC
Advances in Natural Language Processing and Text Mining
Guest Editors: Zuchao Li, Min Peng
Deadline: 30 April 2025
Special Issue in BDCC
Industrial Data Mining and Machine Learning Applications
Guest Editors: Yung Po Tsang, C. H. Wu, Kit-Fai Pun
Deadline: 30 April 2025
Special Issue in BDCC
Artificial Intelligence in Sustainable Reconfigurable Manufacturing Systems and Operations Management
Guest Editors: Hamed Gholami, Jose Arturo Garza-Reyes
Deadline: 31 May 2025