Journal Description
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published quarterly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 16.4 days after submission; acceptance to publication is undertaken in 3.9 days (median values for papers published in this journal in the first half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2022)
Latest Articles
Defining Semantically Close Words of Kazakh Language with Distributed System Apache Spark
Big Data Cogn. Comput. 2023, 7(4), 160; https://doi.org/10.3390/bdcc7040160 - 27 Sep 2023
Abstract
This work focuses on determining semantically close words and using semantic similarity in general in order to improve performance in information retrieval tasks. The semantic similarity of words is an important task with many applications from information retrieval to spell checking or even document clustering and classification. Although, in languages with rich linguistic resources, the methods and tools for this task are well established, some languages do not have such tools. The first step in our experiment is to represent the words in a collection in a vector form and then define the semantic similarity of the terms using a vector similarity method. In order to tame the complexity of the task, which depends on the number of word (and, consequently, vector) pairs that have to be combined in order to define the semantically closest word pairs, a distributed method that runs on Apache Spark is designed to reduce the calculation time by running comparison tasks in parallel. Three alternative implementations are proposed and tested using a list of target words and seeking the most semantically similar words from a lexicon for each one of them. In a second step, we employ pre-trained multilingual sentence transformers to capture the content semantics at a sentence level and a vector-based semantic index to accelerate the searches. The code is written in MapReduce, and the experiments and results show that the proposed methods can provide an interesting solution for finding similar words or texts in the Kazakh language.
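The vector-similarity step described in this abstract can be illustrated with a minimal single-machine sketch. It assumes cosine similarity as the vector similarity method (the abstract does not name one), and the lexicon, its words, and its vectors are hypothetical toy values, not data from the paper. The distributed Spark implementation is not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def most_similar(target, lexicon, k=2):
    """Return the k lexicon words closest to `target`, best first."""
    scored = [
        (word, cosine_similarity(lexicon[target], vec))
        for word, vec in lexicon.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional word vectors, for illustration only.
toy_lexicon = {
    "book": [0.9, 0.1, 0.0],
    "notebook": [0.8, 0.2, 0.1],
    "apple": [0.0, 0.9, 0.4],
    "pear": [0.1, 0.8, 0.5],
}

nearest = most_similar("book", toy_lexicon, k=1)
```

Each target word is compared against every lexicon entry, so the number of comparisons grows with the product of the two list sizes, which is the cost the abstract proposes to parallelize.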
Open Access Article
A Pruning Method Based on Feature Map Similarity Score
Big Data Cogn. Comput. 2023, 7(4), 159; https://doi.org/10.3390/bdcc7040159 - 26 Sep 2023
Abstract
As the number of layers of deep learning models increases, the number of parameters and the amount of computation increase, making it difficult to deploy on edge devices. Pruning has the potential to significantly reduce the number of parameters and computations in a deep learning model. Existing pruning methods frequently require a specific distribution of network parameters to achieve good results when measuring filter importance. To address this, a feature map similarity score-based pruning method is proposed. We calculate the similarity score of each feature map to measure the importance of the filter, and we guide filter pruning by using the similarity between the filters' output feature maps to measure the redundancy of the corresponding filters. Pruning experiments with ResNet-56 and ResNet-110 on the CIFAR-10 dataset show that the method can compress the model by more than 70% while maintaining a higher compression ratio and accuracy than traditional methods.
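The idea of ranking filters by how similar their output feature maps are can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's scoring formula, so the sketch uses the mean cosine similarity of a flattened feature map to the other remaining maps as a redundancy score, and removes the most redundant filter one at a time. The function names and the toy feature maps are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flattened feature maps."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)

def select_filters_to_prune(feature_maps, n_prune):
    """Greedily pick filter indices whose maps are most redundant.

    Assumption (not from the paper): redundancy of a filter is the mean
    cosine similarity of its feature map to the maps still kept.
    """
    remaining = list(range(len(feature_maps)))
    pruned = []
    for _ in range(n_prune):
        if len(remaining) < 2:
            break  # nothing left to compare against
        scores = {}
        for i in remaining:
            sims = [cosine(feature_maps[i], feature_maps[j])
                    for j in remaining if j != i]
            scores[i] = sum(sims) / len(sims)
        # Recomputing scores each round keeps one filter of a near-duplicate pair.
        most_redundant = max(remaining, key=lambda i: scores[i])
        remaining.remove(most_redundant)
        pruned.append(most_redundant)
    return pruned

# Hypothetical flattened feature maps: filters 0/1 and 2/3 are near-duplicates.
toy_maps = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.5, 0.0],
]

pruned_filters = select_filters_to_prune(toy_maps, n_prune=2)
```

On the toy input, one filter from each near-duplicate pair is selected, so the two remaining filters still cover both distinct responses.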
Open Access Article
Ensemble-Based Short Text Similarity: An Easy Approach for Multilingual Datasets Using Transformers and WordNet in Real-World Scenarios
Big Data Cogn. Comput. 2023, 7(4), 158; https://doi.org/10.3390/bdcc7040158 - 25 Sep 2023
Abstract
When integrating data from different sources, there are problems of synonymy, different languages, and concepts of different granularity. This paper proposes a simple yet effective approach to evaluate the semantic similarity of short texts, especially keywords. The method is capable of matching keywords from different sources and languages by exploiting transformers and WordNet-based methods. Key features of the approach include its unsupervised pipeline, mitigation of the lack of context in keywords, scalability for large archives, support for multiple languages, and adaptability to real-world scenarios. The work aims to provide a versatile tool for different cultural heritage archives without requiring complex customization. The paper aims to explore different approaches to identifying similarities in 1- or n-gram tags, evaluate and compare different pre-trained language models, and define integrated methods to overcome limitations. Tests to validate the approach have been conducted using the QueryLab portal, a search engine for cultural heritage archives, to evaluate the proposed pipeline.
(This article belongs to the Special Issue Artificial Intelligence in Digital Humanities)
Open Access Article
Intelligent Method for Classifying the Level of Anthropogenic Disasters
Big Data Cogn. Comput. 2023, 7(3), 157; https://doi.org/10.3390/bdcc7030157 - 21 Sep 2023
Abstract
Anthropogenic disasters pose a challenge to management in the modern world. At the same time, it is important to have accurate and timely information to assess the level of danger and take appropriate measures to eliminate disasters. Therefore, the purpose of the paper is to develop an effective method for assessing the level of anthropogenic disasters based on information from witnesses to the event. For this purpose, a conceptual model for assessing the consequences of anthropogenic disasters is proposed, whose main components are the analysis of collected data and the modeling and assessment of their consequences. The main characteristics of the intelligent method for classifying the level of anthropogenic disasters are considered, in particular, exploratory data analysis (EDA), classification based on textual data using SMOTE, and data classification by the ensemble method of machine learning using boosting. The experimental results confirmed that for textual data, the best classification is at level V and level I with an error of 0.97 and 0.94, respectively, and the average error estimate is 0.68. For quantitative data, the classification accuracy of Potential Accident Level relative to Industry Sector is 77%, and the f1-score is 0.88, which indicates a fairly high accuracy of the model. The architecture of a mobile application for classifying the level of anthropogenic disasters has been developed, which reduces the time required to assess consequences of danger in the region. In addition, the proposed approach ensures interaction with dynamic and uncertain environments, which makes it an effective tool for classifying.
(This article belongs to the Special Issue Quality and Security of Critical Infrastructure Systems)
Open Access Article
Big Data Analytics with the Multivariate Adaptive Regression Splines to Analyze Key Factors Influencing Accident Severity in Industrial Zones of Thailand: A Study on Truck and Non-Truck Collisions
Big Data Cogn. Comput. 2023, 7(3), 156; https://doi.org/10.3390/bdcc7030156 - 21 Sep 2023
Abstract
Machine learning currently holds a vital position in predicting collision severity. Identifying factors associated with heightened risks of injury and fatalities aids in enhancing road safety measures and management. Presently, Thailand faces considerable challenges with respect to road traffic accidents. These challenges are particularly acute in industrial zones, where they contribute to a rise in injuries and fatalities. The mixture of heavy traffic, comprising both trucks and non-trucks, significantly amplifies the risk of accidents. This situation generates profound concerns for road safety in Thailand. Consequently, discerning the factors that influence the severity of injuries and fatalities becomes pivotal for formulating effective road safety policies and measures. This study is specifically aimed at predicting the factors contributing to the severity of accidents involving truck and non-truck collisions in industrial zones. It considers a variety of aspects, including roadway characteristics, underlying assumptions of cause, crash characteristics, and weather conditions. Because accident data are big data with specific characteristics and complexity, employing machine learning in tandem with the Multivariate Adaptive Regression Splines technique allows precise predictions that identify the factors influencing the severity of collision outcomes. The analysis demonstrates that various factors augment the severity of accidents involving trucks. These include darting in front of a vehicle, head-on collisions, and pedestrian collisions. Conversely, for non-truck related collisions, the significant factors that heighten severity are tailgating, running signs/signals, angle collisions, head-on collisions, overtaking collisions, pedestrian collisions, obstruction collisions, and collisions during overcast conditions.
These findings illuminate the significant factors influencing the severity of accidents involving trucks and non-trucks. Such insights provide invaluable information for developing targeted road safety measures and policies, thereby contributing to the mitigation of injuries and fatalities.
(This article belongs to the Special Issue Sustainable Big Data Analytics and Machine Learning Technologies)
Open Access Article
Semi-Supervised Classification with A*: A Case Study on Electronic Invoicing
Big Data Cogn. Comput. 2023, 7(3), 155; https://doi.org/10.3390/bdcc7030155 - 20 Sep 2023
Abstract
This paper addresses the time-intensive task of assigning accurate account labels to invoice entries within corporate bookkeeping. Despite the advent of electronic invoicing, many software solutions still rely on rule-based approaches that fail to address the multifaceted nature of this challenge. While machine learning holds promise for such repetitive tasks, the presence of low-quality training data often poses a hurdle. Frequently, labels pertain to invoice rows at a group level rather than an individual level, leading to the exclusion of numerous records during preprocessing. To enhance the efficiency of an invoice entry classifier within a semi-supervised context, this study proposes an innovative approach that combines the classifier with the A* graph search algorithm. Through experimentation across various classifiers, the results consistently demonstrated a noteworthy increase in accuracy, ranging between 1% and 4%. This improvement is primarily attributed to a marked reduction in the discard rate of data, which decreased from 39% to 14%. This paper contributes to the literature by presenting a method that leverages the synergy of a classifier and A* graph search to overcome challenges posed by limited and group-level label information in the realm of electronic invoicing classification.
(This article belongs to the Special Issue Computational Finance and Big Data Analytics)
Open Access Article
Efficient and Controllable Model Compression through Sequential Knowledge Distillation and Pruning
Big Data Cogn. Comput. 2023, 7(3), 154; https://doi.org/10.3390/bdcc7030154 - 19 Sep 2023
Abstract
Efficient model deployment is a key focus in deep learning. This has led to the exploration of methods such as knowledge distillation and network pruning to compress models and increase their performance. In this study, we investigate the potential synergy between knowledge distillation and network pruning to achieve optimal model efficiency and improved generalization. We introduce an innovative framework for model compression that combines knowledge distillation, pruning, and fine-tuning to achieve enhanced compression while providing control over the degree of compactness. Our research is conducted on popular datasets, CIFAR-10 and CIFAR-100, employing diverse model architectures, including ResNet, DenseNet, and EfficientNet. We could calibrate the amount of compression achieved. This allows us to produce models with different degrees of compression while still being just as accurate, or even better. Notably, we demonstrate its efficacy by producing two compressed variants of ResNet 101: ResNet 50 and ResNet 18. Our results reveal intriguing findings. In most cases, the pruned and distilled student models exhibit comparable or superior accuracy to the distilled student models while utilizing significantly fewer parameters.
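For readers unfamiliar with knowledge distillation, the following minimal sketch shows one common formulation of the student loss: a temperature-softened KL term between teacher and student outputs, blended with the usual cross-entropy on the true label. The abstract does not state which loss the authors use, so this widely used form, and the `temperature` and `alpha` values, are assumptions for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend of a soft-target term and a hard-label term (assumed form)."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures.
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    # Standard cross-entropy of the student on the ground-truth class.
    hard_term = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_term + (1.0 - alpha) * hard_term

# Hypothetical logits for a 3-class problem.
teacher = [2.0, 0.5, 0.1]
agreeing_student = [2.0, 0.5, 0.1]
disagreeing_student = [0.1, 0.5, 2.0]

loss_agree = distillation_loss(agreeing_student, teacher, true_label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, true_label=0)
```

A student that matches the teacher incurs only the hard-label term, while a student that disagrees is penalized by both terms.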
Open Access Article
Implementing a Synchronization Method between a Relational and a Non-Relational Database
Big Data Cogn. Comput. 2023, 7(3), 153; https://doi.org/10.3390/bdcc7030153 - 18 Sep 2023
Abstract
The accelerating pace of application development requires more frequent database switching, as technological advancements demand agile adaptation. The simultaneous increase in the volume of data and the number of transactions has led some applications to migrate from one database to another, especially from a relational database to a non-relational (NoSQL) alternative. In this transition phase, the coexistence of both databases becomes necessary. In addition, certain users choose to keep both databases permanently updated to exploit the individual strengths of each database in order to streamline operations. Existing solutions mainly focus on replication, failing to adequately address the management of synchronization between a relational and a non-relational (NoSQL) database. This paper proposes a practical IT approach to this problem and tests the feasibility of the proposed solution by developing an application that maintains the synchronization between a MySQL database as a relational database and MongoDB as a non-relational database. The performance and capabilities of the solution are analyzed to ensure data consistency and correctness. In addition, problems that arose during the development of the application are highlighted and solutions to them are proposed.
Open Access Article
Predicting Forex Currency Fluctuations Using a Novel Bio-Inspired Modular Neural Network
Big Data Cogn. Comput. 2023, 7(3), 152; https://doi.org/10.3390/bdcc7030152 - 15 Sep 2023
Abstract
In the realm of foreign exchange (Forex) market predictions, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been commonly employed. However, these models often exhibit instability due to vulnerability to data perturbations attributed to their monolithic architecture. Hence, this study proposes a novel neuroscience-informed modular network that harnesses closing prices and sentiments from Yahoo Finance and Twitter APIs. Compared to monolithic methods, the objective is to advance the effectiveness of predicting price fluctuations in Euro to British Pound Sterling (EUR/GBP). The proposed model offers a unique methodology based on a reinvigorated modular CNN, replacing pooling layers with orthogonal kernel initialisation RNNs coupled with Monte Carlo Dropout (MCoRNNMCD). It integrates two pivotal modules: a convolutional simple RNN and a convolutional Gated Recurrent Unit (GRU). These modules incorporate orthogonal kernel initialisation and Monte Carlo Dropout techniques to mitigate overfitting, assessing each module’s uncertainty. The synthesis of these parallel feature extraction modules culminates in a three-layer Artificial Neural Network (ANN) decision-making module. Established on objective metrics like the Mean Square Error (MSE), rigorous evaluation underscores the proposed MCoRNNMCD–ANN’s exceptional performance. MCoRNNMCD–ANN surpasses single CNNs, LSTMs, GRUs, and the state-of-the-art hybrid BiCuDNNLSTM, CLSTM, CNN–LSTM, and LSTM–GRU in predicting hourly EUR/GBP closing price fluctuations.
Open Access Article
Q8VaxStance: Dataset Labeling System for Stance Detection towards Vaccines in Kuwaiti Dialect
Big Data Cogn. Comput. 2023, 7(3), 151; https://doi.org/10.3390/bdcc7030151 - 15 Sep 2023
Abstract
The Kuwaiti dialect is a particular dialect of Arabic spoken in Kuwait; it differs significantly from standard Arabic and the dialects of neighboring countries in the same region. Few research papers with a focus on the Kuwaiti dialect have been published in the field of NLP. In this study, we created Kuwaiti dialect language resources using Q8VaxStance, a vaccine stance labeling system for a large dataset of tweets. This dataset fills this gap and provides a valuable resource for researchers studying vaccine hesitancy in Kuwait. Furthermore, it contributes to the Arabic natural language processing field by providing a dataset for developing and evaluating machine learning models for stance detection in the Kuwaiti dialect. The proposed vaccine stance labeling system combines the benefits of weak supervised learning and zero-shot learning; for this purpose, we implemented 52 experiments on 42,815 unlabeled tweets extracted between December 2020 and July 2022. The results of the experiments show that using keyword detection in conjunction with zero-shot model labeling functions is significantly better than using only keyword detection labeling functions or just zero-shot model labeling functions. Furthermore, for the total number of generated labels, the difference between using the Arabic language in both the labels and prompt or a mix of Arabic labels and an English prompt is statistically significant, indicating that it generates more labels than when using English in both the labels and prompt. The best accuracy achieved in our experiments in terms of the Macro-F1 values was found when using keyword and hashtag detection labeling functions in conjunction with zero-shot model labeling functions, specifically in experiments KHZSLF-EE4 and KHZSLF-EA1, with values of 0.83 and 0.83, respectively. Experiment KHZSLF-EE4 was able to label 42,270 tweets, while experiment KHZSLF-EA1 was able to label 42,764 tweets. 
Finally, the average value of annotation agreement between the generated labels and human labels ranges between 0.61 and 0.64, which is considered a good level of agreement.
Open Access Article
Impulsive Aggression Break, Based on Early Recognition Using Spatiotemporal Features
Big Data Cogn. Comput. 2023, 7(3), 150; https://doi.org/10.3390/bdcc7030150 - 14 Sep 2023
Abstract
The study of human behaviors aims to gain a deeper perception of stimuli that control decision making. To describe, explain, predict, and control behavior, human behavior can be classified as either non-aggressive or anomalous behavior. Anomalous behavior is any unusual activity; impulsive aggressive, or violent behaviors are the most harmful. The detection of such behaviors at the initial spark is critical for guiding public safety decisions and a key to its security. This paper proposes an automatic aggressive-event recognition method based on effective feature representation and analysis. The proposed approach depends on a spatiotemporal discriminative feature that combines histograms of oriented gradients and dense optical flow features. In addition, the principal component analysis (PCA) and linear discriminant analysis (LDA) techniques are used for complexity reduction. The performance of the proposed approach is analyzed on three datasets: Hockey-Fight (HF), Stony Brook University (SBU)-Kinect, and Movie-Fight (MF), with accuracy rates of 96.5%, 97.8%, and 99.6%, respectively. Also, this paper assesses and contrasts the feature engineering and learned features for impulsive aggressive event recognition. Experiments show promising results of the proposed method compared to the state of the art. The implementation of the proposed work is available here.
(This article belongs to the Special Issue Applied Data Science for Social Good)
Open Access Article
Visual Explanations of Differentiable Greedy Model Predictions on the Influence Maximization Problem
Big Data Cogn. Comput. 2023, 7(3), 149; https://doi.org/10.3390/bdcc7030149 - 05 Sep 2023
Abstract
Social networks have become important objects of study in recent years. Social media marketing has, for example, greatly benefited from the vast literature developed in the past two decades. The study of social networks has taken advantage of recent advances in machine learning to process these immense amounts of data. Automatic emotional labeling of content on social media has, for example, been made possible by the recent progress in natural language processing. In this work, we are interested in the influence maximization problem, which consists of finding the most influential nodes in the social network. The problem is classically evaluated using standard performance metrics such as accuracy or recall, which are not the end goal of the influence maximization problem. Our work presents an end-to-end learning model, SGREEDYNN, for the selection of the most influential nodes in a social network, given a history of information diffusion. In addition, this work proposes data visualization techniques to interpret the improved performance of our method compared to classical training. The results of this method are confirmed by visualizing the final influence of the selected nodes on network instances with edge bundling techniques. Edge bundling is a visual aggregation technique that makes patterns emerge. It has been shown to be an interesting asset for decision-making. By using edge bundling, we observe that our method chooses more diverse and high-degree nodes compared to the classical training.
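To make the influence maximization problem concrete, here is a minimal sketch of classical greedy seed selection. It is not the SGREEDYNN model described in the abstract: it assumes a simplified objective in which a seed node influences only itself and its direct out-neighbors, and it repeatedly adds the node with the largest marginal gain in covered nodes. The graph and function name are hypothetical.

```python
def greedy_seed_selection(graph, k):
    """Pick up to k seed nodes by greedy marginal one-hop coverage.

    `graph` maps each node to a list of nodes it directly influences.
    One-hop coverage is a stand-in for a real diffusion model.
    """
    covered = set()
    seeds = []
    for _ in range(k):
        best_node, best_gain = None, -1
        for node in sorted(graph):  # sorted for deterministic tie-breaking
            if node in seeds:
                continue
            reach = {node} | set(graph[node])
            gain = len(reach - covered)  # nodes newly influenced by this seed
            if gain > best_gain:
                best_node, best_gain = node, gain
        if best_node is None:
            break  # every node is already a seed
        seeds.append(best_node)
        covered |= {best_node} | set(graph[best_node])
    return seeds, covered

# Hypothetical toy network.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["c"],
    "c": [],
    "d": ["e"],
    "e": ["f", "g"],
    "f": [],
    "g": [],
}

seeds, covered = greedy_seed_selection(toy_graph, k=2)
```

The second seed is chosen for what it adds beyond the first seed's reach, which is why the greedy choice favors nodes in a different part of the network over neighbors of the first seed.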
(This article belongs to the Special Issue Challenges and Perspectives of Social Networks within Social Computing)
Open Access Article
Crafting a Museum Guide Using ChatGPT4
Big Data Cogn. Comput. 2023, 7(3), 148; https://doi.org/10.3390/bdcc7030148 - 04 Sep 2023
Abstract
This paper introduces a groundbreaking approach to enriching the museum experience using ChatGPT4, a state-of-the-art language model by OpenAI. By developing a museum guide powered by ChatGPT4, we aimed to address the challenges visitors face in navigating vast collections of artifacts and interpreting their significance. Leveraging the model’s natural-language-understanding and -generation capabilities, our guide offers personalized, informative, and engaging experiences. However, caution must be exercised as the generated information may lack scientific integrity and accuracy. To mitigate this, we propose incorporating human oversight and validation mechanisms. The subsequent sections present our own case study, detailing the design, architecture, and experimental evaluation of the museum guide system, highlighting its practical implementation and insights into the benefits and limitations of employing ChatGPT4 in the cultural heritage context.
(This article belongs to the Special Issue Artificial Intelligence in Digital Humanities)
Open Access Review
Innovative Robotic Technologies and Artificial Intelligence in Pharmacy and Medicine: Paving the Way for the Future of Health Care—A Review
Big Data Cogn. Comput. 2023, 7(3), 147; https://doi.org/10.3390/bdcc7030147 - 30 Aug 2023
Abstract
The future of innovative robotic technologies and artificial intelligence (AI) in pharmacy and medicine is promising, with the potential to revolutionize various aspects of health care. These advances aim to increase efficiency, improve patient outcomes, and reduce costs while addressing pressing challenges such as personalized medicine and the need for more effective therapies. This review examines the major advances in robotics and AI in the pharmaceutical and medical fields, analyzing the advantages, obstacles, and potential implications for future health care. In addition, prominent organizations and research institutions leading the way in these technological advancements are highlighted, showcasing their pioneering efforts in creating and utilizing state-of-the-art robotic solutions in pharmacy and medicine. By thoroughly analyzing the current state of robotic technologies in health care and exploring the possibilities for further progress, this work aims to provide readers with a comprehensive understanding of the transformative power of robotics and AI in the evolution of the healthcare sector. Striking a balance between embracing technology and preserving the human touch, investing in R&D, and establishing regulatory frameworks within ethical guidelines will shape a future for robotics and AI systems. The future of pharmacy and medicine is in the seamless integration of robotics and AI systems to benefit patients and healthcare providers.
Open Access Communication
Enhancing Speech Emotions Recognition Using Multivariate Functional Data Analysis
Big Data Cogn. Comput. 2023, 7(3), 146; https://doi.org/10.3390/bdcc7030146 - 25 Aug 2023
Abstract
Speech Emotions Recognition (SER) has gained significant attention in the fields of human–computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCC) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech, capturing essential emotional cues more effectively. Consequently, this enhancement significantly contributes to the learning process of SER methods without compromising performance. Subsequently, we employ a supervised learning model, specifically a functional Support Vector Machine (SVM), directly on the MFCC represented as functional data. This enables the utilization of the full functional information, allowing for more accurate emotion recognition. The proposed approach is rigorously evaluated on two distinct databases, EMO-DB and IEMOCAP, serving as benchmarks for SER evaluation. Our method demonstrates competitive results in terms of accuracy, showcasing its effectiveness in emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, our novel approach of treating MFCCs as multivariate functional data objects exhibits superior performance in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human–computer interaction and enabling more sophisticated emotion-aware applications.
Open AccessArticle
Applied Digital Twin Concepts Contributing to Heat Transition in Building, Campus, Neighborhood, and Urban Scale
Big Data Cogn. Comput. 2023, 7(3), 145; https://doi.org/10.3390/bdcc7030145 - 25 Aug 2023
Abstract
The heat transition is a central pillar of the energy transition, aiming to decarbonize and improve the energy efficiency of the heat supply in both the private and industrial sectors. On the one hand, this is achieved by substituting fossil fuels with renewable energy. On the other hand, it involves reducing overall heat consumption and associated transmission and ventilation losses. In addition to refurbishment, digitalization contributes significantly. Despite substantial research on Digital Twins (DTs) for heat transition at different scales, a cross-scale perspective on heat optimization still needs to be developed. In response to this research gap, the present study examines four instances of applied DTs across various scales: building, campus, neighborhood, and urban. The study compares their objectives and conceptual frameworks while also identifying common challenges and potential synergies. The study’s findings indicate that all DT scales face similar data-related challenges, such as gathering, ownership, connectivity, and reliability. Also, hierarchical synergy is identified among the DTs, implying the need for collaboration and exchange. In response to this, the “Wärmewende” data platform, whose objectives and concepts are presented in the paper, promotes research data and knowledge exchange with internal and external stakeholders.
(This article belongs to the Special Issue Digital Twins for Complex Systems)
Open AccessArticle
Enhancing the Early Detection of Chronic Kidney Disease: A Robust Machine Learning Model
Big Data Cogn. Comput. 2023, 7(3), 144; https://doi.org/10.3390/bdcc7030144 - 16 Aug 2023
Abstract
Clinical decision-making in chronic disorder prognosis is often hampered by high variance, leading to uncertainty and negative outcomes, especially in cases such as chronic kidney disease (CKD). Machine learning (ML) techniques have emerged as valuable tools for reducing randomness and enhancing clinical decision-making. However, conventional methods for CKD detection often lack accuracy due to their reliance on limited sets of biological attributes. This research proposes a novel ML model for predicting CKD, incorporating various preprocessing steps, feature selection, a hyperparameter optimization technique, and ML algorithms. To address challenges in medical datasets, we employ iterative imputation for missing values and a novel sequential approach for data scaling, combining robust scaling, z-standardization, and min-max scaling. Feature selection is performed using the Boruta algorithm, and the model is developed using ML algorithms. The proposed model was validated on the UCI CKD dataset, achieving outstanding performance with 100% accuracy. Our approach, combining innovative preprocessing steps, the Boruta feature selection, and the k-nearest neighbors algorithm, along with a hyperparameter optimization using grid-search cross-validation (CV), demonstrates its effectiveness in enhancing the early detection of CKD. This research highlights the potential of ML techniques in improving clinical support systems and reducing the impact of uncertainty in chronic disorder prognosis.
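The tuning step named in the abstract, a grid search with cross-validation over a k-nearest neighbors classifier, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses leave-one-out as the cross-validation scheme, Euclidean distance, a hypothetical two-cluster toy dataset in place of the UCI CKD data, and hypothetical function names (`knn_predict`, `loo_accuracy`, `grid_search_k`).

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN (assumed CV scheme, not from the paper)."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out sample i
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def grid_search_k(data, candidates):
    """Score every candidate k and return (best_k, best_accuracy).

    Ties are resolved in favour of the smaller k.
    """
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best_k = max(sorted(scores), key=scores.get)
    return best_k, scores[best_k]


# Hypothetical toy data: two well-separated clusters of three points each.
toy_data = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
    ((5.0, 5.0), 1), ((5.1, 5.2), 1), ((5.2, 5.1), 1),
]
best_k, best_score = grid_search_k(toy_data, [1, 3, 5])
```

On this toy data a small k scores well, while k = 5 fails because each held-out point is then outvoted by the other cluster. This is the kind of trade-off a grid search is meant to expose.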
(This article belongs to the Special Issue Big Data in Health Care Information Systems)
Open AccessReview
Ransomware Detection Using Machine Learning: A Survey
Big Data Cogn. Comput. 2023, 7(3), 143; https://doi.org/10.3390/bdcc7030143 - 16 Aug 2023
Abstract
Ransomware attacks pose significant security threats to personal and corporate data and information. The owners of computer-based resources suffer from verification and privacy violations, monetary losses, and reputational damage due to successful ransomware assaults. As a result, it is critical to accurately and swiftly identify ransomware. Numerous methods have been proposed for identifying ransomware, each with its own advantages and disadvantages. The main objective of this research is to discuss current trends in and potential future debates on automated ransomware detection. This document includes an overview of ransomware, a timeline of assaults, and details on their background. It also provides comprehensive research on existing methods for identifying, avoiding, minimizing, and recovering from ransomware attacks. An analysis of studies between 2017 and 2022 is another advantage of this research. This provides readers with up-to-date knowledge of the most recent developments in ransomware detection and highlights advancements in methods for combating ransomware attacks. In conclusion, this research highlights unanswered concerns and potential research challenges in ransomware detection.
(This article belongs to the Special Issue Managing Cybersecurity Threats and Increasing Organizational Resilience)
Open AccessArticle
Breast Cancer Classification Using Concatenated Triple Convolutional Neural Networks Model
Big Data Cogn. Comput. 2023, 7(3), 142; https://doi.org/10.3390/bdcc7030142 - 16 Aug 2023
Abstract
Improved disease prediction accuracy and reliability are the main concerns in the development of models for the medical field. This study examined methods for increasing classification accuracy and proposed a precise and reliable framework for categorizing breast cancers using mammography scans. Concatenated Convolutional Neural Networks (CNNs) were developed based on three models: two by transfer learning and one entirely from scratch. Misclassification of lesions from mammography images can also be reduced using this approach. Bayesian optimization performs hyperparameter tuning of the layers, and data augmentation refines the model by providing more training samples. Analysis of the model’s accuracy revealed that it can predict disease with 97.26% accuracy in binary cases and 99.13% accuracy in multi-classification cases. Compared with recent studies on the same issue using the same dataset, these findings demonstrate a 16% increase in multi-classification accuracy. In addition, an accuracy improvement of 6.4% was achieved after hyperparameter modification and augmentation. Thus, the model tested in this study was deemed superior to those presented in the extant literature. Hence, the concatenation of three different CNNs, built from scratch and by transfer learning, allows the extraction of distinct and significant features without leaving them out, enabling the model to make exact diagnoses.
Open AccessArticle
Hadiths Classification Using a Novel Author-Based Hadith Classification Dataset (ABCD)
Big Data Cogn. Comput. 2023, 7(3), 141; https://doi.org/10.3390/bdcc7030141 - 14 Aug 2023
Abstract
Religious studies are fertile ground for Natural Language Processing (NLP). The reason is that all religions have their instructions as written texts. In this paper, we apply NLP to Islamic Hadiths, which are the written traditions, sayings, actions, approvals, and discussions of the Prophet Muhammad, his companions, or his followers. A Hadith is composed of two parts: the chain of narrators (Sanad) and the content of the Hadith (Matn). A Hadith is transmitted from its author to a Hadith book author through a chain of narrators. The problem we solve focuses on the classification of Hadiths based on their origin of narration. This is important for several reasons. First, it helps determine the authenticity and reliability of the Hadiths. Second, it helps trace the chain of narration and identify the narrators involved in transmitting Hadiths. Finally, it helps understand the historical and cultural contexts in which Hadiths were transmitted, and the different levels of authority attributed to the narrators. To the best of our knowledge, and based on our literature review, this problem has not previously been solved using machine/deep learning approaches. To solve this classification problem, we created a novel Author-Based Hadith Classification Dataset (ABCD) collected from classical Hadith books. The ABCD comprises 29 K Hadiths and 18 K unique narrators, with all their information. We applied machine learning (ML) and deep learning (DL) approaches. ML was applied to Sanad and Matn separately; then, we did the same with DL. The results revealed that ML performs better than DL using the Matn input data, with a 77% F1-score. DL performed better than ML using the Sanad input data, with a 92% F1-score. We used precision and recall alongside the F1-score; details of the results are explained at the end of the paper. We claim that the ABCD and the reported results will motivate the community to work in this new area. Our dataset and results will represent a baseline for further research on the same problem.
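For readers unfamiliar with the metrics named in the abstract, the relationship between precision, recall, and the F1-score for a single class can be sketched as below. This is a minimal illustration only: the function name, the labels "A" and "B", and the toy predictions are hypothetical, and the abstract does not say how the authors averaged scores across classes.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: two origin classes, "A" and "B".
y_true = ["A", "A", "A", "B", "B"]
y_pred = ["A", "A", "B", "A", "B"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="A")
```

Here class "A" has two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.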
Topics
Topic in
AI, Algorithms, Applied Sciences, BDCC, MAKE, Sensors
Artificial Intelligence and Fuzzy Systems
Topic Editors: Amelia Zafra, Jose Manuel Soto Hidalgo
Deadline: 30 November 2023
Topic in
Applied Sciences, BDCC, Photonics, Processes, Remote Sensing, Automation
Advances in AI-Empowered Beamline Automation and Data Science in Advanced Photon Sources
Topic Editors: Yi Zhang, Xiaogang Yang, Chunpeng Wang, Junrong Zhang
Deadline: 20 December 2023
Topic in
AI, Applied Sciences, BDCC, Sensors, Information
Applied Computing and Machine Intelligence (ACMI)
Topic Editors: Chuan-Ming Liu, Wei-Shinn Ku
Deadline: 31 December 2023
Topic in
AI, BDCC, Economies, IJFS, JTAER, Sustainability
Artificial Intelligence Applications in Financial Technology
Topic Editors: Albert Y.S. Lam, Yanhui Geng
Deadline: 1 March 2024

Special Issues
Special Issue in
BDCC
Data Science in Health Care
Guest Editors: Nadav Rappoport, Yuval Shahar, Hyojung Paik
Deadline: 20 October 2023
Special Issue in
BDCC
Cyber Security in Big Data Era
Guest Editor: Fabrizio Baiardi
Deadline: 27 October 2023
Special Issue in
BDCC
Research Progress in Artificial Intelligence and Social Network Analysis
Guest Editors: Yong Tang, Chaobo He, Chengzhou Fu
Deadline: 10 November 2023
Special Issue in
BDCC
Human Factor in Information Systems Development and Management
Guest Editors: Paweł Weichbroth, Jolanta Kowal, Mieczysław Lech Owoc
Deadline: 30 November 2023
Topical Collections
Topical Collection in
BDCC
Machine Learning and Artificial Intelligence for Health Applications on Social Networks
Collection Editor: Carmela Comito