Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; acceptance to publication is undertaken in 4.5 days (median values for papers published in this journal in the first half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2023)
Latest Articles
From Fact Drafts to Operational Systems: Semantic Search in Legal Decisions Using Fact Drafts
Big Data Cogn. Comput. 2024, 8(12), 185; https://doi.org/10.3390/bdcc8120185 - 10 Dec 2024
Abstract
This research paper presents findings from an investigation of the semantic similarity search task within the legal domain, using a corpus of 1172 Hungarian court decisions. The study establishes the groundwork for an operational semantic similarity search system designed to identify cases with comparable facts using preliminary legal fact drafts. Evaluating such systems often poses significant challenges, given the need for thorough document checks, which can be costly and limit evaluation reusability. To address this, the study employs manually created fact drafts for legal cases, enabling reliable ranking of original cases within retrieved documents and quantitative comparison of various vectorization methods. The study compares twelve different text embedding solutions (the most recent became available just a few weeks before the manuscript was written), identifying Cohere’s embed-multilingual-v3.0, Beijing Academy of Artificial Intelligence’s bge-m3, Jina AI’s jina-embeddings-v3, OpenAI’s text-embedding-3-large, and Microsoft’s multilingual-e5-large models as top performers. To overcome the transformer-based models’ context window limitation, we investigated chunking, striding, and last chunk scaling techniques, with last chunk scaling significantly improving embedding quality. The results suggest that the effectiveness of striding varies based on token count. Notably, employing striding with 16 tokens yielded optimal results, representing 3.125% of the context window size for the best-performing models. The results also suggest that, among the models with an 8192-token context window, bge-m3 is superior to jina-embeddings-v3 and text-embedding-3-large in capturing the relevant parts of a document if the text contains a significant amount of noise. The validity of the approach was evaluated and confirmed by legal experts. These insights led to an operational semantic search system for a prominent legal content provider.
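The chunking and striding idea in this abstract can be illustrated with a minimal sketch. A 16-token stride that equals 3.125% of the context window implies a 512-token window (16 / 0.03125 = 512). Everything else here is an assumption made for illustration: the function names are hypothetical, "stride" is read as the number of tokens shared by consecutive chunks, and the length-weighted average is only one plausible reading of "last chunk scaling", which the abstract does not define.

```python
from typing import List, Sequence


def chunk_with_stride(tokens: Sequence[int], window: int = 512, stride: int = 16) -> List[List[int]]:
    """Split `tokens` into chunks of at most `window` tokens.

    Assumption: `stride` is the number of tokens that consecutive
    chunks share (their overlap), not the step between chunk starts.
    """
    if window <= 0 or not 0 <= stride < window:
        raise ValueError("need window > 0 and 0 <= stride < window")
    if not tokens:
        return []
    step = window - stride  # how far the window advances each time
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


def length_weighted_mean(vectors: Sequence[Sequence[float]], lengths: Sequence[int]) -> List[float]:
    """Average chunk embeddings, weighting each by its chunk length.

    Assumption: a stand-in for "last chunk scaling", so that a short
    final chunk contributes proportionally less to the document vector.
    """
    if not vectors or len(vectors) != len(lengths):
        raise ValueError("need one length per vector")
    total = float(sum(lengths))
    if total <= 0:
        raise ValueError("chunk lengths must sum to a positive value")
    dim = len(vectors[0])
    return [sum(v[i] * n for v, n in zip(vectors, lengths)) / total for i in range(dim)]
```

In practice the chunk embeddings would come from one of the embedding models named in the abstract; the sketch takes them as plain lists so that it stays self-contained.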
Open Access Article
The Use of Eye-Tracking to Explore the Relationship Between Consumers’ Gaze Behaviour and Their Choice Process
by Maria-Jesus Agost and Vicente Bayarri-Porcar
Big Data Cogn. Comput. 2024, 8(12), 184; https://doi.org/10.3390/bdcc8120184 - 9 Dec 2024
Abstract
Eye-tracking technology can assist researchers in understanding motivational decision-making and choice processes by analysing consumers’ gaze behaviour. Previous studies showed that attention is related to decision, as the preferred stimulus is generally the most observed and the last visited before a decision is made. In this work, the relationship between gaze behaviour and decision-making was explored using eye-tracking technology. Images of six wardrobes incorporating different sustainable design strategies were presented to 57 subjects, who were tasked with selecting the wardrobe they intended to keep the longest. Subjects spent more time looking at a version when it was the one they chose. Detailed analyses of gaze plots and heat maps derived from eye-tracking records were employed to identify different patterns of gaze behaviour during the selection process. These patterns included alternating attention between a few versions or comparing them against a reference, allowing the identification of stimuli that initially piqued interest but were ultimately not chosen, as well as potential doubts in the decision-making process. These findings suggest that doubts that arise before making a selection warrant further investigation. By identifying stimuli that attract attention but are not chosen, this study provides valuable insights into consumer behaviour and decision-making processes.
Open Access Article
eFC-Evolving Fuzzy Classifier with Incremental Clustering Algorithm Based on Samples Mean Value
by Emmanuel Tavares, Gray Farias Moita and Alisson Marques Silva
Big Data Cogn. Comput. 2024, 8(12), 183; https://doi.org/10.3390/bdcc8120183 - 6 Dec 2024
Abstract
This paper introduces a new multiclass classifier called the evolving Fuzzy Classifier (eFC). Starting its knowledge base from scratch, the eFC structure evolves based on a clustering algorithm that can add, merge, delete, or update clusters (= rules) simultaneously while providing class predictions. The procedure to add clusters uses the procrastination idea to prevent outliers from affecting the quality of learning. Two pruning mechanisms are used to maintain a concise and compact structure. In the first, redundant clusters are merged based on a similarity measure, and in the second, obsolete and unrepresentative clusters are excluded based on an inactivity strategy. Cluster centers are adjusted based on the mean value of the attributes. The eFC model was evaluated and compared with state-of-the-art evolving fuzzy systems on 8 randomly selected data streams from the UCI and Kaggle repositories. The experimental results indicate that the eFC outperforms or is at least comparable to alternative state-of-the-art models. Specifically, the eFC achieved an average accuracy 7% to 37% higher than the competing classifiers. The results and comparisons demonstrate that the eFC is a promising alternative for classification tasks in non-stationary environments, offering good accuracy, a compact structure, low computational cost, and efficient processing time.
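Adjusting a cluster center to the mean of its samples can be done one sample at a time, without storing the stream. The sketch below shows the standard running-mean update and is not taken from the paper: the `Cluster` class and its fields are hypothetical, and the abstract does not say whether the eFC uses exactly this rule.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """Hypothetical cluster holding a running mean of its samples."""
    center: List[float]  # current mean of each attribute
    count: int = 1       # number of samples absorbed so far

    def update(self, sample: Sequence[float]) -> None:
        """Fold one new sample into the center using the running-mean rule:
        new_mean = old_mean + (x - old_mean) / n
        """
        if len(sample) != len(self.center):
            raise ValueError("sample and center must have the same dimension")
        self.count += 1
        self.center = [c + (x - c) / self.count for c, x in zip(self.center, sample)]
```

A cluster created from the first sample and updated with each later one ends up at the arithmetic mean of all samples assigned to it, which suits data streams because memory use does not grow with the number of samples.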
Open Access Article
A Centrality-Weighted Bidirectional Encoder Representation from Transformers Model for Enhanced Sequence Labeling in Key Phrase Extraction from Scientific Texts
by Tsitsi Zengeya, Jean Vincent Fonou Dombeu and Mandlenkosi Gwetu
Big Data Cogn. Comput. 2024, 8(12), 182; https://doi.org/10.3390/bdcc8120182 - 4 Dec 2024
Abstract
Deep learning approaches, utilizing Bidirectional Encoder Representation from Transformers (BERT) and advanced fine-tuning techniques, have achieved state-of-the-art accuracies in the domain of term extraction from texts. However, BERT presents some limitations in that it primarily captures the semantic context relative to the surrounding text without considering how relevant or central a token is to the overall document content. There has also been research on the application of sequence labeling on contextualized embeddings; however, the existing methods often rely solely on local context for extracting key phrases from texts. To address these limitations, this study proposes a centrality-weighted BERT model for key phrase extraction from text using sequence labeling (CenBERT-SEQ). The proposed CenBERT-SEQ model utilizes BERT to represent terms with various contextual embedding architectures, and introduces a centrality-weighting layer that integrates document-level context into BERT. This layer leverages document embeddings to influence the importance of each term based on its relevance to the entire document. Finally, a linear classifier layer is employed to model the dependencies between the outputs, thereby enhancing the accuracy of the CenBERT-SEQ model. The proposed CenBERT-SEQ model was evaluated against the standard BERT base-uncased model using three Computer Science article datasets, namely, SemEval-2010, WWW, and KDD. The experimental results show that, although the CenBERT-SEQ and BERT-base models achieved high and closely comparable accuracy, the proposed CenBERT-SEQ model achieved higher precision, recall, and F1-score than the BERT-base model. Furthermore, a comparison with related studies revealed that the proposed CenBERT-SEQ model achieved higher accuracy, precision, recall, and F1-score (95%, 97%, 91%, and 94%, respectively), showing the superior capabilities of the CenBERT-SEQ model in key phrase extraction from scientific documents.
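The reported precision, recall, and F1-score can be checked against each other, since the F1-score is the harmonic mean of precision and recall. The short sketch below is an illustration and not code from the paper; the function name `f1_score` is chosen here for convenience.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract: precision 97%, recall 91%.
reported_f1 = f1_score(0.97, 0.91)
```

With the abstract's figures this gives roughly 0.939, which rounds to the reported F1-score of 94%.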
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
Open Access Article
Suspension Parameter Estimation Method for Heavy-Duty Freight Trains Based on Deep Learning
by Changfan Zhang, Yuxuan Wang and Jing He
Big Data Cogn. Comput. 2024, 8(12), 181; https://doi.org/10.3390/bdcc8120181 - 4 Dec 2024
Abstract
The suspension parameters of heavy-duty freight trains can deviate from their initial design values due to material aging and performance degradation. Because traditional multibody dynamics simulation models are usually designed for fixed working conditions, it is difficult for them to adequately analyze the safety status of the vehicle–line system in actual operation. To address this issue, this research proposes a suspension parameter estimation technique based on CNN-GRU. Firstly, a prototype C80 train was utilized to build a simulation model for multibody dynamics. Secondly, six key suspension parameters for wheel–rail force were selected using the Sobol global sensitivity analysis method. Then, a CNN-GRU proxy model was constructed, with the actually measured wheel–rail forces as a reference. By combining this approach with NSGA-II (Non-dominated Sorting Genetic Algorithm II), the key suspension parameters were calculated. Finally, the estimated parameter values were applied to the vehicle–line coupled multibody dynamical model and validated. The results show that, with the corrected dynamical model, the relative errors of the simulated wheel–rail force are reduced from 9.28%, 6.24%, and 18.11% to 7%, 4.52%, and 10.44% for straight, curved, and long, steep uphill conditions, respectively. The precision of the wheel–rail force simulation is increased, indicating that the proposed method is effective in estimating the suspension parameters for heavy-duty freight trains.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
Open Access Article
Patient Satisfaction with the Mawiidi Hospital Appointment Scheduling Application: Insights from the Information Systems Success Model and Technology Acceptance Model in a Moroccan Healthcare Setting
by Abdelaziz Ouajdouni, Khalid Chafik, Soukaina Allioui and Mourad Jbene
Big Data Cogn. Comput. 2024, 8(12), 180; https://doi.org/10.3390/bdcc8120180 - 3 Dec 2024
Abstract
This article aims to find the determinants that affect patient satisfaction regarding the Mawiidi public portal in Moroccan public hospitals and to assess the effectiveness of its outpatient online booking system, using a model that integrates the Technology Acceptance Model (TAM) with the Information Systems Success Model (ISSM) while adopting a quantitative research methodology. The analysis was conducted using 348 self-administered questionnaires to analyze eight key constructs, such as information quality, patient satisfaction, perceived ease of use, and privacy protection, among others. The results of PLS-SEM verified six out of eleven hypotheses tested, which reflected that information quality has a positive influence on perceived ease of use, which in turn enhances patient satisfaction. The major factors influencing the satisfaction and trust of patients in online appointment scheduling systems at public hospitals are highlighted. Indeed, privacy protection enhances patient satisfaction and trust. Service quality positively affects satisfaction but to a lesser degree. Website-related anxiety impacts perceived ease of use, although it has a limited influence on satisfaction. Such findings can inform suggestions for the managers of hospitals and portal designers to increase user satisfaction. This study uses a model from the TAM and ISSM frameworks, including cultural and socioeconomic aspects that apply to Morocco’s healthcare context.
Open Access Article
Exploring Named Entity Recognition via MacBERT-BiGRU and Global Pointer with Self-Attention
by Chengzhe Yuan, Feiyi Tang, Chun Shan, Weiqiang Shen, Ronghua Lin, Chengjie Mao and Junxian Li
Big Data Cogn. Comput. 2024, 8(12), 179; https://doi.org/10.3390/bdcc8120179 - 3 Dec 2024
Abstract
Named Entity Recognition (NER) is a fundamental task in natural language processing that aims to identify and categorize named entities within unstructured text. In recent years, with the development of deep learning techniques, pre-trained language models have been widely used in NER tasks. However, these models still face limitations in terms of their scalability and adaptability, especially when dealing with complex linguistic phenomena such as nested entities and long-range dependencies. To address these challenges, we propose the MacBERT-BiGRU-Self Attention-Global Pointer (MB-GAP) model, which integrates MacBERT for deep semantic understanding, BiGRU for rich contextual information, self-attention for focusing on relevant parts of the input, and a global pointer mechanism for precise entity boundary detection. By optimizing the number of attention heads and global pointer heads, our model achieves an effective balance between complexity and performance. Extensive experiments on benchmark datasets, including ResumeNER, CLUENER2020, and SCHOLAT-School, demonstrate significant improvements over baseline models.
(This article belongs to the Special Issue Research Progress in Artificial Intelligence and Social Network Analysis)
Open Access Article
A Multimodal Machine Learning Model in Pneumonia Patients Hospital Length of Stay Prediction
by Anna Annunziata, Salvatore Cappabianca, Salvatore Capuozzo, Nicola Coppola, Camilla Di Somma, Ludovico Docimo, Giuseppe Fiorentino, Michela Gravina, Lidia Marassi, Stefano Marrone, Domenico Parmeggiani, Giorgio Emanuele Polistina, Alfonso Reginelli, Caterina Sagnelli and Carlo Sansone
Big Data Cogn. Comput. 2024, 8(12), 178; https://doi.org/10.3390/bdcc8120178 - 3 Dec 2024
Abstract
Hospital overcrowding, driven by both structural management challenges and widespread medical emergencies, has prompted extensive research into machine learning (ML) solutions for predicting patient length of stay (LOS) to optimize bed allocation. While many existing models simplify the LOS prediction problem to a classification task, predicting broad ranges of hospital days, an exact day-based regression model is often crucial for precise planning. Additionally, available data are typically limited and heterogeneous, often collected from a small patient cohort. To address these challenges, we present a novel multimodal ML framework that combines imaging and clinical data to enhance LOS prediction accuracy. Specifically, our approach uses the following: (i) feature extraction from chest CT scans via a convolutional neural network (CNN), (ii) their integration with clinically relevant tabular data from patient exams, refined through a feature selection system to retain only significant predictors. As a case study, we applied this framework to pneumonia patient data collected during the COVID-19 pandemic at two hospitals in Naples, Italy—one specializing in infectious diseases and the other general-purpose. Under our experimental setup, the proposed system achieved an average prediction error of only three days, demonstrating its potential to improve patient flow management in critical care environments.
(This article belongs to the Special Issue Application of Deep Learning and Convolution Neural Networks for Social Healthcare)
Open Access Review
Application of Task Allocation Algorithms in Multi-UAV Intelligent Transportation Systems: A Critical Review
by Marco Rinaldi, Sheng Wang, Renan Sanches Geronel and Stefano Primatesta
Big Data Cogn. Comput. 2024, 8(12), 177; https://doi.org/10.3390/bdcc8120177 - 2 Dec 2024
Abstract
Unmanned aerial vehicles (UAVs), commonly known as drones, are seen as the most promising type of autonomous vehicle in the context of intelligent transportation system (ITS) technology. A key enabling factor for the current development of ITS technology based on autonomous vehicles is the task allocation architecture. This approach allows tasks to be efficiently assigned to robots of a multi-agent system, taking into account both the robots’ capabilities and service requirements. Consequently, this study provides an overview of the application of drones in ITSs, focusing on the applications of task allocation algorithms for UAV networks. Currently, different types of algorithms are employed for task allocation in drone-based intelligent transportation systems, including market-based approaches, game-theory-based algorithms, optimization-based algorithms, machine learning techniques, and other hybrid methodologies. This paper offers a comprehensive literature review of how such approaches are being utilized to optimize the allocation of tasks in UAV-based ITSs. The main characteristics, constraints, and limitations are detailed to highlight their advantages, current achievements, and applicability to different types of UAV-based ITSs. Current research trends in this field as well as gaps in the literature are also discussed.
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
Open Access Article
PSR-LeafNet: A Deep Learning Framework for Identifying Medicinal Plant Leaves Using Support Vector Machines
by Praveen Kumar Sekharamantry, Marada Srinivasa Rao, Yarramalle Srinivas and Archana Uriti
Big Data Cogn. Comput. 2024, 8(12), 176; https://doi.org/10.3390/bdcc8120176 - 1 Dec 2024
Abstract
In computer vision, recognizing plant pictures has emerged as a multidisciplinary area of interest. In the last several years, much research has been conducted to determine the type of plant in each image automatically. The challenges in identifying medicinal plants are due to variations in image lighting, pose, and orientation. Further, medicinal plants are difficult to identify because leaf shape varies with age and leaf color changes in response to varying weather conditions. The proposed work uses machine learning techniques and deep neural networks to choose appropriate leaf features to determine whether a leaf belongs to a medicinal or non-medicinal plant. This study presents a neural network design based on PSR-LeafNet (PSR-LN). PSR-LeafNet is a single network that combines the P-Net, S-Net, and R-Net, all intended for leaf feature extraction using the minimum redundancy maximum relevance (MRMR) approach. The PSR-LN helps obtain the shape features, color features, venation of the leaf, and textural features. A support vector machine (SVM) is applied to the output of the PSR network to classify the plant by name. The model design is named PSR-LN-SVM. The advantage of the designed model is that it suits the processing of larger datasets and provides better results than traditional neural network models. The methodology utilized in the work achieves an accuracy of 97.12% for the MalayaKew dataset, 98.10% for the IMP dataset, and 95.88% for the Flavia dataset. The proposed model surpasses all the existing models in accuracy. These outcomes demonstrate that the suggested method is successful in accurately recognizing the leaves of medicinal plants, paving the way for more advanced uses in plant taxonomy and medicine.
(This article belongs to the Special Issue Emerging Trends and Applications of Big Data in Robotic Systems)
Open Access Article
Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models
by Kewei Wu, Yiran Wang, Xiaogang He, Jinyu Yan, Yang Guo, Zhuqing Jiang, Xing Zhang, Wei Wang, Yongping Xiong, Aidong Men and Li Xiao
Big Data Cogn. Comput. 2024, 8(12), 175; https://doi.org/10.3390/bdcc8120175 - 29 Nov 2024
Abstract
Currently, closed-set object detection models represented by YOLO are widely deployed in the industrial field. However, such closed-set models lack sufficient tuning ability for easily confused objects in complex detection scenarios. Open-set object detection models such as GroundingDINO expand the detection range to a certain extent, but they still have a gap in detection accuracy compared with closed-set detection models and cannot meet the requirements for high-precision detection in practical applications. In addition, existing detection technologies are insufficiently interpretable: they cannot clearly show users the basis and process behind a detection result, which leads users to doubt the trustworthiness and applicability of the results. To address these deficiencies, we propose a new object detection algorithm based on multi-modal large language models that significantly improves the detection effect of closed-set object detection models for more difficult boundary tasks while ensuring detection accuracy, thereby achieving a semi-open set object detection algorithm. The algorithm shows significant improvements in accuracy and interpretability when verified on seven common traffic and safety production scenarios.
(This article belongs to the Special Issue Big Data Analytics and Edge Computing: Recent Trends and Future)
Open Access Review
Exploring IoT and Blockchain: A Comprehensive Survey on Security, Integration Strategies, Applications and Future Research Directions
by Muath A. Obaidat, Majdi Rawashdeh, Mohammad Alja’afreh, Meryem Abouali, Kutub Thakur and Ali Karime
Big Data Cogn. Comput. 2024, 8(12), 174; https://doi.org/10.3390/bdcc8120174 - 28 Nov 2024
Abstract
The rise of the Internet of Things (IoT) has driven significant advancements across sectors such as urbanization, manufacturing, and healthcare, all of which are focused on enhancing quality of life and stimulating the global economy. This survey offers an in-depth analysis of the integration of blockchain technology with IoT, addressing aspects such as architectural alignment, applications, security, limitations, scalability, and latency. Moreover, this survey focuses on security, integration techniques, and future research directions. The primary contributions of this review include a taxonomy of security concerns specific to IoT, an analysis of integration methods, and insights into consensus mechanisms suitable for resource-constrained environments. These findings highlight the unique challenges and opportunities in IoT–blockchain integration, providing a foundation for advancing secure and scalable IoT applications. By exploring consensus mechanisms and resource-constrained deployments, this paper provides a framework for developing secure and efficient IoT applications utilizing blockchain technology and providing a basis for future research and practical applications. In addition, this survey investigates innovative trends, including AI-driven blockchain for IoT.
(This article belongs to the Special Issue Big Data and Internet of Things in Smart Cities)
Open Access Article
Personality Traits Estimation Based on Job Interview Video Analysis: Importance of Human Nonverbal Cues Detection
by Kenan Kassab and Alexey Kashevnik
Big Data Cogn. Comput. 2024, 8(12), 173; https://doi.org/10.3390/bdcc8120173 - 28 Nov 2024
Abstract
In this research, we analyze video interviews to study non-verbal cues and their impact on estimating job performance and hireability. We study a variety of non-verbal cues that can be extracted from video interviews, provide a framework that utilizes the extracted features, and combine them with personality traits to estimate sales abilities. Experimenting on the Human Face Video Dataset for Personality Traits Detection (VPTD), we showed the importance of smiling as a valid indicator for estimating extraversion and sales abilities. We also examined the role of head movements (represented by the rotation angles roll, pitch, and yaw), since they play a crucial role in evaluating personality traits in general and extraversion and neuroticism in particular. The testing results show how these non-verbal cues can be used as assisting features in the proposed approach to provide a valid, reliable, and accurate estimation of sales abilities and job performance.
Open Access Article
Machine Learning-Driven Dynamic Traffic Steering in 6G: A Novel Path Selection Scheme
by Hibatul Azizi Hisyam Ng and Toktam Mahmoodi
Big Data Cogn. Comput. 2024, 8(12), 172; https://doi.org/10.3390/bdcc8120172 - 27 Nov 2024
Abstract
Machine learning is taking on a significant role in materializing a new vision of 6G. 6G aspires to provide more use cases, handle high-complexity tasks, and improve the current 5G and beyond-5G infrastructure. Artificial intelligence (AI) and machine learning (ML) are the optimal candidates to support and deliver these aspirations. Traffic steering functions encompass many opportunities to help enable new use cases and improve overall performance. The emergence and advancement of the non-terrestrial network is another driving factor for creating an intelligent selection scheme for a dynamic traffic steering function. With service-based architecture, 5G and 6G are data-driven architectures that use massive transactional data to enable a new approach to handling highly complex processes. A highly complex process, a massive volume of data, and a short timeframe require a scheme using machine learning techniques to resolve the challenges. This paper creates a scheme that uses massive historical data to provide a decision scheme enabling dynamic traffic steering functions, addressing the future emergence of the heterogeneous transport network and aligning with the Open Radio Access Network (O-RAN). The proposed scheme gives an inference to be programmed in the telecommunication nodes. It provides a novel scheme to enable dynamic traffic steering functions for the 6G transport network. The study shows an appropriate data size to create a high-performance multi-output classification model that produces more than 90% accuracy for traffic steering functions.
Open Access Article
Behavioral Analysis of Android Riskware Families Using Clustering and Explainable Machine Learning
by
Mohammed M. Alani and Moatsum Alawida
Big Data Cogn. Comput. 2024, 8(12), 171; https://doi.org/10.3390/bdcc8120171 - 26 Nov 2024
Abstract
The Android operating system has become increasingly popular, not only on mobile phones but also on various other platforms such as Internet-of-Things devices, tablet computers, and wearable devices. Due to its open-source nature and significant market share, Android presents an attractive target for malicious actors. One of the notable security challenges associated with this operating system is riskware. Riskware refers to applications that may pose a security threat due to their vulnerability and potential for misuse. Although riskware constitutes a considerable portion of the malware in the Android ecosystem, it has not been studied as extensively as other types of malware such as ransomware and trojans. In this study, we employ machine learning techniques to analyze the behavior of different riskware families and identify similarities in their actions. Furthermore, our research identifies specific behaviors that can be used to distinguish these riskware families. To achieve these insights, we utilize various tools such as k-Means clustering, principal component analysis, extreme gradient boosting classifiers, and Shapley additive explanations. Our findings can contribute significantly to the detection, identification, and forensic analysis of Android riskware.
Open Access Article
Investigating Offensive Language Detection in a Low-Resource Setting with a Robustness Perspective
by
Israe Abdellaoui, Anass Ibrahimi, Mohamed Amine El Bouni, Asmaa Mourhir, Saad Driouech and Mohamed Aghzal
Big Data Cogn. Comput. 2024, 8(12), 170; https://doi.org/10.3390/bdcc8120170 - 25 Nov 2024
Abstract
Moroccan Darija, a dialect of Arabic, presents unique challenges for natural language processing due to its lack of standardized orthographies, frequent code switching, and status as a low-resource language. In this work, we focus on detecting offensive language in Darija, addressing these complexities. We present three key contributions that advance the field. First, we introduce a human-labeled dataset of Darija text collected from social media platforms. Second, we explore and fine-tune various language models on the created dataset. This investigation identifies a Darija RoBERTa-based model as the most effective approach, with an accuracy of 90% and F1 score of 85%. Third, we evaluate the best model beyond accuracy by assessing properties such as correctness, robustness and fairness using metamorphic testing and adversarial attacks. The results highlight potential vulnerabilities in the model’s robustness, with the model being susceptible to attacks such as inserting dots (29.4% success rate), inserting spaces (24.5%), and modifying characters in words (18.3%). Fairness assessments show that while the model is generally fair, it still exhibits bias in specific cases, with a 7% success rate for attacks targeting entities typically subject to discrimination. The key finding is that relying solely on offline metrics such as the F1 score and accuracy in evaluating machine learning systems is insufficient. For low-resource languages, the recommendation is to focus on identifying and addressing domain-specific biases and enhancing pre-trained monolingual language models with diverse and noisier data to improve their robustness and generalization capabilities in diverse linguistic scenarios.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
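The character-level attacks named in the abstract above (inserting dots, inserting spaces, modifying characters in words) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the one-edit-per-word policy, the toy keyword classifier, and the definition of success rate as the fraction of inputs whose predicted label flips. The abstract does not specify how the authors implemented or scored their attacks.

```python
import random


def insert_in_words(text, marker, rng):
    """Insert `marker` at a random interior position of every word of length >= 2."""
    out = []
    for word in text.split(" "):
        if len(word) >= 2:
            pos = rng.randrange(1, len(word))  # interior position only
            word = word[:pos] + marker + word[pos:]
        out.append(word)
    return " ".join(out)


def swap_adjacent(text, rng):
    """Swap one random pair of adjacent characters in every word of length >= 2."""
    out = []
    for word in text.split(" "):
        if len(word) >= 2:
            i = rng.randrange(len(word) - 1)
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = "".join(chars)
        out.append(word)
    return " ".join(out)


def attack_success_rate(predict, texts, perturb):
    """Fraction of inputs whose predicted label changes after perturbation
    (an assumed definition, not taken from the paper)."""
    flipped = sum(1 for t in texts if predict(t) != predict(perturb(t)))
    return flipped / len(texts)


def toy_predict(text):
    """Hypothetical stand-in for a real model: flags a text if it contains the token 'bad'."""
    return "bad" in text.split(" ")


samples = ["bad day", "good day"]
dot_attack = lambda t: insert_in_words(t, ".", random.Random(1))
print(attack_success_rate(toy_predict, samples, dot_attack))
```

A keyword matcher is trivially fooled by a single inserted dot, which is why the toy example flips every flagged input; a fine-tuned language model would be evaluated the same way by passing its prediction function as `predict`.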
Open Access Article
Electroencephalography-Based Motor Imagery Classification Using Multi-Scale Feature Fusion and Adaptive Lasso
by
Shimiao Chen, Nan Li, Xiangzeng Kong, Dong Huang and Tingting Zhang
Big Data Cogn. Comput. 2024, 8(12), 169; https://doi.org/10.3390/bdcc8120169 - 25 Nov 2024
Abstract
Brain–computer interfaces, where motor imagery electroencephalography (EEG) signals are transformed into control commands, offer a promising solution for enhancing the standard of living for disabled individuals. However, the performance of EEG classification has been limited in most studies due to a lack of attention to the complementary information inherent at different temporal scales. Additionally, significant inter-subject variability in sensitivity to biological motion poses another critical challenge in achieving accurate EEG classification in a subject-dependent manner. To address these challenges, we propose a novel machine learning framework combining multi-scale feature fusion, which captures global and local spatial information from different-sized EEG segmentations, and adaptive Lasso-based feature selection, a mechanism for adaptively retaining informative subject-dependent features and discarding irrelevant ones. Experimental results on multiple public benchmark datasets revealed substantial improvements in EEG classification, achieving rates of 81.36%, 75.90%, and 68.30% for the BCIC-IV-2a, SMR-BCI, and OpenBMI datasets, respectively. These results not only surpassed existing methodologies but also underscored the effectiveness of our approach in overcoming specific challenges in EEG classification. Ablation studies further confirmed the efficacy of both the multi-scale feature analysis and adaptive selection mechanisms. This framework marks a significant advancement in the decoding of motor imagery EEG signals, positioning it for practical applications in real-world BCIs.
(This article belongs to the Special Issue Machine Learning Methodologies and Applications in Cybersecurity Data Analysis)
Open Access Review
The Need for Standards in Evaluating the Quality of Electronic Health Records and Dental Records: A Narrative Review
by
Varadraj P. Gurupur, Giang Vu, Veena Mayya and Christian King
Big Data Cogn. Comput. 2024, 8(12), 168; https://doi.org/10.3390/bdcc8120168 - 25 Nov 2024
Abstract
Over the past two decades, the use of electronic health records (EHRs) has grown enormously. However, the adoption and use of EHRs vary widely across countries, healthcare systems, and individual facilities. This variance poses several challenges for seamless communication between systems, leading to unintended consequences. In this article, we outline the primary factors and issues arising from the absence of standards in EHR and dental record implementation, underscoring the need for global standards in this area. We examine various scenarios and concepts that emphasize the necessity of global standards for healthcare systems. Additionally, we explore the adverse outcomes stemming from the absence of standards, as well as the missed opportunities within the healthcare ecosystem. Our discussion provides key insights into the impacts of the lack of standardization.
(This article belongs to the Special Issue Applied Data Science for Social Good)
Open Access Article
Aspect-Based Sentiment Analysis of Patient Feedback Using Large Language Models
by
Omer S. Alkhnbashi, Rasheed Mohammad and Mohammad Hammoudeh
Big Data Cogn. Comput. 2024, 8(12), 167; https://doi.org/10.3390/bdcc8120167 - 21 Nov 2024
Abstract
Online medical forums have emerged as vital platforms for patients to share their experiences and seek advice, providing a valuable, cost-effective source of feedback for medical service management. This feedback not only measures patient satisfaction and improves health service quality but also offers crucial insights into the effectiveness of medical treatments, pain management strategies, and alternative therapies. This study systematically identifies and categorizes key aspects of patient experiences, emphasizing both positive and negative sentiments expressed in their narratives. We collected a dataset of approximately 15,000 entries from various sections of the widely used medical forum, patient.info. Our innovative approach integrates content analysis with aspect-based sentiment analysis, deep learning techniques, and a large language model (LLM) to analyze these data. Our methodology is designed to uncover a wide range of aspect types reflected in patient feedback. The analysis revealed seven distinct aspect types prevalent in the feedback, demonstrating that deep learning models can effectively predict these aspect types and their corresponding sentiment values. Notably, the LLM with few-shot learning outperformed other models. Our findings enhance the understanding of patient experiences in online forums and underscore the utility of advanced analytical techniques in extracting meaningful insights from unstructured patient feedback, offering valuable implications for healthcare providers and medical service management.
Open Access Article
Sentiment Analysis Using Amazon Web Services and Microsoft Azure
by
Sergiu C. Ivan, Robert Ş. Győrödi and Cornelia A. Győrödi
Big Data Cogn. Comput. 2024, 8(12), 166; https://doi.org/10.3390/bdcc8120166 - 21 Nov 2024
Abstract
Recently, more and more companies are using machine learning platforms offered by cloud service providers to build sentiment analysis models that can then be used to analyze public opinion on social media. This paper conducts a comparative analysis of two of the most popular cloud computing platforms, namely Amazon Web Services (AWS) and Microsoft Azure, in terms of their sentiment detection services through the complex analysis of multiple texts. The comparative analysis was carried out by implementing an application that integrates the sentiment analysis (SA) solutions provided by both Amazon Web Services and Microsoft Azure. To evaluate the services offered by the two platforms, different evaluation metrics were analyzed and compared, such as accuracy, precision, recall, and other relevant characteristics. The paper also examines the costs and limitations of the two services, Amazon Comprehend and Azure AI Language Text, when they are used to implement solutions for analyzing the sentiments of product reviews. The results highlight the advantages and disadvantages of the two platforms from several perspectives, such as performance, the quality of the answers provided, and their accuracy. All these aspects help to give a clear picture of the advantages and limitations of each service offered by the two cloud platforms.
(This article belongs to the Special Issue Knowledge Graphs in the Big Data Era: Navigating the Confluence of Distribution, Visualization, and Advanced Computational Models)
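The evaluation metrics named in the abstract above (accuracy, precision, recall) can be computed from any service's predicted labels with a few lines of code. The sketch below is illustrative only: the label strings, the sample data, and the function name `binary_metrics` are hypothetical, and no AWS or Azure API is called. It assumes a single "positive" class is treated as the class of interest, which may differ from how the paper aggregates multi-class sentiment results.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, and recall with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical gold labels and the labels returned by one service for five reviews.
gold = ["positive", "positive", "negative", "negative", "positive"]
service_a = ["positive", "negative", "negative", "positive", "positive"]

print(binary_metrics(gold, service_a))
```

Running the same function on the labels returned by each service for an identical review set gives directly comparable numbers, which is the kind of side-by-side comparison the abstract describes.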
Topics
Topic in
BDCC, Digital, Information, Mathematics, Systems
Data-Driven Group Decision-Making
Topic Editors: Shaojian Qu, Ying Ji, M. Faisal Nadeem
Deadline: 31 December 2024
Topic in
BDCC, Data, Environments, Geosciences, Remote Sensing
Database, Mechanism and Risk Assessment of Slope Geologic Hazards
Topic Editors: Chong Xu, Yingying Tian, Xiaoyi Shao, Zikang Xiao, Yulong Cui
Deadline: 28 February 2025
Topic in
Applied Sciences, BDCC, Future Internet, Information, Sci
Social Computing and Social Network Analysis
Topic Editors: Carson K. Leung, Fei Hao, Giancarlo Fortino, Xiaokang Zhou
Deadline: 30 June 2025
Topic in
AI, BDCC, Fire, GeoHazards, Remote Sensing
AI for Natural Disasters Detection, Prediction and Modeling
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 25 July 2025
Special Issues
Special Issue in
BDCC
Application of Cloud Computing in Industrial Internet of Things
Guest Editors: Muhammad Kazim, Mujeeb Ur Rehman, Stefan Kuhn
Deadline: 31 December 2024
Special Issue in
BDCC
Security, Privacy, and Trust in Artificial Intelligence Applications
Guest Editor: Giuseppe Maria Luigi Sarnè
Deadline: 31 December 2024
Special Issue in
BDCC
Natural Language Processing Applications in Big Data
Guest Editors: Xingyi Song, Ye Jiang, Yunfei Long
Deadline: 31 December 2024
Special Issue in
BDCC
Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology
Guest Editors: Hossein Hassani, Steve MacFeely
Deadline: 31 December 2024