Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing, published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Management Information Systems).
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18 days after submission; acceptance to publication is undertaken in 4.5 days (median values for papers published in this journal in the first half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2023)
Latest Articles
Exploiting Content Characteristics for Explainable Detection of Fake News
Big Data Cogn. Comput. 2024, 8(10), 129; https://doi.org/10.3390/bdcc8100129 (registering DOI) - 4 Oct 2024
Abstract
The proliferation of fake news threatens the integrity of information ecosystems, creating a pressing need for effective and interpretable detection mechanisms. Recent advances in machine learning, particularly with transformer-based models, offer promising solutions due to their superior ability to analyze complex language patterns. However, the practical implementation of these solutions often presents challenges due to their high computational costs and limited interpretability. In this work, we explore using content-based features to enhance the explainability and effectiveness of fake news detection. We propose a comprehensive feature framework encompassing characteristics related to linguistic, affective, cognitive, social, and contextual processes. This framework is evaluated across several public English datasets to identify key differences between fake and legitimate news. We assess the detection performance of these features using various traditional classifiers, including single and ensemble methods, and analyze how feature reduction affects classifier performance. Our results show that, while traditional classifiers may not fully match transformer-based models, they achieve competitive results with significantly lower computational requirements. We also provide an interpretability analysis highlighting the most influential features in classification decisions. This study demonstrates the potential of interpretable features to build efficient, explainable, and accessible fake news detection systems.
Open Access Article
Factors Affecting Single and Multivehicle Motorcycle Crashes: Insights from Day and Night Analysis Using XGBoost-SHAP Algorithm
by
Panuwat Wisutwattanasak, Chamroeun Se, Thanapong Champahom, Rattanaporn Kasemsri, Sajjakaj Jomnonkwao and Vatanavongs Ratanavaraha
Big Data Cogn. Comput. 2024, 8(10), 128; https://doi.org/10.3390/bdcc8100128 - 3 Oct 2024
Abstract
This study aimed to identify and compare the risk factors associated with motorcycle crash severity during both daytime and nighttime, for single and multivehicle incidents in Thailand using 2021–2024 data. The research employed the XGBoost (Extreme Gradient Boosting) method for statistical analysis and extensively examined the temporal instability of risk factors. The results highlight the importance of features impacting the injury severity of roadway collisions across various conditions. For single motorcycle crashes, the key risk factors included speeding, early morning incidents, off-road events, and long holidays. In multivehicle crashes, rear-end collisions, interactions with large vehicles, and collisions involving other motorcycles or passenger cars were linked to increased injury severity. The findings indicate that the important factors associated with motorcyclist injury severity in roadway crashes vary depending on the type of crash and time of day. These insights are valuable for policymakers and relevant authorities in developing targeted interventions to enhance road safety and mitigate the incidence of severe and fatal motorcycle crashes.
(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)
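SHAP explains a model's prediction by assigning each input feature a Shapley value, its average marginal contribution over all coalitions of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a hypothetical three-feature severity score; the scoring function, feature names, and all-zero baseline are illustrative assumptions, not the authors' XGBoost model. (For tree ensembles such as XGBoost, the SHAP library uses the faster TreeSHAP algorithm rather than enumeration.)

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Classic Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi += weight * (model(with_i) - model(without_i))
        values.append(phi)
    return values

# Hypothetical severity score with a speeding x night-time interaction term.
def toy_model(f):
    speeding, night, large_vehicle = f
    return 0.3 * speeding + 0.1 * night + 0.2 * large_vehicle + 0.2 * speeding * night

x = [1, 1, 0]          # a speeding crash at night, no large vehicle involved
baseline = [0, 0, 0]   # reference input
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

The interaction term is split evenly between speeding and night-time, and the attributions sum to the prediction minus the baseline prediction, which is the property that makes SHAP values additive explanations.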
Open Access Review
Classification and Recognition of Lung Sounds Using Artificial Intelligence and Machine Learning: A Literature Review
by
Xiaoran Xu and Ravi Sankar
Big Data Cogn. Comput. 2024, 8(10), 127; https://doi.org/10.3390/bdcc8100127 - 1 Oct 2024
Abstract
This review explores the latest advances in artificial intelligence (AI) and machine learning (ML) for the identification and classification of lung sounds. The article provides a historical overview from the invention of the electronic stethoscope to the auscultation of lung sounds, emphasizing the importance of the rapid diagnosis of lung diseases in the post-COVID-19 era. The review classifies lung sounds, including wheezes and stridors, and explores their pathological relevance. In addition, the article explores in depth feature extraction strategies, measurement methods, and multiple advanced machine learning models for classification, such as deep residual networks (ResNets), convolutional neural networks combined with long short-term memory networks (CNN–LSTM), and transformer models. The article discusses the problems of insufficient data and of replicating human expert experience, and proposes future research directions, including improved data utilization, enhanced feature extraction, and classification using spectrograms. Finally, the article emphasizes the expanding role of AI and ML in lung sound diagnosis and their potential for further development in this field.
Open Access Article
Estimating Rainfall Intensity Using an Image-Based Convolutional Neural Network Inversion Technique for Potential Crowdsourcing Applications in Urban Areas
by
Youssef Shalaby, Mohammed I. I. Alkhatib, Amin Talei, Tak Kwin Chang, Ming Fai Chow and Valentijn R. N. Pauwels
Big Data Cogn. Comput. 2024, 8(10), 126; https://doi.org/10.3390/bdcc8100126 - 29 Sep 2024
Abstract
High-quality rainfall data are essential in many water management problems, including stormwater management, water resources management, and more. Due to high spatial–temporal variations, rainfall measurement can be challenging and costly, especially in urban areas. This can be even more challenging in tropical regions, with their typical short-duration, high-intensity rainfall events, as some undeveloped or developing countries in those regions lack a dense rain gauge network and have limited resources for radar and satellite readings. Thus, exploring alternative rainfall estimation methods could help address some of these shortcomings. Recently, a few studies have examined the utilisation of citizen science methods to collect rainfall data as a complement to existing rain gauge networks. However, these attempts are at an early stage, and few works have been published on improving the quality of such data. Therefore, this study focuses on image-based rainfall estimation with potential use in citizen science. For this, a novel convolutional neural network (CNN) model is developed to predict rainfall intensity by processing images captured by citizens (e.g., by smartphones or security cameras) in an urban area. The developed model is intended only as a complementary sensing tool (e.g., offering better spatial coverage) to the existing rain gauge network in an urban area and is not meant to replace it. This study also presents one of the most extensive datasets of rain image data ever published in the literature. The rainfall estimates produced by the proposed CNN model from images captured by surveillance cameras and smartphone cameras were compared with rainfall observed at a weather station and exhibit strong R² values of 0.955 and 0.840, respectively.
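The R² values above summarize how closely image-based estimates track the weather-station record. A minimal sketch of the coefficient of determination, R² = 1 − SS_res / SS_tot, is shown below; the rainfall series are hypothetical, and the paper may compute R² differently (for example, as the squared Pearson correlation of a fitted line).

```python
def r_squared(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: estimation error relative to the observations
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    # Total sum of squares: variability of the observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - ss_res / ss_tot

# Hypothetical rainfall intensities (mm/h): gauge readings vs. image-based estimates.
gauge = [5.0, 12.0, 30.0, 55.0, 80.0]
cnn = [6.0, 10.0, 33.0, 50.0, 84.0]
print(round(r_squared(gauge, cnn), 3))
```

A value near 1 means the estimates explain almost all of the variance in the gauge record; a value at or below 0 means they do no better than always predicting the mean.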
Open Access Article
An Improved Deep Learning Framework for Multimodal Medical Data Analysis
by
Sachin Kumar and Shivani Sharma
Big Data Cogn. Comput. 2024, 8(10), 125; https://doi.org/10.3390/bdcc8100125 - 29 Sep 2024
Abstract
Lung disease is one of the leading causes of death worldwide. This emphasizes the need for early diagnosis in order to provide appropriate treatment and save lives. Physicians typically require information about patients’ clinical symptoms and various laboratory and pathology tests, along with chest X-rays, to confirm the diagnosis of lung disease. In this study, we present a transformer-based multimodal deep learning approach that incorporates imaging and clinical data for effective lung disease diagnosis on a new multimodal medical dataset. The proposed method employs a cross-attention transformer module to merge features from the heterogeneous modalities; the unified fused features are then used for disease classification. The experiments were evaluated on several classification metrics to illustrate the performance of the proposed approach. The results revealed that the proposed method achieved an accuracy of 95% in classifying tuberculosis and outperformed other traditional fusion methods on the multimodal tuberculosis data used in this study.
(This article belongs to the Special Issue Application of Deep Learning and Convolution Neural Networks for Social Healthcare)
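Cross-attention fuses two modalities by letting tokens from one modality (queries) attend over tokens from the other (keys and values). The sketch below implements single-head scaled dot-product cross-attention in plain Python, with image embeddings querying clinical embeddings. The embeddings are hypothetical, and the sketch omits the learned query/key/value projections, multiple heads, and layer normalization that a real transformer module, including the paper's, would typically use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over the other modality's tokens."""
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors
        fused.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return fused

# Hypothetical 4-dimensional embeddings for each modality.
image_tokens = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.0]]
clinical_tokens = [[0.9, 0.1, 0.4, 0.0], [0.0, 1.0, 0.0, 0.3], [0.2, 0.2, 0.2, 0.2]]

# Image features query the clinical features: one fused vector per image token.
fused = cross_attention(image_tokens, clinical_tokens, clinical_tokens)
print(len(fused), len(fused[0]))
```

Each fused vector is a convex combination of the clinical value vectors, weighted by how relevant each clinical token is to the corresponding image token; a classifier head would then consume these fused features.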
Open Access Article
Does Social Media Enhance Job Performance? Examining Internal Communication and Teamwork as Mediating Mechanisms
by
Satinder Kumar, Zohour Sohbaty, Ruchika Jain, Iqra Shafi and Ramona Rupeika-Apoga
Big Data Cogn. Comput. 2024, 8(10), 124; https://doi.org/10.3390/bdcc8100124 - 27 Sep 2024
Abstract
This study investigates the impact of social media use on faculty job performance, exploring the mediating roles of internal communication and teamwork. Drawing on the Uses and Gratifications theory, we examine how faculty members utilize social media for three distinct purposes: social interaction (social use), enjoyment (hedonic use), and information seeking (cognitive use). We analyze how these three dimensions of social media use influence teachers’ performance, encompassing both routine and innovative aspects. This analysis is based on data collected via an online survey completed by 456 faculty members at public state colleges in northern India in 2024. Structural Equation Modeling (SEM) was used to test the hypotheses. The findings reveal that social, hedonic, and cognitive use of social media positively affect faculty innovative and routine job performance, with teamwork and internal communication acting as partial mediators in this relationship. This research offers valuable insights for faculty development professionals, educational administrators, and policymakers.
(This article belongs to the Special Issue Challenges and Perspectives of Social Networks within Social Computing)
Open Access Article
Brain Tumor Detection Using Magnetic Resonance Imaging and Convolutional Neural Networks
by
Rafael Martínez-Del-Río-Ortega, Javier Civit-Masot, Francisco Luna-Perejón and Manuel Domínguez-Morales
Big Data Cogn. Comput. 2024, 8(9), 123; https://doi.org/10.3390/bdcc8090123 - 21 Sep 2024
Abstract
Early and precise detection of brain tumors is critical for improving clinical outcomes and patient quality of life. This research focused on developing an image classifier using convolutional neural networks (CNN) to detect brain tumors in magnetic resonance imaging (MRI). Brain tumors are a significant cause of morbidity and mortality worldwide, with approximately 300,000 new cases diagnosed annually. Magnetic resonance imaging (MRI) offers excellent spatial resolution and soft tissue contrast, making it indispensable for identifying brain abnormalities. However, accurate interpretation of MRI scans remains challenging, due to human subjectivity and variability in tumor appearance. This study employed CNNs, which have demonstrated exceptional performance in medical image analysis, to address these challenges. Various CNN architectures were implemented and evaluated to optimize brain tumor detection. The best model achieved an accuracy of 97.5%, sensitivity of 99.2%, and binary accuracy of 98.2%, surpassing previous studies. These results underscore the potential of deep learning techniques in clinical applications, significantly enhancing diagnostic accuracy and reliability.
Open Access Article
The Relative Importance of Key Factors for Integrating Enterprise Resource Planning (ERP) Systems and Performance Management Practices in the UAE Healthcare Sector
by
Karam Al-Assaf, Wadhah Alzahmi, Ryan Alshaikh, Zied Bahroun and Vian Ahmed
Big Data Cogn. Comput. 2024, 8(9), 122; https://doi.org/10.3390/bdcc8090122 - 13 Sep 2024
Abstract
This study examines integrating Enterprise Resource Planning (ERP) systems with performance management (PM) practices in the UAE healthcare sector, identifying key factors for successful adoption. It addresses a critical gap by analyzing the interplay between ERP systems and PM to enhance operational efficiency, patient care, and administrative processes. A literature review identified thirty-six critical factors, refined through expert interviews to highlight nine weak integration areas and two new factors. An online survey with 81 experts, who rated the 38 factors on a five-point Likert scale, provided data to calculate the Relative Importance Index (RII). The results reveal that employee involvement in performance metrics and effective organizational measures significantly impact system effectiveness and alignment. Mid-tier factors such as leadership and managerial support are essential for integration momentum, while foundational elements like infrastructure, scalability, security, and compliance are crucial for long-term success. The study recommends a holistic approach to these factors to maximize ERP benefits, offering insights for healthcare administrators and policymakers. Additionally, it highlights the need to address the challenges, opportunities, and ethical considerations associated with using digital health technology in healthcare. Future research should explore ERP integration challenges in public and private healthcare settings, tailoring systems to specific organizational needs.
(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)
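The Relative Importance Index is commonly computed as RII = ΣW / (A × N), where W is each respondent's rating, A is the highest possible rating (5 on a five-point Likert scale), and N is the number of respondents. The sketch below applies that common formula to hypothetical ratings for two factors; the abstract does not state which RII variant the study used, and the ratings and factor labels are illustrative.

```python
def relative_importance_index(ratings, max_score=5):
    """RII = sum(W) / (A * N): W are Likert ratings, A the highest score, N the respondents."""
    if not ratings:
        raise ValueError("no ratings")
    if any(r < 1 or r > max_score for r in ratings):
        raise ValueError("rating outside the Likert range")
    return sum(ratings) / (max_score * len(ratings))

# Hypothetical ratings from 8 experts for two factors.
survey = {
    "employee involvement in performance metrics": [5, 5, 4, 5, 4, 5, 5, 4],
    "infrastructure": [4, 3, 4, 4, 3, 5, 4, 3],
}

# Rank factors from most to least important.
ranked = sorted(survey, key=lambda f: relative_importance_index(survey[f]), reverse=True)
for factor in ranked:
    print(f"{factor}: {relative_importance_index(survey[factor]):.3f}")
```

RII ranges from 1/A (every expert gives the lowest rating) to 1 (every expert gives the highest), so factors rated by different numbers of experts can still be ranked on a common scale.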
Open Access Systematic Review
Medical IoT Record Security and Blockchain: Systematic Review of Milieu, Milestones, and Momentum
by
Simeon Okechukwu Ajakwe, Igboanusi Ikechi Saviour, Vivian Ukamaka Ihekoronye, Odinachi U. Nwankwo, Mohamed Abubakar Dini, Izuazu Urslla Uchechi, Dong-Seong Kim and Jae Min Lee
Big Data Cogn. Comput. 2024, 8(9), 121; https://doi.org/10.3390/bdcc8090121 - 12 Sep 2024
Abstract
The sensitivity and exclusivity attached to personal health records make such records a prime target for cyber intruders, as unauthorized access causes unfathomable repudiation and public defamation. In reality, most medical records are micro-managed by different healthcare providers, exposing them to various security issues, especially unauthorized third-party access. Over time, substantial progress has been made in preventing unauthorized access to this critical and highly classified information. This review investigated the mainstream security challenges associated with the transmissibility of medical records, the evolutionary security strategies for maintaining confidentiality, and the existential enablers of trustworthy and transparent authorization and authentication before data transmission can be carried out. The review adopted the PRISMA-SPIDER methodology for a systematic review of 122 articles, comprising 9 surveys (7.37%) for qualitative analysis, and 109 technical papers (89.34%) and 4 online reports (3.27%) for quantitative studies. The review outcome indicates that the sensitivity and confidentiality of a highly classified document, such as a medical record, demand unabridged authorization by the owner, unquestionable preservation by the host, untainted transparency in transmission, unbiased traceability, and ubiquitous security, which blockchain technology guarantees, although it is still in its infancy. Therefore, developing blockchain-assisted frameworks for digital medical record preservation and addressing inherent technological hitches in blockchain will further accelerate transparent and trustworthy preservation, user authorization, and authentication of medical records before they are transmitted by the host for third-party access.
(This article belongs to the Special Issue Research on Privacy and Data Security)
Open Access Article
An Efficient Green AI Approach to Time Series Forecasting Based on Deep Learning
by
Luis Balderas, Miguel Lastra and José M. Benítez
Big Data Cogn. Comput. 2024, 8(9), 120; https://doi.org/10.3390/bdcc8090120 - 11 Sep 2024
Abstract
Time series forecasting is undoubtedly a key area in machine learning due to the numerous fields where it is crucial to estimate future data points of sequences based on a set of previously observed values. Deep learning has been successfully applied to this area. On the other hand, growing concerns about the steady increase in the amount of resources required by deep learning-based tools have made Green AI gain traction as a move towards making machine learning more sustainable. In this paper, we present a deep learning-based time series forecasting methodology called GreeNNTSF, which aims to reduce the size of the resulting model, thereby diminishing the associated computational and energy costs without giving up adequate forecasting performance. The methodology, based on the ODF2NNA algorithm, produces models that outperform state-of-the-art techniques not only in terms of prediction accuracy but also in terms of computational costs and memory footprint. To prove this claim, after presenting the main state-of-the-art methods that utilize deep learning for time series forecasting and introducing our methodology, we test GreeNNTSF on a selection of real-world forecasting problems that are commonly used as benchmarks, such as SARS-CoV-2 and PhysioNet (medicine), Brazilian Weather (climate), WTI and Electricity (economics), and Traffic (smart cities). The results of each experiment, conducted by rigorously following the experimental setup presented in the original papers that addressed these problems, objectively demonstrate that our method is more competitive than other state-of-the-art approaches, producing more accurate and efficient models.
Open Access Article
Hierarchical Progressive Image Forgery Detection and Localization Method Based on UNet
by
Yang Liu, Xiaofei Li, Jun Zhang, Shuohao Li, Shengze Hu and Jun Lei
Big Data Cogn. Comput. 2024, 8(9), 119; https://doi.org/10.3390/bdcc8090119 - 10 Sep 2024
Abstract
The rapid development of generative technologies has made the production of forged products easier, and AI-generated forged images are increasingly difficult to accurately detect, posing serious privacy risks and cognitive obstacles to individuals and society. Therefore, constructing an effective method that can accurately detect and locate forged regions has become an important task. This paper proposes a hierarchical and progressive forged image detection and localization method called HPUNet. This method assigns more reasonable hierarchical multi-level labels to the dataset as supervisory information at different levels, following cognitive laws. Secondly, multiple types of features are extracted from AI-generated images for detection and localization, and the detection and localization results are combined to enhance the task-relevant features. Subsequently, HPUNet expands the obtained image features into four different resolutions and performs detection and localization at different levels in a coarse-to-fine cognitive order. To address the limited feature field of view caused by inconsistent forgery sizes, we employ three sets of densely cross-connected hierarchical networks for sufficient interaction between feature images at different resolutions. Finally, a UNet network with a soft-threshold-constrained feature enhancement module is used to achieve detection and localization at different scales, and the reliance on a progressive mechanism establishes relationships between different branches. We use ACC and F1 as evaluation metrics, and extensive experiments on our method and the baseline methods demonstrate the effectiveness of our approach.
Open Access Article
DBSCAN SMOTE LSTM: Effective Strategies for Distributed Denial of Service Detection in Imbalanced Network Environments
by
Rissal Efendi, Teguh Wahyono and Indrastanti Ratna Widiasari
Big Data Cogn. Comput. 2024, 8(9), 118; https://doi.org/10.3390/bdcc8090118 - 10 Sep 2024
Abstract
In detecting Distributed Denial of Service (DDoS) attacks, deep learning faces challenges such as high computational demands, long training times, and complex model interpretation. This research focuses on overcoming these challenges by proposing an effective strategy for detecting DDoS attacks in imbalanced network environments. This research employed DBSCAN and SMOTE to rebalance the class distribution of the dataset, allowing LSTM models to learn temporal anomalies effectively when DDoS attacks occur. The experiments revealed a significant improvement in the performance of the LSTM model when integrated with DBSCAN and SMOTE: the validation loss was 0.048 for LSTM with DBSCAN and SMOTE versus 0.1943 for LSTM without them, with accuracies of 99.50% and 97.50%, respectively. In addition, the F1 score increased from 93.4% to 98.3%. This research demonstrates that DBSCAN and SMOTE can be used as an effective strategy to improve model performance in detecting DDoS attacks on heterogeneous networks, as well as to increase model robustness and reliability.
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
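SMOTE rebalances a dataset by creating synthetic minority-class samples on the line segments between a minority sample and one of its k nearest minority neighbours: x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1). The sketch below implements that interpolation in plain Python on hypothetical normalized attack-flow features. It illustrates SMOTE alone; the abstract does not detail how DBSCAN is combined with it, so that step is not shown.

```python
import math
import random

def smote(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating towards random nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x), key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

# Hypothetical normalized features (e.g., packet rate, flow duration) of attack flows.
attack_flows = [[0.90, 0.80], [0.85, 0.75], [0.95, 0.90], [0.80, 0.70]]
new_samples = smote(attack_flows, k=2, n_new=4, seed=42)
for s in new_samples:
    print([round(v, 3) for v in s])
```

Because each synthetic point lies between two real minority points, the minority region is densified rather than merely duplicated, which is what distinguishes SMOTE from random oversampling.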
Open Access Article
An End-to-End Scene Text Recognition for Bilingual Text
by
Bayan M. Albalawi, Amani T. Jamal, Lama A. Al Khuzayem and Olaa A. Alsaedi
Big Data Cogn. Comput. 2024, 8(9), 117; https://doi.org/10.3390/bdcc8090117 - 9 Sep 2024
Abstract
Text localization and recognition from natural scene images has gained a lot of attention recently due to its crucial role in various applications, such as autonomous driving and intelligent navigation. However, two significant gaps exist in this area: (1) prior research has primarily focused on recognizing English text, whereas Arabic text has been underrepresented, and (2) most prior research has adopted separate approaches for scene text localization and recognition, as opposed to one integrated framework. To address these gaps, we propose a novel bilingual end-to-end approach that localizes and recognizes both Arabic and English text within a single natural scene image. Specifically, our approach utilizes pre-trained CNN models (ResNet and EfficientNetV2) with kernel representation for text localization and RNN models (LSTM and BiLSTM) with an attention mechanism for text recognition. In addition, the AraElectra Arabic language model was incorporated to enhance Arabic text recognition. Experimental results on the EvArest, ICDAR2017, and ICDAR2019 datasets demonstrated that our model achieves superior performance not only in recognizing horizontally oriented text but also in recognizing multi-oriented and curved Arabic and English text in natural scene images.
(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)
Open Access Article
Attention-Driven Transfer Learning Model for Improved IoT Intrusion Detection
by
Salma Abdelhamid, Islam Hegazy, Mostafa Aref and Mohamed Roushdy
Big Data Cogn. Comput. 2024, 8(9), 116; https://doi.org/10.3390/bdcc8090116 - 9 Sep 2024
Abstract
The proliferation of Internet of Things (IoT) devices has become inevitable in contemporary life, significantly affecting myriad applications. Nevertheless, the pervasive use of heterogeneous IoT devices introduces vulnerabilities to malicious cyber-attacks, resulting in data breaches that jeopardize the network’s integrity and resilience. This study proposes an Intrusion Detection System (IDS) for IoT environments that leverages Transfer Learning (TL) and the Convolutional Block Attention Module (CBAM). We extensively evaluate four prominent pre-trained models, each integrated with an independent CBAM at the uppermost layer. Our methodology is validated using the BoT-IoT dataset, which undergoes preprocessing to rectify the imbalanced data distribution, eliminate redundancy, and reduce dimensionality. Subsequently, the tabular dataset is transformed into RGB images to enhance the interpretation of complex patterns. Our evaluation results demonstrate that integrating TL models with the CBAM significantly improves classification accuracy and reduces false-positive rates. Additionally, to further enhance system performance, we employ an Ensemble Learning (EL) technique to aggregate predictions from the two best-performing models. The final findings show that our TL-CBAM-EL model achieves superior performance, attaining an accuracy of 99.93% as well as high recall, precision, and F1-score. Hence, the proposed IDS is a robust and efficient solution for securing IoT networks.
(This article belongs to the Special Issue Advances in Intelligent Defense Systems for the Internet of Things)
Open Access Article
QA-RAG: Exploring LLM Reliance on External Knowledge
by
Aigerim Mansurova, Aiganym Mansurova and Aliya Nugumanova
Big Data Cogn. Comput. 2024, 8(9), 115; https://doi.org/10.3390/bdcc8090115 - 9 Sep 2024
Abstract
Large language models (LLMs) can store factual knowledge within their parameters and have achieved superior results in question-answering tasks. However, challenges persist in providing provenance for their decisions and keeping their knowledge up to date. Some approaches aim to address these challenges by combining external knowledge with parametric memory. In contrast, our proposed QA-RAG solution relies solely on the data stored within an external knowledge base, specifically a dense vector index database. In this paper, we compare RAG configurations using two LLMs—Llama 2b and 13b—systematically examining their performance in three key RAG capabilities: noise robustness, knowledge gap detection, and external truth integration. The evaluation reveals that while our approach achieves an accuracy of 83.3%, showcasing its effectiveness across all baselines, the model still struggles significantly in terms of external truth integration. These findings suggest that considerable work is still required to fully leverage RAG in question-answering tasks.
(This article belongs to the Special Issue Generative AI and Large Language Models)
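To make the "relies solely on an external knowledge base" idea concrete, here is a minimal retrieval-augmented prompting sketch. It assumes a toy in-memory index with hand-made 3-dimensional vectors (a real system would use an embedding model and a vector database), ranks passages by cosine similarity, and builds a prompt that instructs the LLM to answer only from the retrieved context. The `min_score` cut-off is a hypothetical way to surface knowledge gaps; the function names and prompt wording are illustrative, not the QA-RAG implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2, min_score=0.5):
    """Top-k passages from a toy in-memory dense index, dropping weak matches."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in index), reverse=True)
    return [(score, text) for score, text in scored[:k] if score >= min_score]


def build_prompt(question, hits):
    """Ground the answer solely in retrieved text; an empty context exposes a knowledge gap."""
    context = "\n".join(f"- {text}" for _, text in hits) or "(no relevant passages found)"
    return (
        "Answer using only the context below. If the context does not contain "
        "the answer, reply \"I don't know\".\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical embeddings for illustration only.
INDEX = [
    ("Astana is the capital of Kazakhstan.", [1.0, 0.0, 0.0]),
    ("Llamas live in the Andes.", [0.0, 1.0, 0.0]),
    ("Almaty is the largest city in Kazakhstan.", [0.8, 0.2, 0.0]),
]

hits = retrieve([1.0, 0.0, 0.0], INDEX)
prompt = build_prompt("What is the capital of Kazakhstan?", hits)
```

The resulting `prompt` would then be sent to the LLM; the "reply I don't know" instruction is one simple way to probe knowledge gap detection, while the Almaty passage plays the role of a related but unhelpful (noisy) retrieval.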
Open Access Article
Analysis of Highway Vehicle Lane Change Duration Based on Survival Model
by
Sheng Zhao, Shengwen Huang, Huiying Wen and Weiming Liu
Big Data Cogn. Comput. 2024, 8(9), 114; https://doi.org/10.3390/bdcc8090114 - 6 Sep 2024
Abstract
To investigate highway vehicle lane-changing behavior, we used the publicly available naturalistic driving dataset HighD to extract the movement data of lane-changing vehicles and their surrounding vehicles. We employed univariate and multivariate Cox proportional hazards models alongside random survival forest models to analyze the influence of various factors on lane change duration, assess their statistical significance, and compare the performance of multiple random survival forest models. Our findings indicate that several variables significantly affect lane change duration, including the standard deviation of lane-changing vehicles, lane-changing vehicle speed, distance to the following vehicle in the target lane, lane-changing vehicle length, and distance to the following vehicle in the current lane. Notably, the standard deviation and vehicle length act as protective factors: increases in these variables correlate with longer lane change durations. Conversely, higher lane-changing vehicle speeds and shorter distances to the following vehicles in both the current and target lanes are associated with shorter lane change durations, indicating their role as risk factors. Feature variable selection did not substantially improve the training performance of the random survival forest model; however, evaluation on the validation set showed that careful feature variable selection can enhance model accuracy, yielding higher AUC values. These insights lay the groundwork for research on predicting lane-changing behavior, understanding lane-changing intentions, and developing pre-emptive safety measures against hazardous lane changes.
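As a minimal sketch of how a Cox proportional hazards model labels a covariate as a "risk" or "protective" factor, the code below fits a single-covariate model by maximizing the Cox partial likelihood with Newton-Raphson (plus step halving for safety). The durations and standardized speeds are invented toy values, not HighD data, and tied event times are not handled. A positive coefficient gives a hazard ratio exp(β) > 1, meaning the lane change tends to end sooner (a risk factor in the abstract's terms); a negative coefficient would indicate a protective factor.

```python
import math


def _risk_sums(beta, t_i, times, x):
    """Return (S0, S1, S2): sums of w, w*x, w*x^2 over the risk set {j : t_j >= t_i}."""
    s0 = s1 = s2 = 0.0
    for t_j, x_j in zip(times, x):
        if t_j >= t_i:
            w = math.exp(beta * x_j)
            s0 += w
            s1 += w * x_j
            s2 += w * x_j * x_j
    return s0, s1, s2


def cox_loglik(beta, times, events, x):
    """Partial log-likelihood for one covariate; assumes no tied event times."""
    ll = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i:  # censored observations only contribute through risk sets
            s0, _, _ = _risk_sums(beta, t_i, times, x)
            ll += beta * x_i - math.log(s0)
    return ll


def fit_cox_1d(times, events, x, iters=50):
    """Maximize the partial likelihood with Newton-Raphson plus step halving."""
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i:
                s0, s1, s2 = _risk_sums(beta, t_i, times, x)
                mean = s1 / s0
                score += x_i - mean            # first derivative of the log-likelihood
                info += s2 / s0 - mean * mean  # minus the second derivative
        if info <= 0:
            break
        step = score / info
        current = cox_loglik(beta, times, events, x)
        while cox_loglik(beta + step, times, events, x) < current and abs(step) > 1e-12:
            step /= 2
        beta += step
        if abs(step) < 1e-10:
            break
    return beta


# Toy data: lane change durations (s), all completed, and a standardized speed covariate.
durations = [3.0, 4.5, 5.0, 6.2, 7.1, 8.0]
completed = [1, 1, 1, 1, 1, 1]
speed_z = [1.2, 0.8, 1.0, -0.3, 0.1, -1.0]

beta_hat = fit_cox_1d(durations, completed, speed_z)
hazard_ratio = math.exp(beta_hat)
```

For real analyses with many covariates, ties, and confidence intervals, an established survival-analysis library is the safer choice; this sketch only illustrates the mechanics.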
Open Access Article
Detection of Hate Speech, Racism and Misogyny in Digital Social Networks: Colombian Case Study
by
Luis Gabriel Moreno-Sandoval, Alexandra Pomares-Quimbaya, Sergio Andres Barbosa-Sierra and Liliana Maria Pantoja-Rojas
Big Data Cogn. Comput. 2024, 8(9), 113; https://doi.org/10.3390/bdcc8090113 - 6 Sep 2024
Abstract
The growing popularity of social networking platforms worldwide has substantially increased the presence of offensive language on these platforms. To date, most systems developed to mitigate this problem focus primarily on English content. However, the issue is a global concern that also involves other languages, such as Spanish. This article addresses the task of identifying hate speech, racism, and misogyny in Spanish-language social network content from the Colombian context, and introduces a gold standard dataset developed specifically for this purpose. The experiments compare the performance of deep learning TLM models, such as BERT, RoBERTa, XLM, and BETO, adapted to the Colombian slang domain, and then compare the best TLM model against a GPT model, with a significant impact on prediction accuracy in this task. Finally, this study provides a detailed account of the components used in the system, including the architecture of the models and the selection of functions. The best results show that the BERT model achieves an accuracy of 83.6% for hate speech detection, while the GPT model achieves an accuracy of 90.8% for racist speech detection and 90.4% for misogyny detection.
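The comparison described above amounts to scoring each candidate model on the gold standard labels for each task and keeping the most accurate one per task. The sketch below shows that evaluation loop under simple assumptions: labels are binary, accuracy is the selection metric (as in the reported results), and the model names and predictions are invented toy values rather than the study's outputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model_per_task(gold, predictions):
    """Pick the most accurate model for each task.

    gold:        {task: [labels]}
    predictions: {model: {task: [labels]}}
    returns      {task: (model, accuracy)}
    """
    best = {}
    for task, labels in gold.items():
        scores = {
            model: accuracy(labels, by_task[task])
            for model, by_task in predictions.items()
            if task in by_task
        }
        winner = max(scores, key=scores.get)
        best[task] = (winner, scores[winner])
    return best


# Toy gold labels and predictions (1 = offensive class present).
gold = {"hate": [1, 0, 1, 0], "misogyny": [0, 1, 1, 0]}
preds = {
    "bert": {"hate": [1, 0, 1, 1], "misogyny": [0, 0, 1, 0]},
    "gpt": {"hate": [1, 1, 0, 0], "misogyny": [0, 1, 1, 0]},
}
result = best_model_per_task(gold, preds)
```

Selecting per task, rather than one model overall, mirrors the finding that different models lead on different tasks.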
Open Access Article
Sentiment Informed Sentence BERT-Ensemble Algorithm for Depression Detection
by
Bayode Ogunleye, Hemlata Sharma and Olamilekan Shobayo
Big Data Cogn. Comput. 2024, 8(9), 112; https://doi.org/10.3390/bdcc8090112 - 5 Sep 2024
Abstract
The World Health Organisation (WHO) reports that approximately 280 million people worldwide suffer from depression. Yet existing studies on early-stage depression detection using machine learning (ML) techniques are limited. Prior studies have applied a single stand-alone algorithm, which struggles with data complexity, is prone to overfitting, and generalizes poorly. To address this, our paper examined the performance of several ML algorithms for early-stage depression detection using two benchmark social media datasets (D1 and D2). More specifically, we incorporated sentiment indicators to improve model performance. Our experimental results showed that sentence bidirectional encoder representations from transformers (SBERT) numerical vectors fitted into the stacking ensemble model achieved comparable F1 scores of 69% on dataset D1 and 76% on dataset D2. Our findings suggest that using sentiment indicators as an additional feature for depression detection improves model performance; we therefore recommend developing a depressive term corpus in future work.
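Two pieces of the pipeline above are easy to show concretely: appending a sentiment indicator to a sentence embedding, and scoring predictions with F1. The sketch below assumes the sentiment score is a single polarity value in [-1, 1] produced by some external sentiment analyzer, and uses a tiny made-up embedding in place of a real SBERT vector; it does not reproduce the stacking ensemble itself.

```python
def augment(embedding, sentiment):
    """Append a sentiment indicator, assumed to lie in [-1, 1], to a sentence embedding."""
    if not -1.0 <= sentiment <= 1.0:
        raise ValueError("sentiment score expected in [-1, 1]")
    return list(embedding) + [float(sentiment)]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (depressed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: a 2-dimensional stand-in embedding with a negative sentiment score.
features = augment([0.1, 0.2], -0.5)
score = f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

The augmented vectors would then be fed to the base learners of a stacking ensemble in place of the raw embeddings.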
Open Access Article
A Data-Centric Approach to Understanding the 2020 U.S. Presidential Election
by
Satish Mahadevan Srinivasan and Yok-Fong Paat
Big Data Cogn. Comput. 2024, 8(9), 111; https://doi.org/10.3390/bdcc8090111 - 4 Sep 2024
Abstract
Applying analytics to Twitter feeds is a popular field of research. A tweet, limited to 280 characters, can reveal a wealth of information about how individuals express their sentiments and emotions within their network or community. By collecting, cleaning, and mining tweets from different individuals on a particular topic, we can capture not only the sentiments and emotions of an individual but also those expressed by a larger group. Using the well-known lexicon-based NRC classifier, we classified nearly seven million tweets across seven battleground states in the U.S. to understand the emotions and sentiments expressed by U.S. citizens toward the 2020 presidential candidates. We used the emotions and sentiments expressed within these tweets as proxies for votes and predicted the swing direction of each battleground state. Compared with the actual outcome of the 2020 presidential election, our predictions of the swing directions were accurate for four battleground states (Arizona, Michigan, Texas, and North Carolina), revealing the potential of this approach for predicting future election outcomes. The week-by-week analysis of the tweets using the NRC classifier aligned well with the various political events that took place before the election, making it possible to understand the dynamics of the emotions and sentiments of the supporters in each camp. These research strategies and evidence-based insights may be translated into real-world settings and practical interventions to improve election outcomes.
(This article belongs to the Special Issue Machine Learning in Data Mining for Knowledge Discovery)
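A lexicon-based classifier of the NRC kind works by looking up each word in a word-to-emotion lexicon and counting the tags. The sketch below illustrates that counting and one hypothetical proxy rule for a state's swing direction: the candidate whose tweets carry the highest net sentiment, (positive minus negative) divided by (positive plus negative). `TOY_LEXICON` is a handful of invented entries in the NRC style, not the real NRC Emotion Lexicon, and the decision rule is an assumption rather than the authors' exact procedure.

```python
import re
from collections import Counter

# Hypothetical toy lexicon in the NRC style (word -> set of emotion/sentiment tags).
# The real NRC Emotion Lexicon is far larger and is not reproduced here.
TOY_LEXICON = {
    "great": {"joy", "positive"},
    "win": {"joy", "anticipation", "positive"},
    "trust": {"trust", "positive"},
    "fraud": {"anger", "disgust", "negative"},
    "fear": {"fear", "negative"},
    "lose": {"sadness", "negative"},
}


def emotion_profile(tweets, lexicon=TOY_LEXICON):
    """Count emotion and sentiment tags over all lexicon words found in the tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[a-z']+", tweet.lower()):
            counts.update(lexicon.get(token, ()))
    return counts


def net_sentiment(tweets):
    """(positive - negative) / (positive + negative), or 0.0 when no tagged words occur."""
    c = emotion_profile(tweets)
    total = c["positive"] + c["negative"]
    return (c["positive"] - c["negative"]) / total if total else 0.0


def predict_swing(tweets_by_candidate):
    """Hypothetical proxy rule: the candidate whose tweets have the highest net sentiment."""
    return max(tweets_by_candidate, key=lambda cand: net_sentiment(tweets_by_candidate[cand]))


# Toy tweets for one state, grouped by the candidate they mention.
state_tweets = {
    "A": ["Great rally, we will win!", "I trust this team"],
    "B": ["This is fraud", "I fear we lose"],
}
winner = predict_swing(state_tweets)
```

Running the same profile week by week is what allows emotion trends to be lined up against political events.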
Open Access Article
Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews
by
Goran Mitrov, Boris Stanoev, Sonja Gievska, Georgina Mirceva and Eftim Zdravevski
Big Data Cogn. Comput. 2024, 8(9), 110; https://doi.org/10.3390/bdcc8090110 - 4 Sep 2024
Abstract
The rapid increase in scientific publications has made it challenging to keep up with the latest advancements. Conducting systematic reviews using traditional methods is both time-consuming and difficult. To address this, new review formats such as rapid and scoping reviews have been introduced, reflecting an urgent need for efficient information retrieval. This challenge extends beyond academia to many organizations where numerous documents must be reviewed in relation to specific user queries. This paper focuses on improving document ranking to enhance the retrieval of relevant articles, thereby reducing the time and effort required by researchers. By applying a range of natural language processing (NLP) techniques, including rule-based matching, statistical text analysis, word embeddings, and transformer- and LLM-based approaches such as Mistral LLM, we assess each article’s similarity to user-specific inputs and prioritize articles by relevance. We propose a novel methodology, Weighted Semantic Matching (WSM) + MiniLM, that combines the strengths of the different methods. For validation, we employ global metrics such as precision at K, recall at K, average rank, and median rank, as well as pairwise comparison metrics, including higher rank count, average rank difference, and median rank difference. Our proposed algorithm achieves the best performance, with an average recall at 1000 of 95% and an average median rank of 185 for selected articles across the five datasets evaluated. These results are promising for pinpointing relevant articles and reducing manual work.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
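The global ranking metrics named above can be computed directly from a ranked list of document IDs and the set of articles known to be relevant. The sketch below uses the standard definitions of precision at K and recall at K and takes ranks as 1-based positions; it does not reproduce the WSM + MiniLM ranker or the pairwise comparison metrics, and the document IDs are invented.

```python
import statistics


def relevant_ranks(ranking, relevant):
    """1-based rank of each relevant document that appears in the ranking."""
    position = {doc: i + 1 for i, doc in enumerate(ranking)}
    return [position[doc] for doc in relevant if doc in position]


def recall_at_k(ranking, relevant, k):
    """Share of relevant documents found in the top k."""
    return len(set(ranking[:k]) & set(relevant)) / len(relevant)


def precision_at_k(ranking, relevant, k):
    """Share of the top k that is relevant."""
    return len(set(ranking[:k]) & set(relevant)) / k


def average_rank(ranking, relevant):
    return statistics.mean(relevant_ranks(ranking, relevant))


def median_rank(ranking, relevant):
    return statistics.median(relevant_ranks(ranking, relevant))


# Toy ranking produced by some ranker, with three known relevant articles.
ranking = ["d3", "d1", "d7", "d2", "d9", "d4"]
relevant = ["d1", "d2", "d4"]
```

For example, `recall_at_k(ranking, relevant, 2)` is 1/3 because only `d1` appears in the top two, while the median rank of the relevant articles is 4.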
Topics
Topic in
BDCC, Entropy, Information, MCA, Mathematics
New Advances in Granular Computing and Data Mining
Topic Editors: Xibei Yang, Bin Xie, Pingxin Wang, Hengrong Ju
Deadline: 30 October 2024
Topic in
Electronics, Applied Sciences, BDCC, Mathematics, Chips
Theory and Applications of High Performance Computing
Topic Editors: Pavel Lyakhov, Maxim Deryabin
Deadline: 30 November 2024
Topic in
BDCC, Digital, Information, Mathematics, Systems
Data-Driven Group Decision-Making
Topic Editors: Shaojian Qu, Ying Ji, M. Faisal Nadeem
Deadline: 31 December 2024
Topic in
BDCC, Data, Environments, Geosciences, Remote Sensing
Database, Mechanism and Risk Assessment of Slope Geologic Hazards
Topic Editors: Chong Xu, Yingying Tian, Xiaoyi Shao, Zikang Xiao, Yulong Cui
Deadline: 28 February 2025
Special Issues
Special Issue in
BDCC
Brain-Inspired Hyperdimensional Computing: Theoretical Perspectives and Real-World Applications
Guest Editors: Bryan Raubenolt, Rahul Shubhra Mandal, Fabio Cumbo, Jayadev Joshi
Deadline: 31 October 2024
Special Issue in
BDCC
Augmented Reality, Virtual Reality, and Computer Graphics
Guest Editor: Adrian Clark
Deadline: 31 October 2024
Special Issue in
BDCC
Semantic Web Technology and Recommender Systems 2nd Edition
Guest Editors: Konstantinos Kotis, Dimitris Spiliotopoulos
Deadline: 31 October 2024
Special Issue in
BDCC
Artificial Intelligence and Natural Language Processing
Guest Editors: Tim Schlippe, Matthias Wölfel
Deadline: 31 October 2024