Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: JCR - Q1 (Computer Science, Theory and Methods) / CiteScore - Q1 (Computer Science Applications)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 24.5 days after submission; the time from acceptance to publication is 4.6 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 4.4 (2024); 5-Year Impact Factor: 4.2 (2024)
Latest Articles
Assessing the Influence of Feedback Strategies on Errors in Crowdsourced Annotation of Tumor Images
Big Data Cogn. Comput. 2025, 9(9), 220; https://doi.org/10.3390/bdcc9090220 - 26 Aug 2025
Abstract
Crowdsourcing enables the acquisition of distributed human intelligence for solving tasks involving human judgments in scalable ways, with use cases across many application areas. However, because of the plethora of available tasks, the crowdworkers completing them may have limited or no background knowledge about the tasks they solve. Therefore, the tasks—even on a micro scale—also need to include appropriate training that enables the crowdworkers to complete them successfully. However, training crowdworkers efficiently in a short time for complex tasks poses a challenge and remains an unresolved issue. This paper addresses this challenge by empirically comparing different training strategies for crowdworkers and evaluating their impact on the crowdworkers’ task results. We compare a basic training strategy, a strategy based on previous errors made by other crowdworkers, and the addition of instant feedback during training and task completion. Our results show that adding instant feedback during both the training phase and the task itself yields more attention from the workers in difficult tasks and hence reduces errors and improves the results. We conclude that more attention is retained when the content of instant feedback includes information about mistakes previously made by other crowdworkers.
(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)
Open Access Systematic Review
A Systematic Literature Review of Artificial Intelligence in Prehospital Emergency Care
by Omar Elfahim, Kokou Laris Edjinedja, Johan Cossus, Mohamed Youssfi, Oussama Barakat and Thibaut Desmettre
Big Data Cogn. Comput. 2025, 9(9), 219; https://doi.org/10.3390/bdcc9090219 - 26 Aug 2025
Abstract
Background: The emergency medical services (EMS) sector, as a complex system, presents substantial hurdles in providing excellent treatment while operating within limited resources, prompting greater adoption of artificial intelligence (AI) as a tool for improving operational efficiency. While AI models have proved beneficial in healthcare operations, there is limited explainability and interpretability, as well as a lack of data used in their application and technological advancement. Methods: The scoping review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for scoping reviews, using PubMed, IEEE Xplore, and Web of Science, with a procedure of double screening and extraction. The search included articles published from 2018 to the beginning of 2025. Studies were excluded if they did not explicitly identify an artificial intelligence (AI) component, lacked relevance to emergency department (ED) or prehospital contexts, failed to report measurable outcomes or evaluations, or did not exploit real-world data. We analyzed the data source used, clinical subclasses, AI domains, ML algorithms, their performance, as well as potential roles for large language models (LLMs) in future applications. Results: A comprehensive PRISMA-guided methodology was used to search academic databases, finding 1181 papers on prehospital emergency treatment from 2018 to 2025, with 65 articles identified after an extensive screening procedure. The results reveal a significant increase in AI publications. A notable technological advancement in the application of AI in EMS using different types of data was explored. Conclusions: These findings highlighted that AI and ML have emerged as revolutionary innovations with huge potential in the fields of healthcare and medicine. 
There are several promising AI interventions that can improve prehospital emergency care, particularly for out-of-hospital cardiac arrest and triage prioritization scenarios. Implications for EMS Practice: Integrating AI methods into prehospital care can optimize the use of available resources, as well as triage and dispatch efficiency. LLMs may have the potential to improve understanding and assist in decision-making under pressure in emergency situations by combining various forms of recorded data. However, there is a need to emphasize continued research and strong collaboration between AI experts and EMS physicians to ensure the safe, ethical, and effective integration of AI into EMS practice.
(This article belongs to the Topic AI for Natural Disasters Detection, Prediction and Modeling)
Open Access Article
Applying Additional Auxiliary Context Using Large Language Model for Metaphor Detection
by Takuya Hayashi and Minoru Sasaki
Big Data Cogn. Comput. 2025, 9(9), 218; https://doi.org/10.3390/bdcc9090218 - 25 Aug 2025
Abstract
Metaphor detection is challenging in natural language processing (NLP) because it requires recognizing nuanced semantic shifts beyond literal meaning, and conventional models often falter when contextual cues are limited. We propose a method to enhance metaphor detection by augmenting input sentences with auxiliary context generated by ChatGPT. In our approach, ChatGPT produces semantically relevant sentences that are inserted before, after, or on both sides of a target sentence, allowing us to analyze the impact of context position and length on classification. Experiments on three benchmark datasets (MOH-X, VUA_All, VUA_Verb) show that this context-enriched input consistently outperforms the no-context baseline across accuracy, precision, recall, and F1-score, with the MOH-X dataset achieving the largest F1 gain. These improvements are statistically significant based on two-tailed t-tests. Our findings demonstrate that generative models can effectively enrich context for metaphor understanding, highlighting context placement and quantity as critical factors. Finally, we outline future directions, including advanced prompt engineering, optimizing context lengths, and extending this approach to multilingual metaphor detection.
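As a rough illustration of the augmentation step described above, the sketch below assembles a context-enriched input by inserting auxiliary sentences before, after, or on both sides of the target sentence; the function name and joining scheme are illustrative assumptions, and the auxiliary sentences stand in for ChatGPT output.

```python
def augment(target: str, context: list[str], position: str = "both") -> str:
    """Build a context-enriched input for a metaphor classifier.

    context: auxiliary sentences (in the paper, generated by ChatGPT).
    position: where the auxiliary context is inserted relative to the target.
    """
    if position == "before":
        return f"{' '.join(context)} {target}"
    if position == "after":
        return f"{target} {' '.join(context)}"
    if position == "both":
        # split the auxiliary sentences across the two sides of the target
        half = len(context) // 2
        pre, post = " ".join(context[:half]), " ".join(context[half:])
        return f"{pre} {target} {post}".strip()
    raise ValueError(f"unknown position: {position}")
```

The classifier then consumes the augmented string instead of the bare target sentence, which is how the paper studies the effect of context position and length.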
Open Access Article
An Adaptive Unsupervised Learning Approach for Credit Card Fraud Detection
by John Adejoh, Nsikak Owoh, Moses Ashawa, Salaheddin Hosseinzadeh, Alireza Shahrabi and Salma Mohamed
Big Data Cogn. Comput. 2025, 9(9), 217; https://doi.org/10.3390/bdcc9090217 - 25 Aug 2025
Abstract
Credit card fraud remains a major cause of financial loss around the world. Traditional fraud detection methods that rely on supervised learning often struggle because fraudulent transactions are rare compared to legitimate ones, leading to imbalanced datasets. Additionally, the models must be retrained frequently, as fraud patterns change over time and require new labeled data for retraining. To address these challenges, this paper proposes an ensemble unsupervised learning approach for credit card fraud detection that combines Autoencoders (AEs), Self-Organizing Maps (SOMs), and Restricted Boltzmann Machines (RBMs), integrated with an Adaptive Reconstruction Threshold (ART) mechanism. The ART dynamically adjusts anomaly detection thresholds by leveraging the clustering properties of SOMs, effectively overcoming the limitations of static threshold approaches in machine learning and deep learning models. The proposed models, AE-ASOMs (Autoencoder—Adaptive Self-Organizing Maps) and RBM-ASOMs (Restricted Boltzmann Machines—Adaptive Self-Organizing Maps), were evaluated on the Kaggle Credit Card Fraud Detection and IEEE-CIS datasets. Our AE-ASOM model achieved an accuracy of 0.980 and an F1-score of 0.967, while the RBM-ASOM model achieved an accuracy of 0.975 and an F1-score of 0.955. Compared to models such as One-Class SVM and Isolation Forest, our approach demonstrates higher detection accuracy and significantly reduces false positive rates. In addition to its performance, the model offers considerable computational efficiency with a training time of 200.52 s and memory usage of 3.02 megabytes.
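The adaptive-threshold idea above can be sketched as follows, assuming (as a simplification) that each SOM cluster's threshold is set to the mean plus k standard deviations of the reconstruction errors mapped to it; the names and the thresholding rule are illustrative, not the paper's exact formulation.

```python
from collections import defaultdict
from statistics import mean, pstdev

def adaptive_thresholds(errors, clusters, k=3.0):
    """Per-cluster anomaly threshold from autoencoder reconstruction errors.

    errors:   reconstruction error per transaction
    clusters: SOM cluster id assigned to each transaction
    """
    by_cluster = defaultdict(list)
    for err, cid in zip(errors, clusters):
        by_cluster[cid].append(err)
    # each cluster gets its own cutoff instead of one static global threshold
    return {cid: mean(errs) + k * pstdev(errs) for cid, errs in by_cluster.items()}

def flag_anomalies(errors, clusters, thresholds):
    """Flag a transaction when its error exceeds its own cluster's threshold."""
    return [err > thresholds[cid] for err, cid in zip(errors, clusters)]
```

The point of the per-cluster cutoff is that dense, well-reconstructed clusters keep a tight threshold while noisier clusters tolerate larger errors, which is what a single static threshold cannot do.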
(This article belongs to the Special Issue Transforming Cyber Security Provision through Utilizing Artificial Intelligence)
Open Access Article
MS-PreTE: A Multi-Scale Pre-Training Encoder for Mobile Encrypted Traffic Classification
by Ziqi Wang, Yufan Qiu, Yaping Liu, Shuo Zhang and Xinyi Liu
Big Data Cogn. Comput. 2025, 9(8), 216; https://doi.org/10.3390/bdcc9080216 - 21 Aug 2025
Abstract
Mobile traffic classification serves as a fundamental component in network security systems. In recent years, pre-training methods have significantly advanced this field. However, as mobile traffic is typically mixed with third-party services, the deep integration of such shared services results in highly similar TCP flow characteristics across different applications. This makes it challenging for existing traffic classification methods to effectively identify mobile traffic. To address the challenge, we propose MS-PreTE, a two-phase pre-training framework for mobile traffic classification. MS-PreTE introduces a novel multi-level representation model to preserve traffic information from diverse perspectives and hierarchical levels. Furthermore, MS-PreTE incorporates a focal-attention mechanism to enhance the model’s capability in discerning subtle differences among similar traffic flows. Evaluations demonstrate that MS-PreTE achieves state-of-the-art performance on three mobile application datasets, boosting the F1 score for Cross-platform (iOS) to 99.34% (up by 2.1%), Cross-platform (Android) to 98.61% (up by 1.6%), and NUDT-Mobile-Traffic to 87.70% (up by 2.47%). Moreover, MS-PreTE exhibits strong generalization capabilities across four real-world traffic datasets.
(This article belongs to the Special Issue Machine Learning Methodologies and Applications in Cybersecurity Data Analysis)
Open Access Article
A Multi-Level Annotation Model for Fake News Detection: Implementing Kazakh-Russian Corpus via Label Studio
by Madina Sambetbayeva, Anargul Nekessova, Aigerim Yerimbetova, Abdygalym Bayangali, Mira Kaldarova, Duman Telman and Nurzhigit Smailov
Big Data Cogn. Comput. 2025, 9(8), 215; https://doi.org/10.3390/bdcc9080215 - 20 Aug 2025
Abstract
This paper presents a multi-level annotation model for detecting fake news in Kazakh and Russian languages, aiming to enhance understanding of disinformation strategies in multilingual digital media environments. Unlike traditional binary models, our approach captures the complexity of disinformation by accounting for both linguistic and cultural factors. To support this, a corpus of over 5000 news texts was manually annotated using the Label Studio platform. The annotation scheme consists of seven interrelated categories: CLAIM, SOURCE, EVIDENCE, DISINFORMATION_TECHNIQUE, AUTHOR_INTENT, TARGET_AUDIENCE, and TIMESTAMP. Inter-annotator agreement, evaluated using Cohen’s Kappa, ranged from 0.72 to 0.81, indicating substantial consistency. The annotated data reveals recurring patterns of disinformation, such as emotional manipulation, targeting of vulnerable individuals, and the strategic concealment of intent. Semantic relations between entities, such as CLAIM → EVIDENCE and CLAIM → AUTHOR_INTENT were formalized to represent disinformation narratives as knowledge graphs. This study contributes the first linguistically and culturally adapted annotation model for Kazakh and Russian languages, providing a robust and empirical resource for building interpretable and context-aware fake news detection systems. The resulting annotated corpus and its semantic structure offer valuable empirical material for further research in natural language processing, computational linguistics, and media studies in low-resource language environments.
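For reference, the Cohen's Kappa agreement statistic reported above (0.72–0.81) can be computed for two annotators as follows; this is the standard formula, not code from the study.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(ca) | set(cb)
    pe = sum((ca[l] / n) * (cb[l] / n) for l in labels)   # chance agreement
    return (po - pe) / (1 - pe)
```

Values in the 0.61–0.80 band are conventionally read as "substantial" agreement, which matches the paper's characterization of its annotators.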
Open Access Article
Detection and Localization of Hidden IoT Devices in Unknown Environments Based on Channel Fingerprints
by Xiangyu Ju, Yitang Chen, Zhiqiang Li and Biao Han
Big Data Cogn. Comput. 2025, 9(8), 214; https://doi.org/10.3390/bdcc9080214 - 20 Aug 2025
Abstract
In recent years, hidden IoT monitoring devices installed indoors have raised significant concerns about privacy breaches and other security threats. To address the challenges of detecting such devices, low positioning accuracy, and lengthy detection times, this paper proposes a hidden device detection and localization system that operates on the Android platform. This technology utilizes the Received Signal Strength Indication (RSSI) signals received by the detection terminal device to achieve the detection, classification, and localization of hidden IoT devices in unfamiliar environments. This technology integrates three key designs: (1) actively capturing the RSSI sequence of hidden devices by sending RTS frames and receiving CTS frames, which is used to generate device channel fingerprints and estimate the distance between hidden devices and detection terminals; (2) training an RSSI-based ranging model using the XGBoost algorithm, followed by multi-point localization for accurate positioning; (3) implementing augmented reality-based visual localization to support handheld detection terminals. This prototype system successfully achieves active data sniffing based on RTS/CTS and terminal localization based on the RSSI-based ranging model, effectively reducing signal acquisition time and improving localization accuracy. Real-world experiments show that the system can detect and locate hidden devices in unfamiliar environments, achieving an accuracy of 98.1% in classifying device types. The time required for detection and localization is approximately one-sixth of existing methods, with system runtime maintained within 5 min. The localization error is 0.77 m, a 48.7% improvement over existing methods with an average error of 1.5 m.
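The paper learns its RSSI-to-distance mapping with XGBoost; as a hedged baseline illustration of the ranging idea, the sketch below inverts the classical log-distance path-loss model instead, with an assumed 1 m reference RSSI and path-loss exponent (neither value is from the paper).

```python
def rssi_to_distance(rssi: float, rssi0: float = -40.0, n: float = 2.0) -> float:
    """Invert the log-distance model: rssi = rssi0 - 10 * n * log10(d).

    rssi0: RSSI measured at 1 m (assumed); n: path-loss exponent (assumed).
    Returns the estimated distance in metres.
    """
    return 10 ** ((rssi0 - rssi) / (10 * n))
```

With one such distance estimate per detection terminal position, multi-point (multilateration) localization as described above reduces to finding the point whose distances to the measurement positions best match the estimates.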
(This article belongs to the Special Issue Machine Learning Methodologies and Applications in Cybersecurity Data Analysis)
Open Access Article
Efficient Dynamic Emotion Recognition from Facial Expressions Using Statistical Spatio-Temporal Geometric Features
by Yacine Yaddaden
Big Data Cogn. Comput. 2025, 9(8), 213; https://doi.org/10.3390/bdcc9080213 - 19 Aug 2025
Abstract
Automatic Facial Expression Recognition (AFER) is a key component of affective computing, enabling machines to recognize and interpret human emotions across various applications such as human–computer interaction, healthcare, entertainment, and social robotics. Dynamic AFER systems, which exploit image sequences, can capture the temporal evolution of facial expressions but often suffer from high computational costs, limiting their suitability for real-time use. In this paper, we propose an efficient dynamic AFER approach based on a novel spatio-temporal representation. Facial landmarks are extracted, and all possible Euclidean distances are computed to model the spatial structure. To capture temporal variations, three statistical metrics are applied to each distance sequence. A feature selection stage based on the Extremely Randomized Trees (ExtRa-Trees) algorithm is then performed to reduce dimensionality and enhance classification performance. Finally, the emotions are classified using a linear multi-class Support Vector Machine (SVM) and compared against the k-Nearest Neighbors (k-NN) method. The proposed approach is evaluated on three benchmark datasets: CK+, MUG, and MMI, achieving recognition rates of 94.65%, 93.98%, and 75.59%, respectively. Our results demonstrate that the proposed method achieves a strong balance between accuracy and computational efficiency, making it well-suited for real-time facial expression recognition applications.
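The feature construction described above can be sketched roughly as follows: all pairwise Euclidean distances between landmarks per frame, then three statistics per distance sequence over time. The specific choice of mean, standard deviation, and range as the three metrics is an assumption for illustration.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def features(frames):
    """Spatio-temporal feature vector from a landmark sequence.

    frames: list of frames; each frame is a list of (x, y) landmark tuples.
    Yields C(n, 2) * 3 features for n landmarks.
    """
    n = len(frames[0])
    pairs = list(combinations(range(n), 2))
    # one distance sequence per landmark pair, tracked across the frames
    seqs = [[dist(f[i], f[j]) for f in frames] for i, j in pairs]
    feats = []
    for s in seqs:
        feats += [mean(s), pstdev(s), max(s) - min(s)]
    return feats
```

A feature-selection stage (ExtRa-Trees in the paper) would then prune this O(n²) vector before the SVM, which is where the reported efficiency comes from.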
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
Open Access Article
Proposal of a Blockchain-Based Data Management System for Decentralized Artificial Intelligence Devices
by Keundug Park and Heung-Youl Youm
Big Data Cogn. Comput. 2025, 9(8), 212; https://doi.org/10.3390/bdcc9080212 - 18 Aug 2025
Abstract
A decentralized artificial intelligence (DAI) system is a human-oriented artificial intelligence (AI) system, which performs self-learning and shares its knowledge with other DAI systems like humans. A DAI device is an individual device (e.g., a mobile phone, a personal computer, a robot, a car, etc.) running a DAI system. A DAI device acquires validated knowledge data and raw data from a blockchain system as a trust anchor and improves its knowledge level by self-learning using the validated data. A DAI device using the proposed system reduces unreliable tasks, including the generation of unreliable products (e.g., deepfakes, fake news, and hallucinations), and the proposed system also prevents malicious DAI devices from acquiring the validated data. This paper proposes a new architecture for a blockchain-based data management system for DAI devices, together with the service scenario and data flow, security threats, and security requirements. It also describes the key features and expected effects of the proposed system. This paper discusses the considerations for developing or operating the proposed system and concludes with future work.
(This article belongs to the Special Issue Transforming Cyber Security Provision through Utilizing Artificial Intelligence)
Open Access Article
Empathetic Response Generation Based on Emotional Transition Prompt and Dual-Semantic Contrastive Learning
by Yanying Mao, Yijia Zhang, Taihua Shao and Honghui Chen
Big Data Cogn. Comput. 2025, 9(8), 211; https://doi.org/10.3390/bdcc9080211 - 18 Aug 2025
Abstract
Empathetic response generation stands as a pivotal endeavor in the development of human-like dialogue systems. An effective approach in previous research is integrating external knowledge to generate empathetic responses. However, existing approaches focus only on identifying a user’s current emotional state; they overlook the user’s emotional transition over the course of the conversation and fail to sustain the dialogue. To tackle these issues, we propose an empathetic response generation model based on an emotional transition prompt and dual-semantic contrastive learning (EPDC). Specifically, we first compute the transition in users’ sentiment polarity during the conversation and incorporate it into the conversation embedding as sentiment prompts. Then, we generate two distinct fine-grained contextual representations and treat them as positive examples for contrastive learning, aiming at extracting high-order semantic information to guide the subsequent turn of dialogue. Finally, we also leverage commonsense knowledge to enhance the contextual representations, and the empathetic responses are generated by decoding the combination of semantic and emotional states. Notably, our work represents the pioneering application of emotional prompts and contrastive learning to augment the sustainability of empathetic dialogue. Extensive experiments conducted on the benchmark dataset EMPATHETICDIALOGUES demonstrate that EPDC outperforms the baselines in both automatic evaluations and human evaluations.
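A minimal sketch of the sentiment-transition prompt idea, under the assumption that per-utterance polarity scores are already available; the threshold and label strings are invented for illustration and are not EPDC's actual vocabulary.

```python
def transition_prompt(polarities: list[float], eps: float = 0.1) -> str:
    """Reduce a sequence of utterance polarity scores (in [-1, 1]) to a
    coarse transition label that can be prepended to the model input."""
    delta = polarities[-1] - polarities[0]
    if delta > eps:
        return "[EMOTION_IMPROVING]"
    if delta < -eps:
        return "[EMOTION_WORSENING]"
    return "[EMOTION_STABLE]"
```

The generated label acts as a prompt token: the response decoder conditions on how the user's emotion is moving, not just on its current state.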
(This article belongs to the Special Issue Application of Semantic Technologies in Intelligent Environment)
Open Access Article
AI-Based Phishing Detection and Student Cybersecurity Awareness in the Digital Age
by Zeinab Shahbazi, Rezvan Jalali and Maryam Molaeevand
Big Data Cogn. Comput. 2025, 9(8), 210; https://doi.org/10.3390/bdcc9080210 - 15 Aug 2025
Abstract
Phishing attacks are an increasingly common cybersecurity threat and are characterized by deceiving people into giving out their private credentials via emails, websites, and messages. An insight into students’ challenges in recognizing phishing threats can provide valuable information on how AI-based detection systems can be improved to enhance accuracy, reduce false positives, and build user trust in cybersecurity. This study focuses on students’ awareness of phishing attempts and evaluates AI-based phishing detection systems. Questionnaires were circulated amongst students, and responses were evaluated to uncover prevailing patterns and issues. The results indicate that most college students are knowledgeable about phishing methods, but many do not recognize the dangers of phishing. Consequently, AI-based detection systems have potential but also face issues relating to accuracy, false positives, and user trust. This research highlights the importance of bolstering cybersecurity education and ongoing enhancements to AI models to improve phishing detection. Future studies should include a more representative sample, evaluate AI detection systems in real-world settings, and assess longer-term changes in phishing-related awareness. By combining AI-driven solutions with education, a safer digital world can be created.
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
Open Access Article
SplitGround: Long-Chain Reasoning Split via Modular Multi-Expert Collaboration for Training-Free Scene Knowledge-Guided Visual Grounding
by Xilong Qin, Yue Hu, Wansen Wu, Xinmeng Li and Quanjun Yin
Big Data Cogn. Comput. 2025, 9(8), 209; https://doi.org/10.3390/bdcc9080209 - 14 Aug 2025
Abstract
Scene Knowledge-guided Visual Grounding (SK-VG) is a multi-modal detection task built upon conventional visual grounding (VG) for human–computer interaction scenarios. It utilizes an additional passage of scene knowledge apart from the image and context-dependent textual query for referred object localization. Due to the inherent difficulty in directly establishing correlations between the given query and the image without leveraging scene knowledge, this task imposes significant demands on a multi-step knowledge reasoning process to achieve accurate grounding. Off-the-shelf VG models underperform under such a setting due to the requirement of detailed description in the query and a lack of knowledge inference based on implicit narratives of the visual scene. Recent Vision–Language Models (VLMs) exhibit improved cross-modal reasoning capabilities. However, their monolithic architectures, particularly in lightweight implementations, struggle to maintain coherent reasoning chains across sequential logical deductions, leading to error accumulation in knowledge integration and object localization. To address the above-mentioned challenges, we propose SplitGround—a collaborative framework that strategically decomposes complex reasoning processes by fusing the input query and image with knowledge through two auxiliary modules. Specifically, it implements an Agentic Annotation Workflow (AAW) for explicit image annotation and a Synonymous Conversion Mechanism (SCM) for semantic query transformation. This hierarchical decomposition enables VLMs to focus on essential reasoning steps while offloading auxiliary cognitive tasks to specialized modules, effectively splitting long reasoning chains into manageable subtasks with reduced complexity. Comprehensive evaluations on the SK-VG benchmark demonstrate the significant advancements of our method. 
Remarkably, SplitGround attains an accuracy improvement of 15.71% on the hard split of the test set over the previous training-required SOTA, using only a compact VLM backbone without fine-tuning, which provides new insights for knowledge-intensive visual grounding tasks.
Open Access Article
Ontology Matching Method Based on Deep Learning and Syntax
by Jiawei Lu and Changfeng Yan
Big Data Cogn. Comput. 2025, 9(8), 208; https://doi.org/10.3390/bdcc9080208 - 14 Aug 2025
Abstract
Ontology technology addresses data heterogeneity challenges in Internet of Everything (IoE) systems enabled by Cyber Twin and 6G, yet the subjective nature of ontology engineering often leads to differing definitions of the same concept across ontologies, resulting in ontology heterogeneity. To solve this problem, this study introduces a hybrid ontology matching method that integrates a Recurrent Neural Network (RNN) with syntax-based analysis. The method first extracts representative entities by leveraging in-degree and out-degree information from ontological tree structures, which reduces training noise and improves model generalization. Next, a matching framework combining RNN and N-gram is designed: the RNN captures medium-distance dependencies and complex sequential patterns, supporting the dynamic optimization of embedding parameters and semantic feature extraction; the N-gram module further captures local information and relationships between adjacent characters, improving the coverage of matched entities. The experiments were conducted on the OAEI benchmark dataset, where the proposed method was compared with representative baseline methods from OAEI as well as a Transformer-based method. The results demonstrate that the proposed method achieved an 18.18% improvement in F-measure over the best-performing baseline. This improvement was statistically significant, as validated by the Friedman and Holm tests. Moreover, the proposed method achieves the shortest runtime among all the compared methods. Compared to other RNN-based hybrid frameworks that adopt classical structure-based and semantics-based similarity measures, the proposed method further improved the F-measure by 18.46%. Furthermore, a comparison of time and space complexity with the standalone RNN model and its variants demonstrated that the proposed method achieved high performance while maintaining favorable computational efficiency. 
These findings confirm the effectiveness and efficiency of the method in addressing ontology heterogeneity in complex IoE environments.
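As a hedged illustration of the N-gram side of the matcher described above, the sketch below scores two entity labels by character trigram overlap using the Dice coefficient; the padding scheme and similarity measure are assumptions, since the paper's exact formulation is not given here.

```python
def ngrams(s: str, n: int = 3) -> set:
    """Character n-grams of a label, padded so short labels still yield grams."""
    s = f"#{s.lower()}#"
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two labels' n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))
```

In a hybrid matcher this syntactic score would be combined with the RNN's embedding-based similarity, with the N-gram term catching local character-level matches (e.g. near-identical labels) that sequence models can miss.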
(This article belongs to the Special Issue Evolutionary Computation and Artificial Intelligence: Building a Sustainable Future for Smart Cities)
Open Access Article
Target-Oriented Opinion Words Extraction Based on Dependency Tree
by
Yan Wen, Enhai Yu, Jiawei Qu, Lele Cheng, Yuao Chen and Siyu Lu
Big Data Cogn. Comput. 2025, 9(8), 207; https://doi.org/10.3390/bdcc9080207 - 13 Aug 2025
Abstract
Target-oriented opinion words extraction (TOWE) is a novel subtask of aspect-based sentiment analysis (ABSA), which aims to extract opinion words corresponding to a given opinion target within a sentence. In recent years, neural networks have been widely used to solve this problem and have achieved competitive results. However, when faced with complex and long sentences, the existing methods struggle to accurately identify the semantic relationships between distant opinion targets and opinion words. This is primarily because they rely on literal distance, rather than semantic distance, to model the local context or opinion span of the opinion target. To address this issue, we propose a neural network model called DTOWE, which comprises (1) a global module using Inward-LSTM and Outward-LSTM to capture general sentence-level context, and (2) a local module that employs BiLSTM combined with DT-LCF to focus on target-specific opinion spans. DT-LCF is implemented in two ways: DT-LCF-Mask, which uses a binary mask to zero out non-local context beyond a dependency tree distance threshold, α, and DT-LCF-Weight, which applies a dynamic weighted decay to down-weight distant context based on semantic distance. These mechanisms leverage dependency tree structures to measure semantic proximity, reducing the impact of irrelevant words and enhancing the accuracy of opinion span detection. Extensive experiments on four benchmark datasets demonstrate that DTOWE outperforms state-of-the-art models. Specifically, DT-LCF-Weight achieves F1-scores of 73.62% (14lap), 82.24% (14res), 75.35% (15res), and 83.83% (16res), with improvements of 2.63% to 3.44% over the previous state-of-the-art (SOTA) model, IOG. Ablation studies confirm that the dependency tree-based distance measurement and the DT-LCF mechanism are critical to the model's effectiveness, validating their ability to handle complex sentences and capture semantic dependencies between targets and opinion words.
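The masking idea behind DT-LCF-Mask, keeping only tokens within a dependency-tree distance α of the target, can be sketched in a few lines. This is a toy illustration under the assumption that the parse is given as a head-index array (one head per token, -1 for the root); it is not the authors' implementation, and the example sentence and indices are invented.

```python
from collections import deque

def tree_distances(heads, start):
    """BFS distances from `start` over an undirected dependency tree.
    `heads[i]` is the head index of token i (-1 marks the root)."""
    n = len(heads)
    adj = {i: [] for i in range(n)}
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return [dist.get(i, n) for i in range(n)]

def dt_lcf_mask(heads, target, alpha):
    """Binary mask keeping tokens within tree distance alpha of the target."""
    return [1 if d <= alpha else 0 for d in tree_distances(heads, target)]

# "The screen is great": each head index points to the token's syntactic head.
heads = [1, 3, 3, -1]          # The->screen, screen->great, is->great, great=root
print(dt_lcf_mask(heads, target=1, alpha=1))   # [1, 1, 0, 1]
```

Note how "great" survives the mask even though it is two words away from "screen" in the surface string: tree distance, not literal distance, decides what counts as local context.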
Open Access Article
Research on Multi-Stage Detection of APT Attacks: Feature Selection Based on LDR-RFECV and Hyperparameter Optimization via LWHO
by
Lihong Zeng, Honghui Li, Xueliang Fu, Daoqi Han, Shuncheng Zhou and Xin He
Big Data Cogn. Comput. 2025, 9(8), 206; https://doi.org/10.3390/bdcc9080206 - 12 Aug 2025
Abstract
In the highly interconnected digital ecosystem, cyberspace has become the main battlefield for complex attacks such as Advanced Persistent Threats (APTs). The complexity and concealment of APT attacks are increasing, posing unprecedented challenges to network security. Current APT detection methods largely depend on general datasets, making it challenging to capture the stages and complexity of APT attacks. Moreover, existing detection methods often suffer from suboptimal accuracy, high false alarm rates, and a lack of real-time capabilities. In this paper, we introduce LDR-RFECV, a novel feature selection (FS) algorithm that uses LightGBM, Decision Trees (DTs), and Random Forest (RF) as integrated feature evaluators instead of single evaluators in recursive feature elimination algorithms. This approach helps select the optimal feature subset, thereby significantly enhancing detection efficiency. In addition, we propose a novel optimization algorithm called LWHO, which integrates the Levy flight mechanism with the Wild Horse Optimizer (WHO) to optimize the hyperparameters of the LightGBM model, ultimately enhancing performance in APT attack detection. More importantly, this optimization strategy significantly boosts the detection rate during the lateral movement phase of APT attacks, a pivotal stage in which attackers infiltrate key resources; timely identification is essential for disrupting the attack chain and achieving precise defense. Experimental results demonstrate that the proposed method achieves 97.31% and 98.32% accuracy on two typical APT attack datasets, DAPT2020 and Unraveled, respectively, which is 2.86% and 4.02% higher than existing methods.
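The ensemble-evaluator recursive elimination at the heart of LDR-RFECV can be sketched in miniature: at each round, drop the feature whose rank averaged across several importance scorers is worst, until the desired subset size remains. The dictionaries below are hypothetical stand-ins for LightGBM/DT/RF importance scores, and the feature names are invented; this is not the paper's code.

```python
def rfe_ensemble(features, evaluators, keep):
    """Recursively drop the feature with the worst rank averaged over all
    evaluators, until `keep` features remain (ensemble-evaluator RFE)."""
    feats = list(features)
    while len(feats) > keep:
        avg_rank = {f: 0.0 for f in feats}
        for score in evaluators:
            # higher importance -> better (lower) rank for this evaluator
            ordered = sorted(feats, key=score, reverse=True)
            for r, f in enumerate(ordered):
                avg_rank[f] += r / len(evaluators)
        feats.remove(max(feats, key=lambda f: avg_rank[f]))
    return feats

# Toy stand-ins for LightGBM / Decision Tree / Random Forest importances.
imp_a = {"dur": 0.9, "bytes": 0.7, "port": 0.2, "flag": 0.1}
imp_b = {"dur": 0.8, "bytes": 0.6, "port": 0.3, "flag": 0.05}
imp_c = {"dur": 0.7, "bytes": 0.9, "port": 0.1, "flag": 0.2}
evals = [imp_a.get, imp_b.get, imp_c.get]
print(rfe_ensemble(["dur", "bytes", "port", "flag"], evals, keep=2))
```

Averaging ranks rather than trusting a single evaluator is what keeps one model's idiosyncratic importance estimates from eliminating a feature the other evaluators find useful.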
Open Access Article
Explainable Deep Learning Model for ChatGPT-Rephrased Fake Review Detection Using DistilBERT
by
Rania A. AlQadi, Shereen A. Taie, Amira M. Idrees and Esraa Elhariri
Big Data Cogn. Comput. 2025, 9(8), 205; https://doi.org/10.3390/bdcc9080205 - 11 Aug 2025
Abstract
Customers heavily depend on reviews for product information. Fake reviews may influence the perception of product quality, making online reviews less effective. ChatGPT's (GPT-3.5 and GPT-4) ability to generate human-like reviews and responses to inquiries across several disciplines has increased recently. This has led to an increase in the number of reviewers and applications using ChatGPT to create fake reviews. Consequently, the detection of fake reviews generated or rephrased by ChatGPT has become essential. This paper proposes a new approach that distinguishes ChatGPT-rephrased reviews, considered fake, from real ones, utilizing a balanced dataset to analyze the sentiment and linguistic patterns that characterize both. The proposed model further leverages Explainable Artificial Intelligence (XAI) techniques, including Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), for deeper insights into the model's predictions and classification logic. The proposed model applies a pre-processing phase that includes part-of-speech (POS) tagging, word lemmatization, and tokenization, and then uses a fine-tuned Transformer-based Machine Learning (ML) model, DistilBERT, for predictions. The experimental results indicate that the proposed fine-tuned DistilBERT, utilizing the constructed balanced dataset along with the pre-processing phase, outperforms other state-of-the-art methods for detecting ChatGPT-rephrased reviews, achieving an accuracy of 97.25% and an F1-score of 97.56%. The use of LIME and SHAP techniques not only enhanced the model's interpretability but also offered valuable insights into the key factors that distinguish genuine reviews from ChatGPT-rephrased ones. According to the XAI analysis, ChatGPT's writing style is polite and grammatically well-structured, lacks specific descriptions and information, uses fancy words, is impersonal, and is deficient in emotional expression.
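The linguistic cues the XAI analysis surfaced (politeness, "fancy" vocabulary, weak emotional expression) can be turned into crude hand-crafted features, as a toy contrast to the paper's fine-tuned DistilBERT. The word lists, thresholds, and example reviews below are invented for illustration and are not taken from the paper.

```python
import re

POLITE = {"thank", "please", "appreciate", "kindly"}
EMOTIVE = {"love", "hate", "awful", "amazing", "terrible"}

def style_features(review):
    """Crude stylometric cues echoing the XAI findings: politeness and
    long-word ('fancy') rates tend up, emotional markers down, for
    ChatGPT-rephrased text."""
    words = re.findall(r"[a-z']+", review.lower())
    n = max(len(words), 1)
    return {
        "polite_rate": sum(w in POLITE for w in words) / n,
        "fancy_rate": sum(len(w) >= 10 for w in words) / n,
        "emotive_rate": sum(w in EMOTIVE for w in words) / n,
        "exclaims": review.count("!"),
    }

human = style_features("I LOVE this phone!! Battery is amazing, camera awful in low light.")
gpt = style_features("Thank you for an exceptional product; its functionality is commendable.")
print(human["emotive_rate"] > gpt["emotive_rate"])   # True: human text is emotive
print(gpt["fancy_rate"] > human["fancy_rate"])       # True: rephrased text is fancier
```

Features like these are, of course, far weaker than contextual embeddings; they only make the SHAP/LIME-style findings concrete.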
These findings emphasize the effectiveness and reliability of the proposed approach.
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
Open Access Article
Rebalancing in Supervised Contrastive Learning for Long-Tailed Visual Recognition
by
Jiahui Lv, Jun Lei, Jun Zhang, Chao Chen and Shuohao Li
Big Data Cogn. Comput. 2025, 9(8), 204; https://doi.org/10.3390/bdcc9080204 - 11 Aug 2025
Abstract
In real-world visual recognition tasks, long-tailed distribution is a pervasive challenge, where the extreme class imbalance severely limits the representation learning capability of deep models. Although supervised learning has demonstrated certain potential in long-tailed visual recognition, gradient updates dominated by head classes often leave tail classes under-represented, resulting in ambiguous decision boundaries. While existing Supervised Contrastive Learning variants mitigate class bias through instance-level similarity comparison, they are still limited by biased negative sample selection and insufficient modeling of the feature space structure. To address this, we propose Rebalancing Supervised Contrastive Learning (Reb-SupCon), which constructs a balanced and discriminative feature space during model training to alleviate performance deviation. Our method consists of two key components: (1) a dynamic rebalancing factor that automatically adjusts sample contributions through differentiable weighting, thereby establishing class-balanced feature representations; (2) a prototype-aware enhancement module that further improves feature discriminability by explicitly constraining the geometric structure of the feature space through introduced feature prototypes, enabling locally discriminative feature reconstruction. This breaks through the limitations of conventional instance contrastive learning and helps the model to identify more reasonable decision boundaries. Experimental results show that the method demonstrates superior performance on mainstream long-tailed benchmark datasets, with ablation studies and feature visualizations validating the modules' synergistic effects.
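The rebalancing idea can be sketched on top of the standard supervised contrastive loss by weighting each anchor inversely to its class frequency. This is a simple stand-in for the paper's dynamic rebalancing factor (which is learned, not fixed), shown in plain Python on toy 2-D embeddings; the weighting scheme and temperature are assumptions.

```python
import math

def rebalanced_supcon(embs, labels, tau=0.5):
    """Supervised contrastive loss whose anchors are weighted inversely to
    class frequency: a fixed-weight stand-in for a dynamic rebalancing factor."""
    n = len(embs)
    freq = {c: labels.count(c) for c in set(labels)}
    sim = lambda u, v: sum(x * y for x, y in zip(u, v)) / tau
    total, wsum = 0.0, 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue                      # anchors without positives are skipped
        denom = sum(math.exp(sim(embs[i], embs[a])) for a in range(n) if a != i)
        li = -sum(math.log(math.exp(sim(embs[i], embs[p])) / denom)
                  for p in pos) / len(pos)
        w = 1.0 / freq[labels[i]]         # tail-class anchors weigh more
        total += w * li
        wsum += w
    return total / wsum

labels = [0, 0, 1, 1]
separated = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]   # classes apart
mixed = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]       # classes tangled
print(rebalanced_supcon(separated, labels) < rebalanced_supcon(mixed, labels))
```

As expected, embeddings whose same-class points cluster together incur a lower loss than embeddings where positives sit apart, which is exactly the pressure the loss applies during training.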
Open Access Article
Application of a Multi-Algorithm-Optimized CatBoost Model in Predicting the Strength of Multi-Source Solid Waste Backfilling Materials
by
Jianhui Qiu, Jielin Li, Xin Xiong and Keping Zhou
Big Data Cogn. Comput. 2025, 9(8), 203; https://doi.org/10.3390/bdcc9080203 - 7 Aug 2025
Abstract
Backfilling materials are commonly employed in mines to fill voids with mining waste, and the strength of the consolidated backfill formed by the binding material directly influences the stability of the surrounding rock and production safety in mines. The traditional approach to obtaining the strength of the backfill demands a considerable amount of manpower and time. The rapid and precise acquisition and optimization of backfill strength parameters hold utmost significance for mining safety. In this research, the authors carried out a backfill strength experiment with five experimental parameters, namely concentration, cement–sand ratio, waste rock–tailing ratio, curing time, and curing temperature, using an orthogonal design. They collected 174 sets of backfill strength parameters and employed six population optimization algorithms, including the Artificial Ecosystem-based Optimization (AEO) algorithm, Aquila Optimization (AO) algorithm, Germinal Center Optimization (GCO), Sand Cat Swarm Optimization (SCSO), Sparrow Search Algorithm (SSA), and Walrus Optimization Algorithm (WaOA), in combination with the CatBoost algorithm to conduct a prediction study of backfill strength. The study also utilized the SHapley Additive exPlanations (SHAP) method to analyze the influence of different parameters on the prediction of backfill strength. The results demonstrate that when the population size was 60, the AEO-CatBoost algorithm model exhibited a favorable fitting effect (R2 = 0.947, VAF = 93.614), and the prediction error was minimal (RMSE = 0.606, MAE = 0.465), enabling the accurate and rapid prediction of the strength parameters of the backfill under different ratios and curing conditions.
Additionally, an increase in curing temperature and curing time enhanced the strength of the backfill, and the influence of the waste rock–tailing ratio on the strength of the backfill was negative at a curing temperature of 50 °C, which is attributed to the change in the pore structure at the microscopic level leading to macroscopic mechanical alterations. When the curing conditions are adequate and the parameter ratios are reasonable, the lower the porosity of the backfill, the greater its strength. This study offers a reliable and accurate method for the rapid acquisition of backfill strength and provides new technical support for the development of filling mining technology.
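The population-based hyperparameter tuning described above can be sketched generically: candidates take heavy-tailed (Levy) steps toward the best solution found so far, echoing the Levy-flight mechanism that such optimizers add to plain population search. The code below is a generic stand-in, not AEO or CatBoost; the objective is a toy quadratic playing the role of a validation-error surface, and all names are illustrative.

```python
import math, random

def levy_step(scale=0.1, beta=1.5):
    """One heavy-tailed step drawn via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = random.gauss(0, sigma), random.gauss(0, 1)
    return scale * u / (abs(v) ** (1 / beta) + 1e-12)

def optimize(obj, bounds, pop=20, iters=50, seed=0):
    """Tiny population search: every candidate Levy-steps toward the best
    point found so far; the best-so-far is only ever replaced by a better point."""
    random.seed(seed)
    lo, hi = zip(*bounds)
    pts = [[random.uniform(l, h) for l, h in bounds] for _ in range(pop)]
    best = min(pts, key=obj)[:]
    for _ in range(iters):
        for p in pts:
            for d in range(len(bounds)):
                p[d] += levy_step() * (best[d] - p[d])
                p[d] = min(max(p[d], lo[d]), hi[d])   # clamp to the search box
        cand = min(pts, key=obj)
        if obj(cand) < obj(best):
            best = cand[:]
    return best, obj(best)

# Toy stand-in for a validation-error surface over (learning_rate, depth).
obj = lambda x: (x[0] - 0.1) ** 2 + (x[1] - 6.0) ** 2
best, err = optimize(obj, [(0.01, 0.3), (2.0, 10.0)])
print(best, err)
```

The heavy-tailed steps occasionally jump far from the current best, which is what helps these methods escape the local optima that trap purely local hyperparameter search.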
(This article belongs to the Special Issue Applications of Artificial Intelligence and Data Management in Data Analysis)
Open Access Article
Evidential K-Nearest Neighbors with Cognitive-Inspired Feature Selection for High-Dimensional Data
by
Yawen Liu, Yang Zhang, Xudong Wang and Xinyuan Qu
Big Data Cogn. Comput. 2025, 9(8), 202; https://doi.org/10.3390/bdcc9080202 - 6 Aug 2025
Abstract
The Evidential K-Nearest Neighbor (EK-NN) classifier has demonstrated robustness in handling incomplete and uncertain data; however, its application in high-dimensional big data for feature selection, such as genomic datasets with tens of thousands of gene features, remains underexplored. Our proposed Granular–Elastic Evidential K-Nearest Neighbor (GEK-NN) approach addresses this gap. In the context of big data, GEK-NN integrates an Elastic Net within the Genetic Algorithm's fitness function to efficiently sift through vast amounts of data, identifying relevant feature subsets. This process mimics the human cognitive behavior of filtering and refining information, similar to concepts in cognitive computing. A granularity metric is further employed to optimize subset size, maximizing its impact. GEK-NN consists of two crucial phases. Initially, an Elastic Net-based feature evaluation is conducted to pinpoint relevant features from the high-dimensional data. Subsequently, granularity-based optimization refines the subset size, adapting to the complexity of big data. Before GEK-NN was applied to genomic big data, experiments on UCI datasets demonstrated its feasibility and effectiveness. By using an Evidence Theory framework, GEK-NN overcomes feature-selection challenges in both low-dimensional UCI datasets and high-dimensional genomic big data, significantly enhancing pattern recognition and classification accuracy. Comparative analyses with existing EK-NN feature-selection methods, using both UCI and high-dimensional gene datasets, underscore GEK-NN's superiority in handling big data for feature selection and classification. These results indicate that GEK-NN not only enriches EK-NN applications but also offers a cognitive-inspired solution for complex gene data analysis, effectively tackling high-dimensional feature-selection challenges in the realm of big data.
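The "Elastic Net inside the fitness function" idea can be caricatured as a predictive score minus two size penalties (an L1-like term on the subset count and an L2-like term on its square), with a granularity metric alongside. The gene names, the scorer, and the penalty weights below are all hypothetical; the paper applies the actual Elastic Net to coefficients, not to subset counts.

```python
def elastic_fitness(subset, score, l1=0.01, l2=0.001):
    """GA fitness: predictive score minus Elastic-Net-style size penalties
    (L1-like on the feature count, L2-like on its square)."""
    k = len(subset)
    return score(subset) - l1 * k - l2 * k * k

def granularity(subset, n_total):
    """Fraction of all features retained; smaller means a coarser subset."""
    return len(subset) / n_total

# Hypothetical scorer: accuracy jumps when both informative genes are kept.
score = lambda s: 0.9 if {"g1", "g2"} <= set(s) else 0.5
f_small = elastic_fitness(["g1", "g2"], score)
f_big = elastic_fitness(["g1", "g2"] + [f"g{i}" for i in range(3, 50)], score)
print(f_small > f_big)   # the lean informative subset wins
```

The penalties make bloated chromosomes lose to compact ones of equal accuracy, which is precisely the pressure needed when the search space has tens of thousands of gene features.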
Open Access Article
Exploring Scientific Collaboration Patterns from the Perspective of Disciplinary Difference: Evidence from Scientific Literature Data
by
Jun Zhang, Shengbo Liu and Yifei Wang
Big Data Cogn. Comput. 2025, 9(8), 201; https://doi.org/10.3390/bdcc9080201 - 1 Aug 2025
Abstract
With the accelerating globalization and rapid development of science and technology, scientific collaboration has become a key driver of knowledge production, yet its patterns vary significantly across disciplines. This study aims to explore the disciplinary differences in scholars’ scientific collaboration patterns and their underlying mechanisms. Data were collected from the China National Knowledge Infrastructure (CNKI) database, covering papers from four disciplines: mathematics, mechanical engineering, philosophy, and sociology. Using social network analysis, we examined core network metrics (degree centrality, neighbor connectivity, clustering coefficient) in collaboration networks, analyzed collaboration patterns across scholars of different academic ages, and compared the academic age distribution of collaborators and network characteristics across career stages. Key findings include the following. (1) Mechanical engineering exhibits the highest and most stable clustering coefficient (mean 0.62) across all academic ages, reflecting tight team collaboration, with degree centrality increasing fastest with academic age (3.2 times higher for senior vs. beginner scholars), driven by its reliance on experimental resources and skill division. (2) Philosophy shows high degree centrality in early career stages (mean 0.38 for beginners) but a sharp decline in clustering coefficient in senior stages (from 0.42 to 0.17), indicating broad early collaboration but loose later ties due to individualized knowledge production. (3) Mathematics scholars prefer collaborating with high-centrality peers (higher neighbor connectivity, mean 0.51), while sociology shows more inclusive collaboration with dispersed partner centrality.
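The three network metrics used in the study have compact definitions that can be computed directly on a toy co-authorship graph; the adjacency data below is invented for illustration and does not come from the CNKI corpus.

```python
def degree_centrality(adj, v):
    """Degree of v normalised by the maximum possible number of co-authors."""
    return len(adj[v]) / (len(adj) - 1)

def clustering_coefficient(adj, v):
    """Share of realised links among v's collaborators (team tightness)."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def neighbor_connectivity(adj, v):
    """Mean degree of v's collaborators."""
    return sum(len(adj[u]) for u in adj[v]) / len(adj[v])

# Invented co-authorship network: A, B, C form a closed trio; D is peripheral.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(clustering_coefficient(adj, "A"))   # 1.0: A's collaborators also co-author
print(degree_centrality(adj, "C"))        # 1.0: C collaborates with everyone
```

A high, stable clustering coefficient (as reported for mechanical engineering) corresponds to closed trios like A-B-C, while the philosophy pattern of broad-then-loose collaboration would show up as high degree centrality with a falling clustering coefficient.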
Topics
Topic in
Algorithms, BDCC, BioMedInformatics, Information, Mathematics
Machine Learning Empowered Drug Screen
Topic Editors: Teng Zhou, Jiaqi Wang, Youyi Song
Deadline: 31 August 2025
Topic in
IJERPH, JPM, Healthcare, BDCC, Applied Sciences, Sensors
eHealth and mHealth: Challenges and Prospects, 2nd Edition
Topic Editors: Antonis Billis, Manuel Dominguez-Morales, Anton Civit
Deadline: 31 October 2025
Topic in
Actuators, Algorithms, BDCC, Future Internet, JMMP, Machines, Robotics, Systems
Smart Product Design and Manufacturing on Industrial Internet
Topic Editors: Pingyu Jiang, Jihong Liu, Ying Liu, Jihong Yan
Deadline: 31 December 2025
Topic in
Computers, Information, AI, Electronics, Technologies, BDCC
Graph Neural Networks and Learning Systems
Topic Editors: Huijia Li, Jun Hu, Weichen Zhao, Jie Cao
Deadline: 31 January 2026

Special Issues
Special Issue in
BDCC
Semantic Web Technology and Recommender Systems 2nd Edition
Guest Editors: Konstantinos Kotis, Dimitris Spiliotopoulos
Deadline: 31 August 2025
Special Issue in
BDCC
Machine Learning Methodologies and Applications in Cybersecurity Data Analysis
Guest Editors: Biao Han, Xiaoyan Wang, Xiucai Ye, Na Zhao
Deadline: 31 August 2025
Special Issue in
BDCC
Energy Conservation Towards a Low-Carbon and Sustainability Future
Guest Editors: Yongming Han, Xuan Hu
Deadline: 25 September 2025
Special Issue in
BDCC
Application of Artificial Intelligence in Traffic Management
Guest Editors: Weihao Ma, Dongfang Ma
Deadline: 30 September 2025