Machine Learning Methodologies and Applications in Cybersecurity Data Analysis

Special Issue Editors


Prof. Dr. Biao Han
Guest Editor
School of Computer, National University of Defense Technology, Changsha 410073, China
Interests: AI for networks; multipath transmission; cybersecurity

Dr. Xiaoyan Wang
Guest Editor
Department of Electrical and Electronic Systems Engineering, College of Engineering, Ibaraki University, Hitachi, Japan
Interests: wireless communication; wireless sensing; AI; security

Prof. Dr. Xiucai Ye
Guest Editor
Department of Computer Science, University of Tsukuba, Tsukuba 305-8577, Japan
Interests: machine learning; data analysis; security; bioinformatics

Dr. Na Zhao
Guest Editor
Department of Information Technology, Hunan Police Academy, Changsha 410000, China
Interests: cybersecurity; deep learning; artificial intelligence for IT operations

Special Issue Information

Dear Colleagues,

Machine learning (ML) is a pivotal technology for current and future information systems, and many domains already leverage its capabilities. However, ML deployment in cybersecurity is still at an early stage, revealing a significant gap between research and practice. ML can rapidly analyze large volumes of historical and dynamic data, enabling applications to operationalize data from various sources in near-real time. Recently, we have witnessed rapid development of ML methodologies and applications for cybersecurity data analysis in threat detection, raw data analysis, and alert management, among other areas. Yet in this domain, unlocking the full benefits of ML in practice hinges on reconciling the intrinsic characteristics of cybersecurity data with the fundamental assumptions of ML.

This Special Issue aims to collect recent advances in machine learning methodologies and applications that tackle cybersecurity data challenges. We highly value interdisciplinary research that contributes new challenges, research questions, approaches, and datasets related to this topic.

This Special Issue invites new research contributions to machine learning methodologies and applications specifically tailored to cybersecurity data analysis challenges. The scope includes but is not limited to the following topics:

  • ML methods and applications for capturing/handling/evaluating cybersecurity datasets;
  • ML methods and applications for data-driven cybersecurity decision making;
  • ML methods and applications for security policy rule generation;
  • ML methods and applications for protecting valuable security data;
  • ML methods and applications for context-aware cybersecurity data analysis;
  • ML methods and applications for feature engineering in cybersecurity;
  • ML methods and applications for PHY/MAC/L3-L7 security protocol design and evaluation;
  • ML methods and applications for PHY/MAC/L3-L7 security protocol optimization;
  • ML methods and applications for data-driven network protocol fuzzing;
  • ML methods and applications for data-driven anomaly/intrusion detection;
  • ML methods and applications for data-driven network traffic analysis;
  • ML methods and applications for data-driven endpoint detection and response;
  • ML methods and applications for data-driven cybersecurity defense frameworks;
  • Cybersecurity datasets/benchmarks for data analysis in ML methods and applications;
  • Cybersecurity prototypes/testbeds for data analysis in ML methods and applications.

We look forward to receiving your contributions.  

Prof. Dr. Biao Han
Dr. Xiaoyan Wang
Prof. Dr. Xiucai Ye
Dr. Na Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • cybersecurity
  • data science
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research


29 pages, 2044 KB  
Article
A Dual-Branch Transformer Framework for Trace-Level Anomaly Detection via Phase-Space Embedding and Causal Message Propagation
by Siyuan Liu, Yiting Chen, Sen Li, Jining Chen and Qian He
Big Data Cogn. Comput. 2026, 10(1), 10; https://doi.org/10.3390/bdcc10010010 - 28 Dec 2025
Abstract
In cloud-based distributed systems, trace anomaly detection plays a vital role in maintaining system reliability by identifying early signs of performance degradation or faults. However, existing methods often fail to capture the complex temporal and structural dependencies inherent in trace data. To address this, we propose a novel dual-branch Transformer-based framework that integrates both temporal modeling and causal reasoning. The first branch encodes the original trace data to capture direct service-level dynamics, while the second employs phase-space reconstruction to reveal nonlinear temporal interactions by embedding time-delayed representations. To better capture how anomalies propagate across services, we introduce a causal propagation module that leverages directed service call graphs to enforce time order and directionality during feature aggregation, ensuring that anomaly signals propagate along realistic causal paths. Additionally, we propose a hybrid loss function that combines the reconstruction error with a symmetric Kullback–Leibler divergence between the attention maps of the two branches, enabling the model to distinguish normal and anomalous patterns more effectively. Extensive experiments on multiple real-world trace datasets demonstrate that our method consistently outperforms state-of-the-art baselines in precision, recall, and F1 score, and that it remains robust to noisy or complex service dependencies across diverse scenarios.
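
A minimal sketch of two ideas named in the abstract, under assumed tensor shapes: phase-space reconstruction via time-delay embedding, and a hybrid loss that adds a symmetric Kullback–Leibler term between the two branches' attention maps to the reconstruction error. The weighting factor `alpha` and all shapes are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def delay_embed(series: torch.Tensor, dim: int = 3, tau: int = 2) -> torch.Tensor:
    """Phase-space reconstruction: map x_t to (x_t, x_{t-tau}, ..., x_{t-(dim-1)tau})."""
    T = series.shape[-1]
    cols = [series[..., (dim - 1 - i) * tau : T - i * tau] for i in range(dim)]
    return torch.stack(cols, dim=-1)  # shape (..., T-(dim-1)*tau, dim)

def symmetric_kl(p_attn: torch.Tensor, q_attn: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Symmetric KL divergence between two attention maps whose rows sum to 1."""
    p, q = p_attn.clamp_min(eps), q_attn.clamp_min(eps)
    kl_pq = (p * (p / q).log()).sum(dim=-1)
    kl_qp = (q * (q / p).log()).sum(dim=-1)
    return 0.5 * (kl_pq + kl_qp).mean()

def hybrid_loss(x, x_recon, attn_raw, attn_phase, alpha: float = 0.1) -> torch.Tensor:
    """Reconstruction MSE plus an attention-alignment term between the two branches."""
    return F.mse_loss(x_recon, x) + alpha * symmetric_kl(attn_raw, attn_phase)
```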

24 pages, 3009 KB  
Article
SpaceTime: A Deep Similarity Defense Against Poisoning Attacks in Federated Learning
by Geethapriya Thamilarasu and Christian Dunham
Big Data Cogn. Comput. 2025, 9(12), 313; https://doi.org/10.3390/bdcc9120313 - 5 Dec 2025
Abstract
Federated learning has gained popularity in recent years for enhancing IoT security because it allows decentralized devices to collaboratively learn a shared model without exchanging raw data. Despite its privacy advantages, federated learning is vulnerable to poisoning attacks, where malicious devices introduce manipulated data or model updates to corrupt the global model. These attacks can degrade the model’s performance or bias its outcomes, making it difficult to ensure the integrity of the learning process across decentralized devices. In this research, our goal is to develop a defense mechanism against poisoning attacks in federated learning models. Specifically, we develop a spacetime model that combines the three dimensions of space and the one dimension of time into a four-dimensional manifold; poisoning attacks have complex spatial and temporal relationships that present identifiable patterns in that manifold. We propose the SpaceTime Deep Similarity Defense (ST-DSD), a deep recurrent neural network that incorporates space and time perceptions to defend federated learning models against poisoning attacks. The proposed mechanism is built upon a many-to-one time-series regression architecture that uses spacetime relationships to provide an adversarially trained deep learning poisoning defense. Simulation results show that the SpaceTime defense outperforms existing poisoning defenses in IoT environments.
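
The paper's ST-DSD couples a recurrent network with spacetime features of client updates; as a far simpler illustration of the family of similarity-based defenses it belongs to, the sketch below scores each flattened client update by cosine similarity to the coordinate-wise median and drops outliers before federated averaging. The threshold and the fallback behavior are assumptions for illustration only.

```python
import numpy as np

def filter_and_average(updates: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """updates: (num_clients, num_params) flattened model deltas from one round."""
    reference = np.median(updates, axis=0)            # robust reference direction
    ref_norm = np.linalg.norm(reference) + 1e-12
    sims = updates @ reference / (np.linalg.norm(updates, axis=1) * ref_norm + 1e-12)
    keep = sims >= threshold                          # clients aligned with the crowd
    if not keep.any():                                # assumed fallback: median update
        return reference
    return updates[keep].mean(axis=0)
```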

22 pages, 23322 KB  
Article
MS-PreTE: A Multi-Scale Pre-Training Encoder for Mobile Encrypted Traffic Classification
by Ziqi Wang, Yufan Qiu, Yaping Liu, Shuo Zhang and Xinyi Liu
Big Data Cogn. Comput. 2025, 9(8), 216; https://doi.org/10.3390/bdcc9080216 - 21 Aug 2025
Abstract
Mobile traffic classification serves as a fundamental component in network security systems. In recent years, pre-training methods have significantly advanced this field. However, as mobile traffic is typically mixed with third-party services, the deep integration of such shared services results in highly similar TCP flow characteristics across different applications. This makes it challenging for existing traffic classification methods to effectively identify mobile traffic. To address the challenge, we propose MS-PreTE, a two-phase pre-training framework for mobile traffic classification. MS-PreTE introduces a novel multi-level representation model to preserve traffic information from diverse perspectives and hierarchical levels. Furthermore, MS-PreTE incorporates a focal-attention mechanism to enhance the model’s capability in discerning subtle differences among similar traffic flows. Evaluations demonstrate that MS-PreTE achieves state-of-the-art performance on three mobile application datasets, boosting the F1 score for Cross-platform (iOS) to 99.34% (up by 2.1%), Cross-platform (Android) to 98.61% (up by 1.6%), and NUDT-Mobile-Traffic to 87.70% (up by 2.47%). Moreover, MS-PreTE exhibits strong generalization capabilities across four real-world traffic datasets.
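
The focal-attention mechanism is specific to MS-PreTE, but it borrows the spirit of the well-known focal loss: concentrate learning on hard, easily confused examples such as near-identical flows. For readers unfamiliar with that idea, here is a standard focal loss sketch (the focusing parameter `gamma` is the usual default; this is not the paper's exact mechanism):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, target: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Cross-entropy scaled by (1 - p_true)^gamma, down-weighting easy examples."""
    log_probs = F.log_softmax(logits, dim=-1)                    # (N, C)
    ce = F.nll_loss(log_probs, target, reduction="none")         # per-sample CE
    p_true = log_probs.gather(1, target.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_true) ** gamma * ce).mean()
```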

18 pages, 5825 KB  
Article
Detection and Localization of Hidden IoT Devices in Unknown Environments Based on Channel Fingerprints
by Xiangyu Ju, Yitang Chen, Zhiqiang Li and Biao Han
Big Data Cogn. Comput. 2025, 9(8), 214; https://doi.org/10.3390/bdcc9080214 - 20 Aug 2025
Abstract
In recent years, hidden IoT monitoring devices installed indoors have raised significant concerns about privacy breaches and other security threats. To address the difficulty of detecting such devices, along with the low positioning accuracy and lengthy detection times of existing approaches, this paper proposes a hidden-device detection and localization system that runs on the Android platform. The system uses the Received Signal Strength Indication (RSSI) signals received by the detection terminal to detect, classify, and localize hidden IoT devices in unfamiliar environments. It integrates three key designs: (1) actively capturing the RSSI sequence of hidden devices by sending RTS frames and receiving CTS frames, which is used to generate device channel fingerprints and estimate the distance between hidden devices and detection terminals; (2) training an RSSI-based ranging model with the XGBoost algorithm, followed by multi-point localization for accurate positioning; and (3) implementing augmented-reality-based visual localization to support handheld detection terminals. The prototype system achieves active data sniffing based on RTS/CTS and terminal localization based on the RSSI ranging model, effectively reducing signal acquisition time and improving localization accuracy. Real-world experiments show that the system can detect and locate hidden devices in unfamiliar environments, achieving 98.1% accuracy in classifying device types. The time required for detection and localization is approximately one-sixth of that of existing methods, with system runtime kept within 5 min. The localization error is 0.77 m, a 48.7% improvement over existing methods, whose average error is 1.5 m.
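
A minimal sketch of the two-stage pipeline the abstract describes: learn an RSSI-to-distance regressor with XGBoost, then locate the device by least-squares multilateration from several measurement points. The feature layout (a window of RSSI samples), the hyperparameters, and the 2D coordinates are assumptions, not the paper's exact configuration.

```python
import numpy as np
from xgboost import XGBRegressor
from scipy.optimize import least_squares

def train_ranging_model(rssi_windows: np.ndarray, distances: np.ndarray) -> XGBRegressor:
    """rssi_windows: (n_samples, window_len) RSSI readings; distances: meters (ground truth)."""
    model = XGBRegressor(n_estimators=200, max_depth=6, learning_rate=0.1)
    return model.fit(rssi_windows, distances)

def multilaterate(anchors: np.ndarray, dists: np.ndarray) -> np.ndarray:
    """anchors: (k, 2) measurement positions; dists: (k,) predicted distances to the device."""
    def residuals(p):
        return np.linalg.norm(anchors - p, axis=1) - dists
    return least_squares(residuals, x0=anchors.mean(axis=0)).x  # estimated (x, y)
```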

19 pages, 10741 KB  
Article
Electroencephalography-Based Motor Imagery Classification Using Multi-Scale Feature Fusion and Adaptive Lasso
by Shimiao Chen, Nan Li, Xiangzeng Kong, Dong Huang and Tingting Zhang
Big Data Cogn. Comput. 2024, 8(12), 169; https://doi.org/10.3390/bdcc8120169 - 25 Nov 2024
Abstract
Brain–computer interfaces, where motor imagery electroencephalography (EEG) signals are transformed into control commands, offer a promising solution for enhancing the standard of living of disabled individuals. However, the performance of EEG classification has been limited in most studies due to a lack of attention to the complementary information inherent at different temporal scales. Additionally, significant inter-subject variability in sensitivity to biological motion poses another critical challenge to achieving accurate EEG classification in a subject-dependent manner. To address these challenges, we propose a novel machine learning framework combining multi-scale feature fusion, which captures global and local spatial information from different-sized EEG segmentations, with adaptive Lasso-based feature selection, a mechanism that adaptively retains informative subject-dependent features and discards irrelevant ones. Experimental results on multiple public benchmark datasets revealed substantial improvements in EEG classification, with rates of 81.36%, 75.90%, and 68.30% on the BCIC-IV-2a, SMR-BCI, and OpenBMI datasets, respectively. These results not only surpassed existing methodologies but also underscored the effectiveness of our approach in overcoming specific challenges in EEG classification. Ablation studies further confirmed the efficacy of both the multi-scale feature analysis and the adaptive selection mechanism. This framework marks a significant advancement in the decoding of motor imagery EEG signals, positioning it for practical applications in real-world BCIs.
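
As a concrete reference for the feature-selection stage, here is the textbook adaptive Lasso construction: fit an initial ridge model, rescale each feature by |beta|^gamma, run a standard Lasso on the rescaled design, and keep the features with nonzero weights. This follows the common construction (with numerically encoded labels), not necessarily the paper's exact setup; `gamma` and `alpha` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

def adaptive_lasso_select(X: np.ndarray, y: np.ndarray, gamma: float = 1.0,
                          alpha: float = 0.01) -> np.ndarray:
    """Return indices of features retained by a two-stage adaptive Lasso."""
    init = Ridge(alpha=1.0).fit(X, y).coef_          # initial coefficient estimate
    weights = np.abs(init) ** gamma + 1e-8           # per-feature adaptive weights
    lasso = Lasso(alpha=alpha).fit(X * weights, y)   # Lasso on the reweighted design
    coef = lasso.coef_ * weights                     # map back to the original scale
    return np.flatnonzero(coef)
```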

15 pages, 2140 KB  
Article
Adaptive Management of Multi-Scenario Projects in Cybersecurity: Models and Algorithms for Decision-Making
by Vadim Tynchenko, Alexander Lomazov, Vadim Lomazov, Dmitry Evsyukov, Vladimir Nelyub, Aleksei Borodulin, Andrei Gantimurov and Ivan Malashin
Big Data Cogn. Comput. 2024, 8(11), 150; https://doi.org/10.3390/bdcc8110150 - 4 Nov 2024
Cited by 4
Abstract
In recent years, cybersecurity management has increasingly required advanced methodologies capable of handling complex, evolving threat landscapes. Scenario network-based approaches have emerged as effective strategies for managing uncertainty and adaptability in cybersecurity projects. This article introduces a scenario network-based approach for managing cybersecurity projects, utilizing fuzzy linguistic models and a Takagi–Sugeno–Kang fuzzy neural network. Drawing upon L. Zadeh’s theory of linguistic variables, the methodology integrates expert analysis, linguistic variables, and a continuous genetic algorithm to predict membership function parameters. Fuzzy production rules are employed for decision-making, while the Mamdani fuzzy inference algorithm enhances interpretability. This approach enables multi-scenario planning and adaptability across multi-stage cybersecurity projects. Preliminary results from a research prototype of an intelligent expert system, designed to analyze project stages and adaptively construct project trajectories, suggest that the proposed approach is effective. In computational experiments, the use of fuzzy procedures reduced errors by over 25% compared to traditional methods, particularly when adjusting project scenarios from pessimistic to baseline projections. While promising, this approach requires further testing across diverse cybersecurity contexts. Future studies will aim to refine scenario adaptation and optimize system response in high-risk project environments.
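
To make the fuzzy machinery concrete, the sketch below evaluates one zero-order Takagi–Sugeno inference step: each rule has Gaussian membership functions over the inputs and a constant consequent, and the output is the firing-strength-weighted average. The rule parameterization is illustrative, not taken from the paper.

```python
import numpy as np

def ts_infer(x: np.ndarray, centers: np.ndarray, sigmas: np.ndarray,
             consequents: np.ndarray) -> float:
    """x: (d,) inputs; centers/sigmas: (r, d) per-rule Gaussian MFs; consequents: (r,)."""
    memberships = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)  # (r, d)
    firing = memberships.prod(axis=1)                           # product t-norm per rule
    return float((firing * consequents).sum() / (firing.sum() + 1e-12))
```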

Review


33 pages, 2850 KB  
Review
Network Traffic Analysis Based on Graph Neural Networks: A Scoping Review
by Ruonan Wang, Jinjing Zhao, Hongzheng Zhang, Liqiang He, Hu Li and Minhuan Huang
Big Data Cogn. Comput. 2025, 9(11), 270; https://doi.org/10.3390/bdcc9110270 - 24 Oct 2025
Abstract
Network traffic analysis is crucial for understanding network behavior and identifying underlying applications, protocols, and service groups. The increasing complexity of network environments, driven by the evolution of the Internet, poses significant challenges to traditional analytical approaches. Graph Neural Networks (GNNs) have recently garnered considerable attention in network traffic analysis due to their ability to model complex relationships within network flows and between communicating entities. Following the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) methodology, this scoping review systematically surveys major academic databases, employing predefined eligibility criteria to identify and synthesize key research in the field. We present a comprehensive overview of a generalized architecture for GNN-based traffic analysis and categorize recent methods into three primary types: node prediction, edge prediction, and graph prediction. We discuss challenges in network traffic analysis, summarize solutions from various methods, and provide practical recommendations for model selection. This review also compiles publicly available datasets and open-source code, serving as valuable resources for further research. Finally, we outline future research directions to advance this field. This work offers an updated understanding of GNN applications in network traffic analysis and provides practical guidance for researchers and practitioners.
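
To ground the review's taxonomy, here is a single GraphSAGE-style message-passing layer in plain NumPy: each node (for example, a host or flow endpoint) updates its features by combining them with the mean of its neighbors' features. Stacking such layers with a classifier head gives node-level prediction; pooling over all nodes gives graph-level prediction. Shapes and the ReLU choice are illustrative assumptions.

```python
import numpy as np

def sage_layer(H: np.ndarray, A: np.ndarray, W_self: np.ndarray, W_nbr: np.ndarray) -> np.ndarray:
    """H: (n, d) node features; A: (n, n) adjacency matrix; W_*: (d, d_out) weights."""
    deg = A.sum(axis=1, keepdims=True) + 1e-12              # avoid division by zero
    nbr_mean = (A @ H) / deg                                # mean over each node's neighbors
    return np.maximum(H @ W_self + nbr_mean @ W_nbr, 0.0)   # ReLU activation
```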
