Big Data Analytics with Machine Learning for Cyber Security

Special Issue Editors


Guest Editor
Department of Computer Science & Information Systems, Bradley University, Peoria, IL 61625, USA
Interests: machine learning; IoT; cybersecurity; deep learning

Guest Editor
Department of Computer Science and Information Systems (CS&IS), Bradley University, Peoria, IL 61625, USA
Interests: applied machine learning; cryptography algorithms

Guest Editor
Department of Networks and Digital Media, Kingston University London, Kingston upon Thames, Surrey KT1 2EE, UK
Interests: cyber security; digital forensics; IoT; physical layer security; blockchain

Special Issue Information

Dear Colleagues,

This Special Issue focuses on big data analytics, the critical role of machine learning (ML) in it, and the security challenges big data raises. In this data-driven era, organisations generate an unprecedented volume and variety of data from sources such as hospitals, business transactions, social media interactions, IoT devices, sensors, and communication devices. Big data analytics is the process of extracting valuable insights and hidden patterns from large, complex datasets. Combined with machine learning (ML) and deep learning (DL), it becomes a powerful tool for uncovering those patterns, predicting outcomes, and making data-driven decisions. The growing volume and variety of data generated from different sources pose significant challenges for traditional security tools, and combining big data analytics with ML techniques offers a convincing way to detect, prevent, and respond to cyber threats in this complex landscape. ML/DL algorithms can be trained to recognise normal behaviour and identify deviations that could signify suspicious activity or security breaches.

Big data analytics in cybersecurity involves processing and analysing massive datasets collected over time from sources such as network traffic logs, system logs, application logs, sensor data, and security events. Similarly, healthcare-related patient data in the Internet of Medical Things (IoMT) must be secured against unauthorised access. The objective is to extract actionable insights and identify patterns that may indicate potential security flaws, anomalies, malicious activities, or other security-related concerns. Behavioural analytics is another area where ML/DL models prove valuable: by analysing user behaviour, ML/DL algorithms can build profiles of normal activity and detect deviations that may indicate insider threats or compromised accounts. In this Special Issue, we invite submissions of original research or review articles on the topics listed below and related areas. We look forward to receiving your contributions as we explore research areas within (but not limited to) the following topics:

  1. Different security approaches of big data analytics;
  2. Privacy and security of big data using ML/DL/reinforcement learning/deep reinforcement learning;
  3. IoT and IoMT security;
  4. Security information and event management: tools, architecture, and methods;
  5. Cloud security analytics;
  6. Privacy-preserving data analysis;
  7. Predictive security analytics;
  8. Self-sovereign identity;
  9. Zero-day attacks and prevention methods;
  10. Open-source intelligence in cybersecurity applications;
  11. Cyberthreat intelligence and malware analysis;
  12. Big data security paradigms/architectures;
  13. Existing big data policy and protocols.
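To illustrate the behavioural-analytics theme described above, a model can learn a profile of normal activity and flag large deviations. The following is a toy, standard-library sketch of our own (the z-score test and the login-count data are hypothetical examples, not tied to any submission):

```python
import statistics

def fit_profile(baseline):
    """Learn a simple per-user activity profile: mean and standard deviation."""
    return statistics.mean(baseline), statistics.stdev(baseline)

def is_anomalous(value, profile, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, std = profile
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold

# Hypothetical daily login counts for one user over two weeks.
baseline = [4, 5, 3, 6, 5, 4, 5, 6, 4, 5, 3, 4, 6, 5]
profile = fit_profile(baseline)

print(is_anomalous(5, profile))   # a typical day -> False
print(is_anomalous(40, profile))  # a burst of logins -> True
```

Production systems replace the z-score with trained ML/DL models, but the principle is the same: learn "normal", then score deviations.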

Dr. Babu Baniya
Dr. Sherif Abdelfattah
Dr. Deepak GC
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data analytics
  • cybersecurity
  • machine learning
  • deep learning
  • IoT

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

32 pages, 13081 KB  
Article
FedIFD: Identifying False Data Injection Attacks in Internet of Vehicles Based on Federated Learning
by Huan Wang, Junying Yang, Jing Sun, Zhe Wang, Qingzheng Liu and Shaoxuan Luo
Big Data Cogn. Comput. 2025, 9(10), 246; https://doi.org/10.3390/bdcc9100246 - 26 Sep 2025
Viewed by 363
Abstract
With the rapid development of intelligent connected vehicle technology, false data injection (FDI) attacks have become a major challenge in the Internet of Vehicles (IoV). While deep learning methods can effectively identify such attacks, the dynamic, distributed architecture of the IoV and limited computing resources hinder both privacy protection and lightweight computation. To address this, we propose FedIFD, a federated learning (FL)-based detection method for false data injection attacks. The lightweight threat detection model utilizes basic safety messages (BSM) for local incremental training, and the Q-FedCG algorithm compresses gradients for global aggregation. Original features are reshaped using a time window. To ensure temporal and spatial consistency, a sliding average strategy aligns samples before spatial feature extraction. A dual-branch architecture enables parallel extraction of spatiotemporal features: a three-layer stacked Bidirectional Long Short-Term Memory (BiLSTM) captures temporal dependencies, and a lightweight Transformer models spatial relationships. A dynamic feature fusion weight matrix calculates attention scores for adaptive feature weighting. Finally, a differentiated pooling strategy is applied to emphasize critical features. Experiments on the VeReMi dataset show that the accuracy reaches 97.8%. Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
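The time-window reshaping step mentioned in the abstract can be illustrated generically. This is a minimal sketch of our own (window length, step, and data are hypothetical), not the FedIFD implementation:

```python
def sliding_windows(seq, window, step=1):
    """Reshape a flat feature sequence into overlapping time windows,
    the common preprocessing step before feeding an LSTM-style model."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]

# Hypothetical scalar readings from a vehicle's basic safety messages.
readings = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
windows = sliding_windows(readings, window=3)
print(windows)  # four overlapping windows of length 3
```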

20 pages, 4173 KB  
Article
AI-Based Phishing Detection and Student Cybersecurity Awareness in the Digital Age
by Zeinab Shahbazi, Rezvan Jalali and Maryam Molaeevand
Big Data Cogn. Comput. 2025, 9(8), 210; https://doi.org/10.3390/bdcc9080210 - 15 Aug 2025
Viewed by 2525
Abstract
Phishing attacks are an increasingly common cybersecurity threat characterized by deceiving people into giving out their private credentials via emails, websites, and messages. Insight into the challenges students face in recognizing phishing threats can provide valuable information on how AI-based detection systems can be improved to enhance accuracy, reduce false positives, and build user trust in cybersecurity. This study focuses on students’ awareness of phishing attempts and evaluates AI-based phishing detection systems. Questionnaires were circulated amongst students, and responses were evaluated to uncover prevailing patterns and issues. The results indicate that most college students are knowledgeable about phishing methods, but many do not recognize the dangers of phishing. AI-based detection systems therefore have potential but also face issues relating to accuracy, false positives, and user trust. This research highlights the importance of bolstering cybersecurity education and of ongoing enhancements to AI models to improve phishing detection. Future studies should include a more representative sample, evaluate AI detection systems in real-world settings, and assess longer-term changes in phishing-related awareness. By combining AI-driven solutions with education, a safer digital world can be created. Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)

18 pages, 6190 KB  
Article
From Accuracy to Vulnerability: Quantifying the Impact of Adversarial Perturbations on Healthcare AI Models
by Sarfraz Brohi and Qurat-ul-ain Mastoi
Big Data Cogn. Comput. 2025, 9(5), 114; https://doi.org/10.3390/bdcc9050114 - 27 Apr 2025
Cited by 4 | Viewed by 1105
Abstract
As AI becomes indispensable in healthcare, its vulnerability to adversarial attacks demands serious attention. Even minimal changes to the input data can mislead Deep Learning (DL) models, leading to critical errors in diagnosis and endangering patient safety. In this study, we developed an optimized Multi-layer Perceptron (MLP) model for breast cancer classification and exposed its cybersecurity vulnerabilities through a real-world-inspired adversarial attack. Unlike prior studies, we conducted a quantitative evaluation of the impact of a Fast Gradient Sign Method (FGSM) attack on an optimized DL model designed for breast cancer detection, demonstrating how minor perturbations reduced the model’s accuracy from 98% to 53% and led to a substantial increase in classification errors, as revealed by the confusion matrix. Our findings demonstrate how an adversarial attack can significantly compromise the performance of a healthcare AI model, underscoring the importance of aligning AI development with cybersecurity readiness. This research highlights the demand for designing resilient AI by integrating rigorous cybersecurity practices at every stage of the AI development lifecycle, i.e., before, during, and after model engineering, to prioritize the effectiveness, accuracy, and safety of AI in real-world healthcare environments. Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
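To make the FGSM step concrete: the attack perturbs each input feature by a small epsilon in the direction of the sign of the loss gradient. Below is a self-contained toy sketch of our own on a logistic model (the weights, input, and epsilon are hypothetical; the paper attacks an optimized MLP, not this toy):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(x, w, y, eps):
    """FGSM on a logistic model p = sigmoid(w . x).
    For binary cross-entropy loss, dL/dx_i = (p - y) * w_i,
    so each feature moves eps in the sign of that gradient."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical trained weights and a correctly classified input (true label y = 1).
w = [2.0, -1.5, 0.5]
x = [1.0, 0.2, 0.8]
p_clean = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
x_adv = fgsm_perturb(x, w, y=1, eps=0.6)
p_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)))
print(round(p_clean, 3), round(p_adv, 3))  # confidence collapses below 0.5 after the attack
```

The same one-step perturbation, applied via a framework's automatic differentiation, is what degrades the paper's MLP from 98% to 53% accuracy.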

49 pages, 17199 KB  
Article
Application of Symbolic Classifiers and Multi-Ensemble Threshold Techniques for Android Malware Detection
by Nikola Anđelić, Sandi Baressi Šegota and Vedran Mrzljak
Big Data Cogn. Comput. 2025, 9(2), 27; https://doi.org/10.3390/bdcc9020027 - 29 Jan 2025
Viewed by 1110
Abstract
Android malware detection using artificial intelligence is today a mandatory tool for preventing cyber attacks. To address this problem, the proposed methodology applies a genetic programming symbolic classifier (GPSC) to obtain symbolic expressions (SEs) that can detect whether an Android application is malware. To find the optimal combination of GPSC hyperparameter values, the random hyperparameter values search (RHVS) method was used, and the GPSC was trained with 5-fold cross-validation (5FCV). The initial, publicly available dataset is highly imbalanced; this was addressed by applying various preprocessing and oversampling techniques, creating a large number of balanced dataset variations, on each of which the GPSC was trained. Since the dataset has many input variables, three approaches were considered: an initial investigation with all input variables, input variables with high feature importance, and the application of principal component analysis. The SEs with the highest classification performance were then used in threshold-based voting ensembles (TBVEs), whose threshold values were adjusted to improve classification performance. Multi-TBVEs were developed, yielding a robust Android malware detection system with a highest accuracy of 0.98. Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)

18 pages, 1889 KB  
Article
DBSCAN SMOTE LSTM: Effective Strategies for Distributed Denial of Service Detection in Imbalanced Network Environments
by Rissal Efendi, Teguh Wahyono and Indrastanti Ratna Widiasari
Big Data Cogn. Comput. 2024, 8(9), 118; https://doi.org/10.3390/bdcc8090118 - 10 Sep 2024
Cited by 6 | Viewed by 2675
Abstract
In detecting Distributed Denial of Service (DDoS) attacks, deep learning faces challenges such as high computational demands, long training times, and complex model interpretation. This research focuses on overcoming these challenges by proposing an effective strategy for detecting DDoS attacks in imbalanced network environments. It employs DBSCAN and SMOTE to balance the class distribution of the dataset, allowing LSTM models to learn the temporal anomalies that occur during DDoS attacks effectively. The experiments revealed a significant improvement in the performance of the LSTM model when integrated with DBSCAN and SMOTE: a validation loss of 0.048 with DBSCAN and SMOTE versus 0.1943 without them, with accuracies of 99.50% and 97.50%, respectively. In addition, the F1 score increased from 93.4% to 98.3%. This research shows that DBSCAN and SMOTE can serve as an effective strategy to improve model performance in detecting DDoS attacks on heterogeneous networks, as well as to increase model robustness and reliability. Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)
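The SMOTE idea behind the abstract's class balancing can be sketched with the standard library alone: each synthetic minority sample is a random interpolation between an existing sample and one of its nearest neighbours. This is our own toy illustration (function name and data are hypothetical), not the paper's pipeline:

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between a
    random sample and one of its k nearest neighbours (SMOTE-style sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours by squared Euclidean distance, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Hypothetical minority-class (attack) feature vectors.
attacks = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05]]
new_samples = smote_like(attacks, n_new=4)
print(len(attacks) + len(new_samples))  # 8
```

Because the synthetic points lie between real minority samples rather than duplicating them, the downstream LSTM sees a balanced but still varied training set.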
