Diverse Intrusion and Malware Detection: AI-Based and Non-AI-Based Solutions

In today's interconnected world, the need for robust intrusion and malware detection and prevention has never been more critical. Despite significant advances in developing intrusion detection systems (IDS), achieving holistic security continues to be an evolving challenge. The sophistication of cyber threats necessitates IDS models that are both robust and adaptable.
In 2010, a seminal work by Robin Sommer and Vern Paxson discussed the challenges of applying machine learning (ML) for anomaly detection [1]. They highlighted limitations such as imbalanced training cases and high false-positive rates, as well as the importance of feature selection. In response, researchers have explored various ML algorithms for security tasks such as spam classification [2,3], malware detection [4], and intrusion detection [5][6][7][8].
Because AI models require large amounts of training data, using them effectively in intrusion detection systems means addressing two main challenges: feature selection and data imbalance [1].
Feature selection is the process of identifying the most relevant features for use in building a machine learning model. The accuracy of ML-based methods is heavily influenced by the quality of the feature space [20][21][22][23]. Therefore, developing efficient feature selection techniques is crucial for optimizing detection accuracy.
Data imbalance is another significant challenge. IDS datasets often contain far fewer examples of malicious traffic than of normal traffic. Training models on such imbalanced datasets can lead to poor detection performance [24]. Techniques such as random oversampling and the Synthetic Minority Oversampling Technique (SMOTE) are commonly used to create balanced datasets from imbalanced data, thereby improving model accuracy [25][26][27].
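The core idea behind SMOTE can be illustrated with a minimal sketch: a synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote_like`, its parameters, and the toy data below are assumptions made for illustration; this is not the implementation used in the cited works.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: 8 benign flows and 3 attack flows, two features each.
benign = [[float(i), float(i % 3)] for i in range(8)]
attack = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

new_attack = smote_like(attack, n_new=len(benign) - len(attack), k=2)
balanced_attack = attack + new_attack
print(len(benign), len(balanced_attack))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing records.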
While AI-based detection models have made significant strides, there remains a crucial need for robust non-AI-based solutions. The opaque nature of AI models, often referred to as "black boxes", can make it difficult to understand their decision-making processes. Non-AI approaches, based on well-defined rules and logic, provide greater explainability and transparency. These solutions can complement AI-based defenses by offering clear and understandable detection mechanisms, which are particularly valuable in scenarios like malware detection and identifying malicious behaviors.
A comprehensive IDS should leverage a diverse set of tools, incorporating both AI and non-AI solutions. By understanding and integrating the strengths and limitations of each approach, we can develop a more resilient and effective defense against the continuously evolving threat landscape.
This Special Issue [28] is dedicated to advancing the field of intrusion and malware detection across various network environments, including future Internet architectures, 5G and beyond wireless networks, enterprises, data centers, edge and cloud networks, software-defined networking (SDN), optical networks, and IoT-scale networks. Of the 23 manuscripts submitted to this Special Issue, 10 were accepted for publication after rigorous review. These contributions, listed below, reflect the diverse and innovative approaches in both AI and non-AI realms. As shown in Table 1, this Special Issue addresses a wide spectrum of topics within intrusion and malware detection and prevention. These topics include feature reduction, feature selection, handling imbalanced data, addressing new threats, ensuring data privacy, and introducing a new architecture for public key infrastructure (PKI). The proposed methods are evaluated using a variety of datasets, ensuring robust and comprehensive analysis. The majority of the contributions (1, 2, 3, 4, 5, 6, 7) focus on AI-based intrusion and malware detection and data privacy, while three specifically explore non-AI-based solutions. This balanced approach underscores the importance of integrating both AI and non-AI methodologies to develop more effective and transparent intrusion detection systems.
IDSs rely on large feature sets, but some features contain irrelevant and redundant information, which increases computational complexity and decreases accuracy. Four papers in this Special Issue address feature selection, feature reduction, and overfitting. Ghani et al., in their first paper, propose a hybrid dimensionality reduction system that combines feature selection and feature extraction. They employ the Recursive Feature Elimination (RFE) technique to identify and eliminate irrelevant or redundant features from the initial dataset. Subsequently, they use Principal Component Analysis (PCA) to transform the remaining features into a lower-dimensional representation while preserving the most important information. Their system successfully reduces the original 41 features to a more manageable set of 15 components. Importantly, the classification performance, using an ensemble of Support Vector Classifier (SVC), K-Nearest Neighbor (KNN), and Deep Neural Network (DNN) classifiers, remains robust, indicating that the reduced and transformed features do not significantly compromise the system's ability to detect network intrusions compared to using the full feature set.
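For readers less familiar with the extraction step, standard PCA can be written compactly as follows. This is the textbook formulation rather than a statement of the authors' exact procedure. Here $X \in \mathbb{R}^{n \times d}$ is assumed to hold the $n$ samples with the $d$ features that survive RFE (the value of $d$ is not given above), $x_i$ denotes the $i$-th row of $X$, and $k = 15$ is the number of components reported.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\tilde{X} = X - \mathbf{1}\,\bar{x}^{\top} \\
C &= \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} = V \Lambda V^{\top}, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0 \\
Z &= \tilde{X} V_k, \qquad V_k = [\,v_1, \dots, v_k\,], \quad k = 15 \\
\text{retained variance} &= \frac{\sum_{j=1}^{k}\lambda_j}{\sum_{j=1}^{d}\lambda_j}
\end{aligned}
```

The rows of $Z$ are the lower-dimensional representations passed to the classifiers, and the retained-variance ratio gives one way to judge how much information the first $k$ components preserve.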
In their second paper, Ghani et al. propose a deep learning-based approach for network intrusion detection utilizing a Feedforward Neural Network (FFNN). They focus on achieving high classification accuracy with a reduced feature vector. Their approach demonstrates that a smaller, more targeted feature vector can be equally effective in detecting network traffic anomalies within datasets like UNSW-NB15 and NSL-KDD. This not only improves classification accuracy but also reduces the computational power required for analysis.
Ghosh et al. propose a statistical approach utilizing cluster-based entropy analysis on selected network traffic features. They focus on features such as packet size, interpacket interval, packet process time, and two additional Modbus application protocol header features: Modbus frame length and function code value. Their classification-based analysis reveals that incorporating the two Modbus-specific features along with the three TCP/IP features significantly improves classification accuracy for denial-of-service (DoS) attacks compared to man-in-the-middle (MITM) attacks.
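To make the notion of entropy analysis over clustered traffic concrete, the sketch below computes the Shannon entropy of one feature within each cluster. It is a generic illustration under stated assumptions: the cluster assignments are taken as given, the function names are hypothetical, and the toy values for the Modbus function code are invented. It does not reproduce the authors' method.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    probabilities = [count / total for count in Counter(values).values()]
    return sum(-p * math.log2(p) for p in probabilities)

def entropy_per_cluster(cluster_ids, feature_values):
    """Group feature values by cluster label and compute entropy per cluster."""
    groups = defaultdict(list)
    for cluster_id, value in zip(cluster_ids, feature_values):
        groups[cluster_id].append(value)
    return {cid: shannon_entropy(vals) for cid, vals in groups.items()}

# Hypothetical data: cluster 0 always uses the same function code,
# while cluster 1 uses four different codes equally often.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
function_codes = [3, 3, 3, 3, 3, 16, 5, 6]

result = entropy_per_cluster(clusters, function_codes)
print(result)
```

A cluster whose feature values are highly regular yields low entropy, while a cluster with widely varying values yields high entropy, so a shift in per-cluster entropy is one possible signal of anomalous traffic.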
Ahmadi et al. address the challenges of feature reduction and model overfitting in network intrusion detection. They conduct experiments using a subset of the CSE-CIC-2018 dataset, evaluating various feature reduction approaches, including Linear Regression, Boruta, Random Forest with IncMSE, Random Forest with IncNodePurity, LASSO, and autoencoders. To assess the effectiveness of each approach in mitigating overfitting, they calculate the Root-Mean-Squared Error (RMSE) between the training and testing datasets for each model combined with a Decision Tree classifier. Their findings reveal that the combination of a Decision Tree classifier and features reduced using autoencoders achieves the lowest RMSE, indicating the most effective reduction in overfitting among all the tested scenarios.
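One plausible reading of this RMSE-based check is to compare a model's error on the training split with its error on the testing split, where a large gap suggests overfitting. The sketch below follows that reading; the exact quantity computed by the authors may differ, and the function names and toy labels are assumptions.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def overfitting_gap(train_true, train_pred, test_true, test_pred):
    """Test RMSE minus train RMSE; larger values suggest more overfitting."""
    return rmse(test_true, test_pred) - rmse(train_true, train_pred)

# Hypothetical binary labels (0 = benign, 1 = attack) and model outputs.
train_true, train_pred = [0, 1, 0, 1], [0, 1, 0, 1]  # perfect fit on training data
test_true, test_pred = [0, 1, 0, 1], [0, 1, 1, 1]    # one error on unseen data

gap = overfitting_gap(train_true, train_pred, test_true, test_pred)
print(gap)
```

Repeating this calculation for each feature reduction method, with the same classifier, gives a simple basis for ranking the methods by how well the resulting model generalizes.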
Data imbalance is another significant challenge in training machine learning models for IDS. Imbalanced datasets occur when there are significantly fewer examples of malicious attacks compared to normal network traffic, leading to detection inaccuracies. One paper specifically focuses on addressing this issue.
Abdelmoumin et al. investigate the impact of various techniques on data imbalance. Their work focuses on three main approaches: oversampling (increasing minority class examples), undersampling (decreasing majority class examples), and generating new synthetic samples for the minority class using generative methods. They evaluate these techniques by analyzing their impact on the performance and prediction accuracy of the models. They measure how well the trained models perform on balanced datasets compared to imbalanced ones, and assess the robustness of the models to new attacks that share similarities with existing ones. By investigating these techniques, Abdelmoumin et al. aim to identify the most effective methods for mitigating data imbalance and improving the overall performance and robustness of machine learning-based IDS.
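The first two approaches can be sketched in a few lines. Random oversampling duplicates minority records until the classes match, whereas random undersampling discards majority records. The function names and the toy records below are assumptions for illustration only; the generative approach is not shown.

```python
import random

def random_oversample(minority, target_size, seed=0):
    """Duplicate randomly chosen minority records until target_size is reached."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, seed=0):
    """Keep a random subset of the majority class, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)

# Hypothetical records: 20 normal flows and 4 attack flows.
normal = [("normal", i) for i in range(20)]
attacks = [("attack", i) for i in range(4)]

oversampled_attacks = random_oversample(attacks, len(normal))
undersampled_normal = random_undersample(normal, len(attacks))
print(len(normal), len(oversampled_attacks))   # balanced by oversampling
print(len(undersampled_normal), len(attacks))  # balanced by undersampling
```

The trade-off is visible even in this toy case: oversampling keeps all majority data but repeats minority records, while undersampling avoids repetition at the cost of discarding most of the normal traffic.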
Two papers examine notable threats: ScriptBlock Smuggling and data exfiltration.
ScriptBlock Smuggling is a novel threat that manipulates PowerShell and .NET environments to bypass the Antimalware Scan Interface (AMSI) on Windows operating systems. AMSI is crucial for malware detection, but ScriptBlock Smuggling exploits vulnerabilities to evade it. This threat hinges on manipulating ScriptBlocks, which are fundamental units of PowerShell code. By altering ScriptBlocks within their Abstract Syntax Tree (AST), attackers can create a dual representation. One representation caters to the compiler for normal execution, while the other is specifically designed to deceive antivirus software and log analysis tools. This allows malicious code to bypass AMSI detection and renders traditional memory patching bypass methods ineffective.
The research by Rose et al. delves into the inner workings of ScriptBlock creation within PowerShell, analyzes its built-in security features, and exposes critical limitations in AMSI's ability to scrutinize ScriptBlocks effectively. Furthermore, it explores the implications of log spoofing as an integral part of this evasion method. These findings highlight potential avenues for attackers to exploit these weaknesses, suggesting the emergence of a new class of techniques to bypass AMSI and manipulate logs. To address this growing threat, the paper proposes a synchronization strategy for ASTs, aiming to unify the processes of code compilation and malware scanning. This strategy could ultimately reduce the attack surface within PowerShell and .NET environments.
Data exfiltration is a cyberattack where unauthorized individuals steal or copy sensitive information. Examples include credit card numbers, Social Security numbers, and personal details exposed in the 2017 Equifax breach, where the data of 143 million Americans was compromised [29]. Li et al. propose a novel approach to detect data exfiltration using network graphs. Their method leverages the concept of network topology, which maps the connections and data flow within a network. The topology information is then incorporated in a statistical model to detect anomalies. More specifically, hourly HTTP data are aggregated to construct graphs. Nodes represent source and destination IP addresses, while edges represent the total byte volume transferred between them. Nodes are then categorized as servers or hosts based on their port numbers, resulting in bipartite graphs. Exponential random graph models (ERGMs) are employed to convert the network's topological features into a time series. Finally, an Autoregressive Moving Average (ARMA) model is used to identify deviations in the time series, potentially indicating malicious exfiltration attempts. This approach offers valuable insights into network behavior and can aid cybersecurity analysts in making informed decisions alongside existing intrusion detection systems.
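The graph construction step can be sketched as follows: flow records are bucketed by hour, and the bytes exchanged between each source and destination pair are summed to form weighted edges. The record layout, the function names, and the addresses (taken from documentation ranges) are assumptions. The server/host split by port, the ERGM fit, and the ARMA model are not implemented here; simple per-hour statistics stand in for the ERGM-derived time series.

```python
from collections import defaultdict

def build_hourly_graphs(records):
    """records: iterable of (timestamp_seconds, src_ip, dst_ip, n_bytes).
    Returns {hour_index: {(src_ip, dst_ip): total_bytes}}."""
    graphs = defaultdict(lambda: defaultdict(int))
    for timestamp, src, dst, n_bytes in records:
        graphs[timestamp // 3600][(src, dst)] += n_bytes
    return {hour: dict(edges) for hour, edges in graphs.items()}

def edge_count_series(graphs):
    """Number of distinct edges per hour, ordered by hour."""
    return [len(graphs[hour]) for hour in sorted(graphs)]

def byte_volume_series(graphs):
    """Total bytes transferred per hour, ordered by hour."""
    return [sum(graphs[hour].values()) for hour in sorted(graphs)]

# Hypothetical HTTP flow records spanning two hours.
records = [
    (0, "192.0.2.10", "198.51.100.1", 500),
    (120, "192.0.2.10", "198.51.100.1", 700),
    (300, "192.0.2.11", "198.51.100.1", 200),
    (3700, "192.0.2.10", "198.51.100.1", 90000),
]

graphs = build_hourly_graphs(records)
print(edge_count_series(graphs))
print(byte_volume_series(graphs))
```

In this toy data, the second hour shows a sharp jump in byte volume on a single edge, which is the kind of deviation a time series model fitted to such statistics would be expected to flag.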
Muharti and Rawat propose a data analytics-driven network anomaly detection model uniquely complemented by a visualization layer, providing real-time insights into cyber attacks and their defenses. This approach utilizes network scanning tools and discovery services to visualize the network by identifying live IP-based devices. A data analytics-based intrusion detection system scrutinizes all network connections, and mitigation measures are initiated by visually distinguishing malicious from benign connections using red and blue hues, respectively.
One paper addresses the vulnerabilities of centralized certificate verification. The centralization of existing PKI systems introduces significant vulnerabilities, as a compromised certificate authority (CA) can issue unauthorized certificates and access sensitive information. Halder et al. mitigate these vulnerabilities through decentralized certificate verification using blockchain technology. They present a decentralized public key infrastructure (PKI) built on distributed trust models, such as the Web of Trust (WoT) and blockchain technologies, to overcome issues like single points of failure and prevent tampering with existing certificates. Additionally, their infrastructure establishes a trusted key-ring network that decouples the authentication process from CAs, enhancing secure certificate issuance and accelerating the revocation process. Their experimental results demonstrate the effectiveness of this proposed system in practice, despite incurring additional overhead compared to conventional PKIs.
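The tamper-evidence property that motivates a ledger-based PKI can be illustrated with a toy hash-chained log of certificate events. This is a deliberately simplified sketch, not the architecture proposed by Halder et al.: the class `CertLedger`, its methods, and the fingerprints are hypothetical, and consensus, digital signatures, and the Web of Trust are all omitted.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

class CertLedger:
    """Toy append-only log in which each block stores its predecessor's hash."""

    def __init__(self):
        self.chain = []

    def append(self, action, cert_fingerprint):
        prev = block_hash(self.chain[-1]) if self.chain else "0" * 64
        self.chain.append({"index": len(self.chain), "action": action,
                           "cert": cert_fingerprint, "prev_hash": prev})

    def is_intact(self):
        # Editing any block except the last breaks the link stored in its successor.
        return all(self.chain[i]["prev_hash"] == block_hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

    def is_valid(self, cert_fingerprint):
        # A certificate is valid if the chain is intact and its latest event is "issue".
        status = None
        for block in self.chain:
            if block["cert"] == cert_fingerprint:
                status = block["action"]
        return self.is_intact() and status == "issue"

ledger = CertLedger()
ledger.append("issue", "fingerprint-a")   # hypothetical certificate fingerprints
ledger.append("issue", "fingerprint-b")
ledger.append("revoke", "fingerprint-a")
print(ledger.is_valid("fingerprint-a"), ledger.is_valid("fingerprint-b"))
```

Because revocation is just another appended event, any verifier holding a copy of the log can check a certificate's status without contacting a CA, and rewriting an earlier record invalidates the hashes that follow it.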
Finally, one paper addresses IoT data privacy leakage. Wang et al. consider the challenge of protecting the privacy of IoT devices by transforming time series datasets. The transformed datasets retain the intrinsic value of the original IoT data while maintaining data utility. This approach enables non-expert data owners to better understand and evaluate the potential device-level privacy risks associated with their IoT data, while simultaneously offering a reliable solution to mitigate their concerns about privacy violations.

Table 1. Analysis of the published contributions in the Special Issue.