Machine Learning for APT Detection

AL-Aamri, Abdullah Said; Abdulghafor, Rawad; Turaev, Sherzod; Al-Shaikhli, Imad; Zeki, Akram; Talib, Shuhaili

doi:10.3390/su151813820

Open Access Article


by

Abdullah Said AL-Aamri ¹, Rawad Abdulghafor ¹,²,*, Sherzod Turaev ³,*, Imad Al-Shaikhli ¹, Akram Zeki ¹ and Shuhaili Talib ¹

¹ Department of Computer Science, Faculty of Information and Communication Technology, International Islamic University Malaysia, Kuala Lumpur 53100, Malaysia

² Faculty of Computer Studies (FCS), Arab Open University-Oman, Muscat P.O. Box 1596, Oman

³ Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain 15551, United Arab Emirates

* Authors to whom correspondence should be addressed.

Sustainability 2023, 15(18), 13820; https://doi.org/10.3390/su151813820

Submission received: 22 February 2023 / Revised: 13 July 2023 / Accepted: 10 August 2023 / Published: 16 September 2023

(This article belongs to the Special Issue Advances in Machine Learning Technology in Information and Cyber Security)


Abstract

Nowadays, countries face a multitude of electronic threats that have permeated almost all business sectors, be it private corporations or public institutions. Among these threats, advanced persistent threats (APTs) stand out as a well-known example. APTs are highly sophisticated and stealthy computer network attacks meticulously designed to gain unauthorized access to targeted networks and persist there undetected for extended periods. They represent a formidable cybersecurity challenge for governments, corporations, and individuals alike. Recognizing APTs as one of the most critical cybersecurity threats, this study aims to reach a deeper understanding of their nature and proposes a multi-stage framework for automated APT detection leveraging time series data. Unlike previous models, the proposed approach can detect attacks in real time based on stored attack scenarios. This study conducts an extensive review of existing research, identifying its strengths, weaknesses, and opportunities for improvement. Furthermore, existing standardized techniques have been refined to improve their effectiveness in detecting APT attacks. The learning process relies on datasets sourced from various channels, including journal logs, traceability audits, and system monitoring statistics. Subsequently, an efficient APT detection and prevention system, known as the composition-based decision tree (CDT), has been developed to operate in complex environments. The obtained results demonstrate that the proposed approach consistently outperforms existing algorithms in terms of detection accuracy and effectiveness.

Keywords:

APT; attacks; artificial intelligence; CDT

1. Introduction

In recent years, the concept of intelligence has become closely associated with learning abilities. Through the process of learning, an intelligent system can improve its performance on tasks by gaining experience and adapting to new challenges. Traditionally, building an intelligent system involved manually programming it to perform specific tasks such as playing chess, character recognition, or medical diagnosis. However, this manual approach has its limitations and cannot keep up with the rapid advancements in technology.

In today’s era, people rely heavily on computer systems for various tasks, seeking fast, convenient, and reliable communication channels between service providers and consumers [1]. Similarly, the technical and support teams face a significant increase in daily troubleshooting cases that require prompt and flawless resolution. Failure to address these cases promptly may lead to user frustration or further issues, such as data theft or service disruption [2,3]. Therefore, there is a growing need for intelligent systems to aid in accelerating daily tasks, including the detection and prevention of cyber threats [4].

Understanding the weaknesses of a network is crucial for detecting and mitigating cyberattacks effectively. The monitoring and defense teams must comprehend the motives and targeted information behind the attack to devise appropriate response strategies [5,6]. Among the various types of cyber threats, advanced persistent threats (APTs) pose a particularly significant challenge. APTs are prolonged and targeted cyberattacks that aim to gain unauthorized access to networks and remain undetected for extended periods. Instead of causing immediate damage, APTs focus on monitoring network activity and stealing sensitive data, making them especially dangerous for sectors such as national defense, industry, and finance [7,8].

APT groups employ advanced attack techniques, including exploiting "zero-day" vulnerabilities, spear-phishing, and other social engineering methods. These groups constantly adapt their malicious code and employ sophisticated concealment techniques to maintain access to the targeted network without detection. APT attacks are often backed by nation-states or organized crime groups seeking to gain a competitive advantage or financial profit [9,10,11]. Detecting APT attacks is challenging because detection must rely on behavioral patterns rather than easily recognizable signatures [7].

While APT attacks may be difficult to detect, data exfiltration serves as potential evidence of an attack. Cybersecurity professionals strive to identify anomalies in outgoing data as an indication of a network breach. With the increasing sharing of sensitive information and data transmission across digital environments, the assurance of information security and dependable data transmission becomes a critical concern, particularly in ad hoc architectures and smart cities [12].

Commercial companies also recognize the significance of digital content delivery in smart cities. Given the prevalence of digital data, it is crucial to focus on secure architecture and interfaces between network layers to mitigate APT attacks. Understanding process behavior and detecting and predicting APT behavior are essential to delivering reliable services [7].

Building an intelligent environment within a computerized framework requires a robust and trustworthy system that ensures secure digital content delivery. This study aims to develop a dedicated system based on a secure architecture that leverages machine learning analytics and specialized hardware. Continuously monitoring the system's performance is essential to assess its acceptability and success in society. While AI adoption in the digital commercial industry has been slow, AI has made significant advances in scientific services. This research defines a set of constraints to measure specific parameters and contribute to the advancement of available technology [13,14].

The main objectives of this research are as follows:

I. Identify methodological issues that hinder the detection of APT attacks.
II. Enhance existing standardized techniques for detecting APT attacks.
III. Develop an efficient autonomous APT detection model.
IV. Evaluate and compare the results of the autonomous APT detection model with those of currently used algorithms.

The contributions of this research can be summarized as follows:

- Design an autonomous system for feeding SNORT rules to detect APT attacks.
- Develop a highly accurate prediction model for APT attacks.
- Design an algorithm for detecting threats by analyzing data at rest.

With these contributions, this research aims to advance the technology available to humanity and improve the detection and prevention of APT attacks.

2. Related Works

Previous research on advanced persistent threat (APT) attack detection methods has identified limitations such as the inability to detect real-time attacks and a high rate of false attack detection. APT attacks utilize advanced techniques to remain hidden for extended periods, making them difficult to detect using traditional intrusion detection systems (IDS) that rely on fixed signatures or identifiers. Researchers have found that machine learning (ML) techniques show promise in effectively and accurately detecting APT attacks [15,16,17,18,19].

ML for Detecting APT:

A literature review of studies on using ML for APT detection reveals that many approaches utilize a combination of features, including network traffic data, system logs, and user behavior, to train ML models. Some studies employ supervised learning methods such as decision trees and random forests, while others explore unsupervised techniques such as clustering and anomaly detection.

Unsupervised learning approaches have demonstrated the ability to accurately identify known APT attacks and uncover previously unknown attacks. Supervised learning methods have achieved high accuracy in detecting APTs, although they may struggle with balancing false-positive and false-negative rates.

Overall, the literature suggests that ML can be an effective tool for detecting APTs due to its ability to analyze large amounts of data and identify attack patterns. However, the choice of features, algorithms, and evaluation methods should be tailored to the specific attack scenario, taking into consideration the trade-off between detection accuracy and false alarms.

SVM:

Horng et al. [20] presented an intrusion detection system utilizing a triple-structured approach based on support vector machines (SVM). Their method employed feature selection, SVM, and a hierarchical clustering algorithm to enhance SVM performance. The clustering algorithm provided more relevant training examples to the SVM, reducing training time and improving overall accuracy. The reported accuracy rate of 95.72% surpassed that of other intrusion detection systems. However, it remains unclear if this technique has been tested on diverse datasets or network environments, limiting the generalizability and robustness of the results.

Hasani, Othman, and Mousavi [21] proposed a hybrid model combining linear genetic programming (LGP) and the bee algorithm (BA) to improve cyber threat detection rates and reduce false alarms. They found that the SVM algorithm effectively addressed intrusion detection system challenges. The study extracted four sample datasets of 4000 random records from a source dataset. The feature subset selected by LGP_BA provided superior data representation. However, the study focused only on SVM and could benefit from incorporating additional algorithms.

In another study [22], researchers proposed a hybrid approach combining SVM and the genetic algorithm (GA). This approach achieved improved accuracy (98.33%) compared to that when using SVM alone. However, the addition of the GA algorithm might lead to longer processing times and increased complexity in implementation.

Gupta and Shrivastava [23] proposed a method combining bee colony and SVM algorithms to enhance intrusion detection accuracy. The experimental results showed an average accuracy rate of 88.46%, which is relatively low compared to that of other studies reporting accuracy rates over 90%.

Al-Yaseen, Othman, and Nazri [24] developed a multi-level hybrid model that combined SVM and extreme learning machine (ELM) for real intrusion detection problems. This approach demonstrated high efficiency and achieved the best accuracy performance with a score of 95.75% compared to that of other studies. However, the study only employed two algorithms (SVM and ELM) and a single dataset, which may have limited the generalizability of the results to detect other types of intrusions or attacks. Additionally, the study focused solely on accuracy as a performance measure, neglecting other important factors such as false-positive rates and detection time.

Kaveh and Dadras [25] proposed a solution addressing real-time issues using a three-stage structure combining K-nearest neighbors (K-NN), naïve Bayes, and the one-class support vector machine (OSVM). The K-NN algorithm proved efficient for solving multi-label classification problems, achieving an accuracy of 95.77%.

NSL-KDD Dataset SVM Detection System:

Salama [21] investigated advanced threats using the NSL-KDD dataset and combined a deep belief network (DBN) with SVM. The combined method achieved a 92.84% detection rate when trained on 40% of the dataset, outperforming DBN and SVM used separately. However, this study was conducted on an offline, previously extracted dataset rather than on live traffic.

Chu and colleagues [26] used the NSL-KDD dataset to detect attacks and applied principal component analysis (PCA) for dataset size reduction. The SVM algorithm with a radial basis function (RBF) kernel yielded better performance compared to that of other classification algorithms, achieving a detection accuracy rate of 97.22%. It would be valuable to determine if the proposed method can detect different types of attacks or is specific to certain ones.

Perez and colleagues [27] proposed an intrusion detection approach using ML algorithms (SVM, J48, naïve Bayes, and decision tree) to classify the KDD-99 dataset. Information gain was employed for feature selection. The experiments revealed that the J48 decision tree method combined with the AdaBoost method achieved the highest accuracy of 97% compared to that of other methods. However, this approach was solely tested on the KDD-99 dataset, which may not represent other datasets or network environments, limiting its generalizability. Additionally, the use of a single feature selection method and a limited number of algorithms may impact its effectiveness in different scenarios.

Rakha and colleagues [28] proposed a deep learning-based method for intrusion detection using the NSL-KDD dataset. Their iterative neural network model demonstrated improved performance and accuracy (83.28%) compared to those of traditional methods such as SVM, ANN, and J48. However, this approach was tested on a specific dataset and may not perform as well on other data types or in different environments. Additionally, deep learning models can be computationally expensive and require a large amount of data for training, which may not be feasible for some organizations.

NSL-KDD dataset:

Another study [22] focused on feature selection for intrusion detection systems (IDS) using the NSL-KDD dataset. The study aimed to identify and remove irrelevant and redundant features that could hinder the detection process and degrade system performance. Information gain (IG), correlation-based feature selection (CFS), and gain ratio (GR) were evaluated to build an effective and efficient IDS. A decision tree classifier was used to assess each feature reduction method. The results demonstrated higher IDS efficiency with this method. However, the study did not consider time as a crucial parameter in threat exploration, and the classifier's true- and false-positive rates (TPR and FPR) were not reported.
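To illustrate the first of these criteria: information gain measures how much knowing a discrete feature's value reduces uncertainty (entropy) about the class label. The following Python sketch uses only the standard library and toy labels. It is an illustration of the IG criterion, not the cited study's implementation, and it does not cover CFS or GR.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Toy data: the protocol column separates the classes perfectly, the other column does not.
labels = ["attack", "attack", "normal", "normal"]
print(information_gain(["tcp", "tcp", "udp", "udp"], labels))  # 1.0
print(information_gain(["a", "b", "a", "b"], labels))          # 0.0
```

Features with near-zero gain are candidates for removal. A classifier such as a decision tree can then be retrained on the reduced feature set to check that accuracy holds.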

Furthermore, the authors of ref. [29] developed a multi-layer hybrid machine learning IDS using different classification algorithms, including decision trees, naïve Bayes, and multilayer perceptron neural networks, to detect anomalies in traffic. The NSL-KDD dataset was used in these investigations, with naïve Bayes achieving the best accuracy. The decision tree classifier (J48) achieved an accuracy of approximately 65% for probe attacks and 82% for DoS attacks. This study focused on anomalous traffic flows and did not address threats to data at rest.

Additionally, the authors of ref. [30] used the NSL-KDD dataset to evaluate an artificial neural network (ANN) for detecting persistent threats. The intrusion detection rate achieved was 81.2%, with an accuracy of 79.9% for attack-type classification. For the five-class problem (Probe, DoS, U2R, and R2L attacks, and normal status), the approach achieved a detection accuracy of 75.49%, higher than that of the self-organizing map (SOM) method. However, the study did not extensively explore other supervised ML algorithms, which could improve the accuracy and detection rate.

Singh, Kumar, and Singla [10] developed a method for intrusion detection using the online sequential extreme learning machine (OS-ELM) algorithm. They utilized alpha and beta profiling to reduce the time complexity of irrelevant attribute selection and decrease the training dataset’s size. The proposed technique exhibited effectiveness in detecting network attacks, achieving high accuracy, low false positive rates, and fast detection times. However, this technique may not handle large and complex datasets effectively, as it relies on alpha and beta profiling to reduce dataset size and time complexity. Additionally, the technique is based on a specific machine learning algorithm, the online sequential extreme learning machine (OS-ELM), which may not be suitable for all datasets and applications. This may limit the generalizability and applicability of the technique in different contexts and environments.

A notable study on the Morris worm [31] provided a comprehensive overview of its origins and effects from the perspective of wishful thinking.

To conclude, one limitation of the K-nearest neighbors (K-NN) approach is its reliance on a single classification algorithm, which may not be suitable for all data types and could lead to lower accuracy on certain datasets. Furthermore, the approach relies heavily on feature selection, which may not always identify the features relevant for classification, resulting in lower accuracy rates.

In the context of Advanced Persistent Threat (APT) detection, a variety of technologies have been explored to address deficiencies in current methods [32,33,34,35,36,37,38,39,40]. Table 1 provides an overview of these technologies, showcasing their impact on APT detection and the specific methodologies they employ.

3. Methodology

The objective of this study is to develop an advanced persistent threat (APT) detection system that uses machine learning techniques to distinguish between normal and abnormal network traffic patterns. One advantage of this approach is that, even if an APT attack has an unknown signature, the abnormal behavior that follows the attack will deviate from standard traffic patterns, enabling the system to detect and respond to the anomaly. We employ commonly used machine learning techniques, namely the supervised support vector machine (SVM) and the unsupervised k-means clustering algorithm, implemented through MS Azure Cloud, to achieve real-time and accurate anomaly detection. The process is illustrated in Figure 1.
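The core idea here is that post-compromise behavior deviates from a learned normal baseline even when the attack signature is unknown. This can be shown with a deliberately simple statistical baseline. The Python sketch below swaps in per-feature z-scores for the SVM and k-means models the study runs on Azure. The feature names, the sample values, and the 3-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, pstdev

def fit_baseline(normal_rows):
    """Learn the mean and standard deviation of each feature from normal traffic only."""
    columns = list(zip(*normal_rows))
    # A zero spread would make every deviation infinite, so fall back to 1.0.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def is_anomalous(row, baseline, threshold=3.0):
    """Flag a row if any feature lies more than `threshold` standard deviations from its mean."""
    return any(abs(x - mu) / sigma > threshold for x, (mu, sigma) in zip(row, baseline))

# Hypothetical per-window features: [packets per second, mean packet length, distinct destination IPs]
normal_windows = [[120, 540, 4], [115, 530, 5], [130, 550, 4], [125, 545, 5]]
baseline = fit_baseline(normal_windows)
print(is_anomalous([122, 541, 4], baseline))   # False
print(is_anomalous([118, 538, 40], baseline))  # True
```

The second window is flagged only because it contacts far more destinations than usual. No signature is involved, which is the property the paragraph above relies on.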

Research questions serve as a guide for conducting a literature review and provide answers and solutions to the research problem. The research questions for this study are as follows:

What methodological issues could hinder the detection of APT attacks?
How can the existing standardized techniques for detecting APT attacks be enhanced?
How can an efficient autonomous APT detection model be developed?
How can the results of the autonomous APT detection model be evaluated and compared to those of the currently used algorithms?

3.1. Dataset Preprocessing

Data cleaning: This step involves removing missing, duplicate, or irrelevant data from the dataset. The WEKA tool was used to eliminate non-valuable values from the dataset by applying a one-way ANOVA test.
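A one-way ANOVA test compares how much a feature's class means differ (between-class variance) with how much values vary inside each class (within-class variance). Features with a small F statistic carry little class information. The study performed this step in WEKA. The plain-Python sketch below only illustrates the statistic, and the `min_f` cut-off of 4.0 is a hypothetical choice.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one feature, given its values grouped by class label."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        # No within-class spread: infinitely separable if the means differ, useless otherwise.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_features(rows, labels, min_f=4.0):
    """Keep the indices of features whose F statistic across classes is at least `min_f`."""
    classes = sorted(set(labels))
    keep = []
    for j in range(len(rows[0])):
        groups = [[r[j] for r, y in zip(rows, labels) if y == c] for c in classes]
        if anova_f(groups) >= min_f:
            keep.append(j)
    return keep

rows = [[1, 7], [2, 9], [3, 8], [4, 8], [5, 7], [6, 9]]
labels = [0, 0, 0, 1, 1, 1]
print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
print(select_features(rows, labels))    # [0]
```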

Data transformation: This step includes normalizing or scaling the data to ensure that all features have the same weight in the analysis. MS Azure Cloud was used to convert the raw data into a .CSV format for further processing. This allowed for the easy importation and manipulation of the data for use in an ML-based APT detection system.
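Scaling puts features measured in very different units (bytes versus TTL hops, for example) on a common [0, 1] range before export. The minimal Python sketch below uses min-max scaling and the standard `csv` module as a local stand-in for the Azure conversion step. The column names are hypothetical, and the study does not state which scaling method it used.

```python
import csv
import io

def min_max_scale(rows):
    """Rescale every numeric column to [0, 1] so that no feature dominates by magnitude."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def to_csv(header, rows):
    """Serialize rows to CSV text (an in-memory stand-in for the exported .CSV file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical raw columns: packet length in bytes and IP time-to-live.
scaled = min_max_scale([[60, 64], [1500, 128], [780, 96]])
print(to_csv(["packet_length", "ttl"], scaled))
```

In practice, the minimum and maximum should be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out records.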

Data reduction: This step involves reducing the dimensionality of the dataset by removing any irrelevant or redundant features. In this case, the data were further filtered by date to decrease the number of logs generated by the system.

Data integration: This step involves combining multiple datasets or sources of data to create a more comprehensive dataset for analysis. This process provides a consistent and accurate view of data across an organization, which can help support better decision-making, improve operational efficiency, and enable data-driven business strategies. An example of this stage is illustrated in Figure 2.

Data discretization: This step involves converting continuous data into categorical data for easier analysis. It is carried out by dividing the data into intervals or bins and assigning each value to the corresponding bin. Data discretization aims to reduce the number of values in the dataset and make it more manageable. Various algorithms, such as k-means clustering or decision trees, are applied in this research to perform data discretization.
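The study names k-means and decision trees as its discretization methods. The simplest form of the same idea is equal-width binning, which the Python sketch below shows. It is named here plainly as an alternative and is not the study's method. The packet lengths are made-up sample values.

```python
def equal_width_bins(values, n_bins):
    """Assign each continuous value to one of `n_bins` equal-width intervals (0 .. n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

packet_lengths = [60, 64, 590, 1200, 1500, 1460]
print(equal_width_bins(packet_lengths, 3))  # [0, 0, 1, 2, 2, 2]
```

Clustering-based discretization replaces the fixed-width intervals with data-driven boundaries, which suits skewed features such as packet length better.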

Data splitting: This step involves dividing the dataset into training, validation, and testing sets for model evaluation and selection.
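A reproducible three-way split can be sketched in a few lines of Python. The 70/15/15 proportions and the seed are hypothetical, because the study does not report its split ratios.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into disjoint training, validation and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

Because the data here are time-stamped logs, a chronological split (training on earlier records, testing on later ones) is often safer than shuffling. It avoids leaking future traffic into training. This sketch shuffles only to keep the example short.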

Feature selection: This step involves identifying and selecting the essential features from the dataset that will be used in the ML model.

Data augmentation: This step involves generating new samples from the existing dataset to increase the dataset’s size and improve the model’s performance.

Figure 3 illustrates the steps taken to process the dataset using the ML-SVM algorithm.

3.2. Dataset Used

The data integration method refers to the process of combining data from multiple sources into a single, unified view. This can include techniques such as data warehousing, data federation, and data virtualization. On the other hand, data discretization methods are techniques used to convert continuous data into discrete or categorical data. This can include techniques such as binning, clustering, and decision trees.

For this research, a dataset consisting of 57 features of network packet files recorded at various time intervals was used. The dataset underwent preprocessing steps, including the removal of non-valuable values using a one-way ANOVA test in WEKA and the conversion of the raw data into .CSV format using MS Azure Cloud. Additionally, the dataset was filtered by date to reduce the number of logs generated by the system.

The dataset features include the sequence number, time, source and destination IP addresses, protocol, packet length, and additional packet details. The data are divided into five main categories, which include live system logs, journal logs, and logs from different network appliances.

To analyze APT attacks, the raw data files are preprocessed. Each pcap data file is loaded and its corresponding filter rule is applied. The attack data are exported to one CSV file and the normal data to another. The attacks are labeled, the two CSV files are combined, and any redundant data are removed. This process is repeated for all pcap files. After labeling and dropping redundant data, the resulting CSV files range from 20 MB to 463 MB in size. Figure 4 provides details of this process.
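The filter, label, merge, and deduplicate loop described above can be sketched in plain Python. The sketch assumes the packets have already been parsed out of the pcap file into dictionaries, since reading pcap itself needs a third-party library. The filter rule, addresses, and field names are all hypothetical.

```python
import csv
import io

def label_and_merge(packets, is_attack):
    """Split packets with a filter rule, label both groups, merge them, and drop exact duplicates."""
    attack = [dict(p, label="attack") for p in packets if is_attack(p)]
    normal = [dict(p, label="normal") for p in packets if not is_attack(p)]
    merged, seen = [], set()
    for row in attack + normal:
        key = tuple(sorted(row.items()))
        if key not in seen:  # drop redundant (duplicate) records
            seen.add(key)
            merged.append(row)
    return merged

def is_known_bad_host(packet):
    """Hypothetical filter rule: traffic from one known-malicious host is labeled as attack."""
    return packet["src"] == "10.0.0.66"

packets = [
    {"src": "10.0.0.5", "dst": "10.0.0.1", "proto": "TCP", "length": 60},
    {"src": "10.0.0.66", "dst": "10.0.0.1", "proto": "TCP", "length": 1500},
    {"src": "10.0.0.66", "dst": "10.0.0.1", "proto": "TCP", "length": 1500},  # duplicate
]
rows = label_and_merge(packets, is_known_bad_host)

# Write the combined, labeled records as CSV (in memory here; a file in practice).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["src", "dst", "proto", "length", "label"])
writer.writeheader()
writer.writerows(rows)
print(len(rows))  # 2
```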

Details of the dataset:

- Number of instances: not reported.
- Number of labels: the number of labels or classes is not reported.
- Features: 57 features related to network packet files, including the sequence number, time, source and destination IP addresses, protocol, packet length, and additional packet details.
- Categories: five main categories, which include live system logs, journal logs, and logs from different network appliances.

The dataset underwent various preprocessing steps, including data cleaning, data transformation, data reduction, data integration, data discretization, data splitting, feature selection, and data augmentation.

Figures 4 and 5 show examples of the datasets used in MS Azure, including wireless capture data.

The exact size of the dataset, including the number of instances and labels, is not reported here; further details may be available from the original dataset sources.

3.3. Model Development

This study proposes a live-case attack simulation for detecting advanced persistent threats (APTs) using expert systems and machine learning solutions. The approach aims to be highly sensitive to changes and live flows in order to detect APTs effectively. The process includes several steps, such as univariate and linear regression analysis, feature selection and extraction, and the use of real-life APT history to identify anomalous entries in a given flow. Datasets from both the literature and local frameworks are employed. The process also involves manually labeling captured anomalous points and using a decision tree-like model whose output is fed to the naïve Bayes algorithm to generate a decision rule for APT detection. This rule is then applied to an open-source intrusion detection system (IDS) for real-time APT detection.
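The naïve Bayes stage operates on discretized features. A categorical naïve Bayes classifier with Laplace (add-one) smoothing can be sketched in plain Python as follows. The sketch covers only a generic naïve Bayes step. It is not the study's combined decision tree and naïve Bayes model, and the training rows and class names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

class CategoricalNaiveBayes:
    """Naive Bayes over discrete (e.g. binned) features with Laplace add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[(j, label)][value] += 1
                self.values[j].add(value)
        return self

    def predict(self, row):
        """Return the class with the highest log posterior (up to a shared constant)."""
        best_class, best_score = None, float("-inf")
        for label, n_label in self.class_counts.items():
            score = log(n_label / self.total)  # log prior
            for j, value in enumerate(row):
                seen = self.counts[(j, label)][value]
                score += log((seen + 1) / (n_label + len(self.values[j])))
            if score > best_score:
                best_class, best_score = label, score
        return best_class

# Hypothetical discretized training data: (destination-port bin, packet-size bin)
training_rows = [("high_port", "large"), ("high_port", "large"),
                 ("low_port", "small"), ("low_port", "small"), ("low_port", "large")]
training_labels = ["apt", "apt", "normal", "normal", "normal"]
model = CategoricalNaiveBayes().fit(training_rows, training_labels)
print(model.predict(("high_port", "large")))  # apt
print(model.predict(("low_port", "small")))   # normal
```

Working in log space avoids numerical underflow when many features are multiplied. Smoothing keeps an unseen feature value from forcing a class probability to zero.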

A live-case attack is performed where the flow is captured in real-time and sent to the monitoring environment. This process involves the following steps and is graphically presented in Figure 6:

- Flow capturing (preliminary dataset)
- Small-scale manual classification (signature-based and behavior scenarios)
- Feature extraction and selection (data cleaning and suspicious feature extraction)
- Training and model generation (supervised learning and deep learning algorithms for higher detection accuracy)
- Rule generation and IDS feed

The process starts with capturing the flow in real time and sending it to the monitoring environment. Next, a small-scale manual classification labels the packets based on their signatures and behavior scenarios. This is followed by feature extraction and selection, which involve data cleaning and extracting suspicious features. The model is then trained using supervised learning and deep learning algorithms to achieve higher detection accuracy. The final step generates rules and feeds them to the IDS, which is used for real-time APT detection and response. The entire process is visually represented in Figure 6.

Multiple studies were examined, and practical tests were conducted. These tests summarized recognizable features into conclusions that were then used to automate the generation of rules for digital anomaly detection. These rules were integrated with preset network filters. SNORT was used as the filter, and the rule syntax was adjusted to enhance APT detection. New datasets were used to evaluate the outcomes of the defined framework using the proposed methodology.
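The study does not publish its adjusted rule syntax. The Python sketch below therefore only shows how a learned indicator could be rendered into a standard Snort rule (header plus `msg`, `content`, `sid`, and `rev` options). The indicator itself (outbound TCP to port 4444 carrying the string "beacon") and the helper name are hypothetical.

```python
UNSAFE_CHARS = set('";\\')

def make_snort_rule(sid, proto, dst_port, content, msg):
    """Render one outbound Snort alert rule for traffic to `dst_port` that carries `content`."""
    # Snort reserves SIDs below 1,000,000 for distributed rules; local rules use 1,000,000 and up.
    if sid < 1_000_000:
        raise ValueError("local rules should use sid >= 1000000")
    # Quotes, semicolons and backslashes would need escaping inside rule options; reject them here.
    if UNSAFE_CHARS & set(content + msg):
        raise ValueError("content/msg contain characters that must be escaped")
    return (
        f"alert {proto} $HOME_NET any -> $EXTERNAL_NET {dst_port} "
        f'(msg:"{msg}"; content:"{content}"; sid:{sid}; rev:1;)'
    )

# Hypothetical indicator learned from labeled traffic.
rule = make_snort_rule(1000001, "tcp", 4444, "beacon", "Possible APT beacon")
print(rule)
```

The rule watches traffic leaving `$HOME_NET` because, as noted earlier, data exfiltration is often the most visible sign of an APT. Generated rules would typically be appended to a local rules file and reloaded by the IDS.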

To assess the efficacy of this research, specialized datasets containing real APT cases were employed. Ultimately, a comparative analysis of the results and their effectiveness was necessary to substantiate the success of the research. The outlined methodology is illustrated in Figure 7 below.

The following tools and programs can be utilized for the analysis and development of the research:

- WEKA: A popular open-source machine learning software package used for data preprocessing, cleaning, transformation, and feature selection. It provides a collection of algorithms for data mining and analysis.
- MS Azure Cloud: A cloud computing platform offering various services for data storage, processing, and analysis. In this research, it is used for data transformation, such as converting raw data into CSV format.
- SVM (support vector machine): A supervised machine learning algorithm used for classification and regression analysis. In this research, it can be applied for anomaly detection and for distinguishing between normal and abnormal network traffic patterns.
- k-means: A clustering algorithm used for unsupervised machine learning. In this research, it can be used for data reduction and clustering analysis.
- Python: A popular programming language with various libraries and frameworks for data analysis and machine learning. Python can be used to implement the proposed APT detection model and develop the algorithms.
- SNORT: An open-source network intrusion detection and prevention system. This research designs an autonomous process for feeding SNORT rules for APT detection.
- NEMS (nano-electromechanical systems): NEMS can be implemented to enhance digital content delivery within data networks, which involves specialized hardware and software adjustments.
- Decision trees: A machine learning algorithm that uses a tree-like model to make decisions or predictions. Decision trees can be used for data discretization and feature selection.
- Data analysis tools: Various data analysis tools and techniques can be employed to analyze the generated datasets, including statistical analysis, visualization tools, and data exploration methods.

It is important to note that the specific tools and programs used may vary depending on the researchers’ preferences and the specific requirements of the study.

3.4. Justification of ML Techniques and Use of Network Traffic Data

To ensure that the developed model is specific to advanced persistent threat (APT) attacks despite relying on network traffic data, several measures have been implemented. These measures aim to capture the distinctive characteristics of APT attacks and enhance the model's ability to differentiate them from other types of attacks.

Firstly, the dataset used in this study is carefully selected to primarily consist of network traffic data known to contain APT attacks. This dataset can be obtained from sources that have experienced APT attacks or can include simulated APT attacks. By focusing on APT-specific data, we aim to capture the unique patterns and behaviors associated with these attacks.

To further enhance the specificity of the model, a rigorous process of feature engineering is conducted. Features are selected based on their relevance to APT attack behaviors in network traffic. These features can include indicators of compromise (IOCs), known APT attack patterns, and behavior-based metrics that differentiate APT attacks from other types of attacks. By incorporating these APT-specific features, the model is trained to identify and detect the characteristic patterns indicative of APT attacks.
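As an illustration of the kinds of APT-oriented features described above, one host's outbound connections can be summarized in plain Python. The sketch computes an outbound-to-inbound byte ratio (an exfiltration hint), the regularity of connection timing (a beaconing hint), the number of distinct destinations, and an IOC match. The record layout, the feature names, and the threat-intelligence IP list are hypothetical, and the study does not list its exact features.

```python
from statistics import mean, pstdev

def host_features(connections, ioc_ips):
    """Summarize one host's outbound connections into behavior features.

    connections: list of (timestamp_seconds, dst_ip, bytes_out, bytes_in), sorted by time.
    """
    times = [c[0] for c in connections]
    gaps = [b - a for a, b in zip(times, times[1:])]
    bytes_out = sum(c[2] for c in connections)
    bytes_in = sum(c[3] for c in connections)
    return {
        # Large outbound-to-inbound ratios can indicate data exfiltration.
        "out_in_ratio": bytes_out / max(bytes_in, 1),
        # A coefficient of variation near 0 means evenly spaced connections (possible beaconing).
        "gap_cv": pstdev(gaps) / mean(gaps) if len(gaps) > 1 and mean(gaps) > 0 else None,
        "distinct_dsts": len({c[1] for c in connections}),
        # Indicator-of-compromise match against a (hypothetical) threat-intel IP list.
        "ioc_hit": any(c[1] in ioc_ips for c in connections),
    }

# A host contacting one documentation-range address every 60 s and sending far more than it receives.
conns = [(t, "203.0.113.7", 5000, 200) for t in (0, 60, 120, 180)]
features = host_features(conns, {"203.0.113.7"})
print(features)
```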

Additionally, the dataset used for training the model undergoes a meticulous labeling process carried out by domain experts with expertise in APT attacks. These experts possess the knowledge and experience to accurately identify instances of APT attacks within the network traffic data. Their expert labeling helps establish the ground truth for training the model and ensures that it learns to recognize APT-specific behaviors.

Furthermore, the trained model is thoroughly evaluated and validated using various metrics and techniques. It is tested on different datasets, including unseen data or data from different environments, to assess its effectiveness in specifically detecting APT attacks. The evaluation process involves comparing the model’s performance against known APT attack instances as well as other types of attacks to ensure its specificity and robustness.

By implementing these measures, including careful dataset selection, APT-specific feature engineering, expert labeling, and rigorous evaluation, the model is designed to be specific to APT attacks despite relying on network traffic data. These steps enable the model to capture the unique characteristics and behaviors exhibited by APT attacks, effectively distinguishing them from other types of attacks.

In summary, the combination of dataset selection, APT-specific feature engineering, expert labeling, and thorough evaluation ensures that the developed model is tailored to detect APT attacks accurately and reliably based on network traffic data.

4. Results and Discussion

The proposed enhanced algorithm, called CDT, was evaluated on the dataset and compared with existing algorithms from the literature. As shown in Table 2, the proposed model matched or slightly exceeded the existing algorithms on the malicious class (F-measure 0.979 versus at most 0.978) and achieved higher precision on the benign class. For example, its precision for malicious instances was 96%, consistent with that of existing algorithms such as PRISM, JRip, and OneR. For benign instances, its precision was 50%, compared with 0% for OneR, 10% for JRip, and 13.6% for PRISM, although its benign recall remained low (4.7%). The average precision of the proposed model was 94.3%, higher than that of the existing algorithms. The CDT algorithm was evaluated in WEKA (version 3.8.6) using the output of Algorithm 3, described in the methodology.
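The figures above can be checked with two short formulas. The F-measure is the harmonic mean of precision and recall. The averaged row of Table 2 is consistent with a class-size-weighted mean, as in WEKA's "Weighted Avg." summary. The sketch below treats that weighting as an assumption rather than a fact stated by the authors. Under that assumption, it solves for the share of benign instances implied by the published averages. The function names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_minority_share(p_minority: float, p_majority: float, p_weighted: float) -> float:
    """Solve p_weighted = w * p_minority + (1 - w) * p_majority for the minority-class weight w."""
    return (p_majority - p_weighted) / (p_majority - p_minority)


# Per-class (precision, recall) for CDT, copied from Table 2.
cdt = {"benign": (0.5, 0.047), "malicious": (0.961, 0.998)}
for cls, (p, r) in cdt.items():
    print(cls, round(f_measure(p, r), 3))

# Benign share implied by the averaged CDT precision (0.943), assuming support weighting.
print("benign share:", round(implied_minority_share(0.5, 0.961, 0.943), 3))
```

This prints F-measures of 0.086 (benign) and 0.979 (malicious). Table 2 lists 0.085 and 0.979; the small difference presumably comes from rounding of the published precision and recall. The implied benign share is about 0.039, and the recall columns give a similar value (about 0.04). This means that roughly 96% of the instances are malicious, so the averaged row is dominated by the malicious class.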

This research contributes to the body of knowledge in several ways. Before presenting the added value of this research, it is useful to recall how the currently available algorithms function. The existing algorithms reported in the literature work as classifiers: once trained, they determine whether observed activity is benign or malicious. However, they cannot provide insight into the breadth and depth of an attack. This research addresses this gap by proposing a three-stage model.
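To show what a purely classifying baseline returns, the following is a minimal sketch of the textbook OneR algorithm, one of the baselines in Table 2. It handles nominal attributes only; WEKA's OneR also discretises numeric attributes. The flow attributes and labels are hypothetical. Whatever rule it learns, its only output is a class label, with no description of the attack's breadth or depth.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Textbook OneR for nominal attributes.

    For every attribute, map each value to its majority class and count the
    training errors of that one-attribute rule; keep the attribute with the
    fewest errors (ties go to the attribute seen first).
    """
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    best = None
    for attr in rows[0]:
        counts_by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts_by_value[row[attr]][label] += 1
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in counts_by_value.items()}
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in counts_by_value.items())
        if best is None or errors < best["errors"]:
            best = {"attribute": attr, "rule": rule, "errors": errors}
    best["default"] = default
    return best


def predict(model, row):
    """Apply the learned one-attribute rule; unseen values get the majority class."""
    return model["rule"].get(row.get(model["attribute"]), model["default"])


# Hypothetical nominal flow attributes and labels, for illustration only.
rows = [
    {"proto": "tcp", "beaconing": "yes"},
    {"proto": "tcp", "beaconing": "no"},
    {"proto": "udp", "beaconing": "yes"},
    {"proto": "udp", "beaconing": "no"},
    {"proto": "tcp", "beaconing": "yes"},
]
labels = ["malicious", "benign", "malicious", "benign", "malicious"]

model = one_r(rows, labels)
print(model["attribute"], model["rule"])
print(predict(model, {"proto": "udp", "beaconing": "no"}))
```

Here the `beaconing` attribute classifies the training data without error, so OneR selects it. Every new flow then receives only a "malicious" or "benign" verdict.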

In the first stage, the algorithm is applied as traditionally implemented to determine whether or not an APT is present. This stage is referred to as the pattern evaluation phase, and its output becomes the input for the second stage. In the second stage, the proposed algorithm analyzes the nature of the detected attack in depth. This is achieved by determining remarkable points of ARP, which yield nine different patterns, each reflecting the nature of the attack. These patterns, such as peak, noise, plate, and level change, signify the breadth and depth of the attack.
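The text does not define how the nine patterns are computed. The following is a purely illustrative sketch of labelling one window of a time-series metric (for example, events per minute) with a coarse shape: "peak", "level change", "flat", or "noise". Several assumptions apply:

- All thresholds (`peak_factor`, `max_peak_width`, `shift_ratio`, `flat_tol`) are hypothetical.
- The metric is assumed to be non-negative.
- The "flat" label is not claimed to correspond to the paper's "plate" pattern.

This is not the authors' second-stage algorithm.

```python
from statistics import mean, median, pstdev
from typing import Sequence


def label_window(window: Sequence[float], peak_factor: float = 3.0,
                 max_peak_width: int = 2, shift_ratio: float = 0.5,
                 flat_tol: float = 0.05) -> str:
    """Assign one coarse, hypothetical pattern label to a window of non-negative values."""
    if len(window) < 4:
        raise ValueError("window needs at least 4 points")
    # Peak: a short burst of points far above the window's typical (median) level.
    m = median(window)
    spikes = sum(1 for v in window if v > peak_factor * m)
    if 1 <= spikes <= max_peak_width:
        return "peak"
    # Level change: the second half settles at a clearly different mean than the first half.
    half = len(window) // 2
    before, after = mean(window[:half]), mean(window[half:])
    if abs(after - before) > shift_ratio * abs(before):
        return "level change"
    # Flat: almost no variation relative to the mean level.
    if pstdev(window) <= flat_tol * abs(mean(window)):
        return "flat"
    # Anything else varies without a clear structure.
    return "noise"


for w in ([10, 10, 10, 10, 50, 10, 10, 10],
          [10, 10, 10, 10, 30, 30, 30, 30],
          [10] * 8,
          [10, 14, 8, 12, 9, 13, 7, 11]):
    print(w, "->", label_window(w))
```

The checks run in a fixed order, peak first. A single spike would otherwise also shift one half's mean and be mistaken for a level change.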

Once the second stage has produced its output in the form of patterns, the third stage is executed automatically. In this stage, a classification step correlates and compares different scenarios. These scenarios are then fed to Snort, and the rules associated with them become training data that accumulate as more attacks are received. CDT carries out the correlation of these rules with the patterns and their storage.
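The paper does not give the rule format or the storage structure. The following is a minimal sketch of how per-scenario rules could accumulate under a pattern signature, with these assumptions:

- The `ScenarioStore` class, its methods, and the message text are hypothetical.
- An exact-match lookup on the pattern tuple stands in for the CDT-based correlation.
- The emitted strings follow Snort's general rule-header/options form and use the local SID range, which starts at 1,000,000.

As written, each placeholder rule would alert on any IP traffic to the host. Real rules would need content or flow options.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Snort convention: SIDs of 1,000,000 and above are reserved for locally written rules.
LOCAL_SID_BASE = 1_000_000


@dataclass
class ScenarioStore:
    """Hypothetical store that accumulates Snort-style rules per pattern signature."""
    rules: Dict[Tuple[str, ...], List[str]] = field(default_factory=dict)
    next_sid: int = LOCAL_SID_BASE

    def record(self, patterns: Tuple[str, ...], dst_ip: str) -> str:
        """Derive a placeholder rule for an observed scenario and file it under its patterns."""
        label = "+".join(patterns)
        rule = (f"alert ip any any -> {dst_ip} any "
                f'(msg:"APT scenario {label}"; sid:{self.next_sid}; rev:1;)')
        self.rules.setdefault(patterns, []).append(rule)
        self.next_sid += 1
        return rule

    def lookup(self, patterns: Tuple[str, ...]) -> List[str]:
        """Return every rule stored so far for exactly this pattern signature."""
        return list(self.rules.get(patterns, []))


store = ScenarioStore()
store.record(("peak", "level change"), "10.0.0.5")
store.record(("peak", "level change"), "10.0.0.9")
store.record(("noise",), "10.0.0.7")
for rule in store.lookup(("peak", "level change")):
    print(rule)
```

Each new attack that produces an already-known pattern signature adds another rule under the same key. This mirrors, in simplified form, the text's description of rules accumulating as more attacks are received.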

Therefore, this research addresses a machine learning problem: the proposed algorithm continuously detects attacks, stores the observed behaviors or scenarios, and accurately classifies the nature of an APT attack.

5. Conclusions and Future Work

5.1. General Remarks

In summary, this study reviewed recent research on machine learning techniques for detecting advanced persistent threats (APTs). The review of a variety of ML algorithms showed that ML-based intrusion detection systems can still be improved, and that several limitations and challenges must be addressed to improve the performance and accuracy of these models.

The results indicated that, on average, the proposed approach outperformed the existing algorithms reported in the literature. For the malicious class, the precision of the proposed model (CDT) was 96%, consistent with that of the existing algorithms: PRISM, 96.9%; JRip, 96%; and OneR, 96%. For the benign class, however, the proposed model surpassed the existing algorithms, with a precision of 50% compared with 0% for OneR, 10% for JRip, and 13.6% for PRISM. Overall, the average scores favor the proposed model: its average precision was 94.3%, compared with 93.7%, 92.6%, and 92.1% for PRISM, JRip, and OneR, respectively. The CDT algorithm was evaluated by applying the outputs of Algorithm 3 to the standard NB tree classifier in the WEKA software.

5.2. Implications for Sustainability

This research has implications for sustainability. The study focuses on improving advanced persistent threat (APT) detection through a new algorithm, CDT. APTs are complex, persistent cyber-attacks that can inflict substantial harm across diverse sectors, including socio-economic and scientific domains. By improving APT detection, this research indirectly supports sustainability by strengthening the security and resilience of critical systems and infrastructures.

The following points indicate how our research aligns with sustainability objectives:

Enhanced cybersecurity: APTs pose significant threats to various industries, including those related to sustainable development. By developing a more robust and accurate detection algorithm (CDT), our research strengthens cybersecurity measures. This, in turn, safeguards critical infrastructures that play a crucial role in sustainable socio-economic development.
Protection of sensitive data: APT attacks often target sensitive data, including environmental research, social welfare information, and scientific data. By improving the detection of APTs, our research aids in protecting this valuable information, thereby contributing to sustainability efforts in various sectors.
Mitigation of environmental risks: Cyber-attacks can disrupt systems responsible for monitoring and managing environmental conditions. By providing a more effective APT detection mechanism, our research indirectly contributes to mitigating potential risks to the environment, ensuring the continuity of data collection and analysis for sustainable environmental practices.
Resilience of sustainable infrastructure: Sustainable development relies on resilient infrastructures that can withstand potential threats. APTs can compromise the stability and reliability of critical systems. Our research helps in identifying and preventing these cyber threats, thereby enhancing the resilience of sustainable infrastructures.
Support for sustainable policies and laws: As our research offers insights into the breadth and depth of APT attacks, it can inform policymakers and legislators in developing effective policies and laws related to cybersecurity and sustainable development. By having a more comprehensive understanding of cyber threats, decision-makers can implement measures to protect sustainable initiatives.

5.3. Future Work

For future work, researchers can build on this research in several ways. First, the three algorithms can be extended to search for more than one type of attack simultaneously, not only APTs. Second, attack identification can be improved by monitoring employee activity within the organization and correlating it with other events extracted from computers and networks. This would require a new methodology in which attacks could be anticipated before they occur.

Author Contributions

Conceptualization, A.S.A.-A. and R.A.; methodology, A.S.A.-A., R.A. and I.A.-S.; validation, A.S.A.-A.; formal analysis, A.S.A.-A.; investigation, A.S.A.-A., R.A., S.T. (Sherzod Turaev) and I.A.-S.; resources, A.S.A.-A. and I.A.-S.; writing—original draft, A.S.A.-A.; writing—review and editing, A.S.A.-A., R.A., S.T. (Sherzod Turaev), A.Z. and S.T. (Shuhaili Talib); visualization, A.S.A.-A., R.A., S.T. (Sherzod Turaev), A.Z. and S.T. (Shuhaili Talib); supervision, R.A., I.A.-S., A.Z. and S.T. (Shuhaili Talib); project administration, S.T. (Sherzod Turaev) and R.A.; funding acquisition, S.T. (Sherzod Turaev). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the United Arab Emirates University, UAEU Strategic Research Grant G00003676 (Fund No.: 12R136) through Big Data Analytics Center.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank United Arab Emirates University for funding this work under UAEU-ZU Joint Research Grant G00003715 (12T034) through Emirates Center for Mobility Research. Also, the authors would like to thank the Arab Open University, Oman.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Czum, J.M. Dive into Deep Learning. J. Am. Coll. Radiol. 2020, 17, 637–638.
2. Ahmad, W.; Rasool, A.; Javed, A.R.; Baker, T.; Jalil, Z. Cyber security in IoT-based cloud computing: A comprehensive survey. Electronics 2022, 11, 16.
3. Groenendaal, J.; Helsloot, I.; Reuter, C. Towards More Insight into Cyber Incident Response Decision Making and its Implications for Cyber Crisis Management. In Proceedings of the ISCRAM 2022 Conference Proceedings–19th International Conference on Information Systems for Crisis Response and Management, Tarbes, France, 22–25 May 2022.
4. Bajao, N.A.; Sarucam, J.-A. Threats Detection in the Internet of Things Using Convolutional neural networks, long short-term memory, and gated recurrent units. Mesopotamian J. Cybersecur. 2023, 2023, 22–29.
5. Mijwil, M.; Filali, Y.; Aljanabi, M.; Bounabi, M.; Al-Shahwani, H. The Purpose of Cybersecurity Governance in the Digital Transformation of Public Services and Protecting the Digital Environment. Mesopotamian J. Cybersecur. 2023, 2023, 1–6.
6. Al-Mohannadi, H.; Mirza, Q.; Namanya, A.; Awan, I.; Cullen, A.; Disso, J. Cyber-attack modeling analysis techniques: An overview. In Proceedings of the 2016 IEEE 4th International Conference on Future Internet of Things and Cloud (FiCloud), Vienna, Austria, 22–24 August 2016; pp. 69–76.
7. Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A Survey on Advanced Persistent Threats: Techniques, Solutions, Challenges, and Research Opportunities. IEEE Commun. Surv. Tutor. 2019, 21, 1851–1877.
8. Al-Matarneh, E.M. Advanced Persistent Threats and Its Role in Network Security Vulnerabilities. Int. J. Adv. Res. Comput. Sci. 2020, 11, 11–20.
9. Tsochev, G.; Trifonov, R.; Nakov, O.; Manolov, S.; Pavlova, G. Cyber Security: Threats and Challenges. In Proceedings of the 2020 International Conference Automatics and Informatics (ICAI), Varna, Bulgaria, 1–3 October 2020. Available online: https://ieeexplore.ieee.org/abstract/document/9311369/ (accessed on 28 May 2023).
10. Sharma, A.; Gupta, B.B.; Singh, A.K.; Saraswat, V. Orchestration of APT malware evasive manoeuvers employed for eluding anti-virus and sandbox defense. Comput. Secur. 2022, 115, 102627.
11. Hakonen, P. Detecting Insider Threats Using User and Entity Behavior Analytics. 2022. Available online: https://www.theseus.fi/handle/10024/786079 (accessed on 28 May 2023).
12. Ashrafuzzaman, M.; Chakhchoukh, Y.; Jillepalli, A.A.; Tosic, P.T.; de Leon, D.C.; Sheldon, F.T.; Johnson, B.K. Detecting Stealthy False Data Injection Attacks in Power Grids Using Deep Learning. In Proceedings of the 2018 14th International Wireless Communications and Mobile Computing Conference (IWCMC 2018), Limassol, Cyprus, 25–29 June 2018; pp. 219–225.
13. Ameen, N.; Tarhini, A.; Shah, M.H.; Madichie, N.; Paul, J.; Choudrie, J. Keeping customers’ data secure: A cross-cultural study of cybersecurity compliance among the Gen-Mobile workforce. Comput. Hum. Behav. 2021, 114, 106531.
14. Chamola, V.; Kotesh, P.; Agarwal, A.; Naren; Gupta, N.; Guizani, M. A Comprehensive Review of Unmanned Aerial Vehicle Attacks and Neutralization Techniques. Ad Hoc Netw. 2021, 111, 102324.
15. Scherr, C.L.; Aufox, S.; Ross, A.A.; Ramesh, S.; Wicklund, C.A.; Smith, M. What people want to know about their genes: A critical review of the literature on large-scale genome sequencing studies. Healthcare 2018, 6, 96.
16. Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 2011, 39, 1878–1915.
17. Brogi, G. Real-Time Detection of Advanced Persistent Threats Using Information Flow Tracking and Hidden Markov Models. HAL Id: tel-01793752, 2018. Available online: https://theses.hal.science/tel-01793752/ (accessed on 28 May 2023).
18. Zhao, M.J.; Driscoll, A.R.; Sengupta, S.; Fricker, R.D.; Spitzner, D.J.; Woodall, W.H. Performance evaluation of social network anomaly detection using a moving window-based scan method. Qual. Reliab. Eng. Int. 2018, 34, 1699–1716.
19. Gu, J.; Kong, R.; Sun, H.; Zhuang, H.; Pan, F.; Lin, Z. A novel detection technique based on benign samples and one-class algorithm for malicious PDF documents containing JavaScript. In Proceedings of the International Conference on Computer Application and Information Security (ICCAIS 2021), Riyadh, Saudi Arabia, 18–20 March 2021; p. 62.
20. Horng, S.-J.; Su, M.-Y.; Chen, Y.-H.; Kao, T.-W.; Chen, R.-J.; Lai, J.-L.; Perkasa, C.D. A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert Syst. Appl. 2011, 38, 306–313.
21. Salama, M.A.; Eid, H.F.; Ramadan, R.A.; Darwish, A.; Hassanien, A.E. Hybrid Intelligent Intrusion Detection Scheme. In Soft Computing in Industrial Applications; Springer: Berlin/Heidelberg, Germany, 2011; pp. 293–303.
22. Hasan, A.M.; Nasser, M.; Ahmad, S.; Molla, K.I. Feature Selection for Intrusion Detection Using Random Forest. J. Inf. Secur. 2016, 7, 129–140.
23. Gupta, M.; Shrivastava, S.K. Intrusion Detection System based on SVM and Bee Colony. Int. J. Comput. Appl. 2015, 111, 27–32.
24. Al-Yaseen, W.L.; Othman, Z.A.; Nazri, M.Z.A. Real-time multi-agent system for an adaptive intrusion detection system. Pattern Recognit. Lett. 2017, 85, 56–64.
25. Kaveh, A.; Dadras, A. Structural damage identification using an enhanced thermal exchange optimization algorithm. Eng. Optim. 2018, 50, 430–451.
26. Joshi, J.; Rinal, D.; Patel, J. Diagnosis and Prognosis Breast Cancer Using. Int. J. Eng. Res. Gen. Sci. 2014, 2, 315–323. Available online: www.ijergs.org (accessed on 28 May 2023).
27. Yilmaz, A.A. Intrusion Detection in Computer Networks using Optimized Machine Learning Algorithms. In Proceedings of the 2022 3rd International Informatics and Software Engineering Conference (IISEC), Ankara, Turkey, 15–16 December 2022.
28. Rakha, T.; Gorodetsky, A. Review of Unmanned Aerial System (UAS) applications in the built environment: Towards automated building inspection procedures using drones. Autom. Constr. 2018, 93, 252–264.
29. Aziz, A.S.A.; Hassanien, A.E.; Hanaf, S.E.-O.; Tolba, M. Multi-layer hybrid machine learning techniques for anomalies detection and classification approach. In Proceedings of the 2013 13th International Conference on Hybrid Intelligent Systems (HIS 2013), Gammarth, Tunisia, 4–6 December 2013; pp. 215–220.
30. Ingre, B.; Yadav, A. Performance analysis of NSL-KDD dataset using ANN. In Proceedings of the 2015 International Conference on Signal Processing and Communication Engineering Systems, Guntur, India, 2–3 January 2015; pp. 92–96.
31. Jajoo, A. A Study on the Morris Worm. arXiv 2021, arXiv:2112.07647. Available online: https://arxiv.org/abs/2112.07647 (accessed on 28 May 2023).
32. Marchetti, M.; Pierazzi, F.; Guido, A.; Colajanni, M. Countering Advanced Persistent Threats through security intelligence and big data analytics. In Proceedings of the International Conference on Cyber Conflict, CYCON, Washington, DC, USA, 21–23 October 2016; Volume 2016, pp. 243–261.
33. Of, I.J. Research in Computer Applications and Robotics. Crit. Rev. Cryptogr. 2014, 2, 113–118.
34. Trifonov, R.; Manolov, S.; Yoshinov, R.; Tsochev, G.; Pavlova, G. Artificial Intelligence Methods for Cyber Threats Intelligence. Int. J. Comput. 2017, 2, 129–135. Available online: https://www.iaras.org/iaras/home/cijc/artificial-intelligence-methods-for-cyber-threats-intelligence (accessed on 28 May 2023).
35. Li, X.; Jiang, H. Artificial intelligence technology & engineering applications. Appl. Comput. Electromagn. Soc. J. 2017, 32, 381–388.
36. Poola, I. The Best of the Machine Learning Algorithms Used in Artificial Intelligence. Int. J. Adv. Res. Comput. Commun. Eng. 2017, 6, 187–194.
37. Adams, C.; Tambay, A.A.; Bissessar, D.; Brien, R.; Fan, J.; Hezaveh, M.; Zahed, J. Using Machine Learning to Detect APTs on a User Workstation. Int. J. Sens. Netw. Data Commun. 2019, 8, 3.
38. Ghafir, I.; Hammoudeh, M.; Prenosil, V.; Han, L.; Hegarty, R.; Rabie, K.; Aparicio-Navarro, F.J. Detection of advanced persistent threat using machine-learning correlation analysis. Future Gener. Comput. Syst. 2018, 89, 349–359.
39. Abdullah, T.A.; Ali, W.; Abdulghafor, R. Empirical study on intelligent android malware detection based on supervised machine learning. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 215.
40. Berrada, G.; Cheney, J.; Benabderrahmane, S.; Maxwell, W.; Mookherjee, H.; Theriault, A.; Wright, R. A baseline for unsupervised advanced persistent threat detection in system-level provenance. Future Gener. Comput. Syst. 2020, 108, 401–413.

Figure 1. Supervised ML techniques through MS Azure Cloud.

Figure 2. Data integration example (The Software: WEKA Ver. 3.8.6).

Figure 3. Dataset process using the ML-SVM algorithm.

Figure 4. Example of used datasets in MS Azure (Online Access).

Figure 5. Example of wireless capture data.

Figure 6. Innovative methodology of a live-case attack.

Figure 7. APT detection methodology phases.

Table 1. Technologies for APT detection deficiency.

Ref.	AI Impact	Experiment Analysis (Theories) or Operational Fields	Methodological Issues
[32]	Multi-Factor Approaches Where Big Data Analytics Methods are Applied	Regular access patterns; DGA analysis; blacklist filtering	Data at rest analysis
[33]	Applications of Visual Nets Can Keep on in Cyber Security	Visual nets; expert systems; intelligent agents; searching; learning; constraint finding	Data flow sampling
[34]	Operational Intelligence Studies	Doctrine of active defense; derive features suitable for behavioral interpretation and validation; build and optimize an ensemble of classifiers based on trained models	Network flow behavior vs. anomaly detection
[35]	Technology Framework Upon Four Layers	AI models and algorithms; AI techniques	Data flow sampling; data flow capturing
[36]	AI and ML for Social Structures	Machine learning and cognitive systems; supervised learning algorithms; artificial neural networks; algorithms and complex optimizations	Autonomous behavior detection; autonomous anomaly detection
[37]	ML on User Workstation	Learn a user’s behavior; detect APT activity as an anomaly in that behavior; red team automation (RTA) scripts to simulate an APT attack	Data flow sampling; net behavior and anomaly detection
[38]	Novel Machine-Learning-Based System Entitled MLAPT	Threat detection; alert correlation; attack prediction with a prediction accuracy of 84.8%	Penetration alerts; data at rest sampling
[39]	Machine Learning Techniques as an Alternative Method	Static, dynamic, and hybrid Android malware detection approaches; classifiers: k-nearest neighbors (K-NN), decision tree (DT), support vector machine (SVM), random forest (RF), naïve Bayes (NB), and logistic regression (LR)	Performance comparison of supervised ML algorithms to detect Android malware
[40]	Unsupervised Streaming Anomaly Detection Algorithm Based on Process Behavior	Mining provenance data to analyze and identify causal relationships among system activities; streaming anomaly detection technique based on a simple algorithm called attribute value frequency (AVF)	Deployment techniques

Table 2. Comparison between existing algorithms and the proposed optimized algorithm.

	PRISM			JRip			OneR			CDT
Class	Precision	Recall	F-Measure	Precision	Recall	F-Measure	Precision	Recall	F-Measure	Precision	Recall	F-Measure
Benign	0.136	0.268	0.18	0.1	0.023	0.038	0	0	0	0.5	0.047	0.085
Malicious	0.969	0.931	0.95	0.96	0.991	0.975	0.96	0.998	0.978	0.961	0.998	0.979
Weighted avg.	0.937	0.905	0.92	0.926	0.952	0.938	0.921	0.958	0.939	0.943	0.96	0.943

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
