DeepOP: A Hybrid Framework for MITRE ATT&CK Sequence Prediction via Deep Learning and Ontology

Zhang, Shuqin; Xue, Xiaohang; Su, Xinyu

doi:10.3390/electronics14020257

Open Access Article


by Shuqin Zhang ¹,†, Xiaohang Xue ¹,*,† and Xinyu Su ²

¹ School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China
² School of Cyberspace Security, Information Engineering University, Zhengzhou 450007, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Electronics 2025, 14(2), 257; https://doi.org/10.3390/electronics14020257

Submission received: 1 December 2024 / Revised: 27 December 2024 / Accepted: 8 January 2025 / Published: 9 January 2025

(This article belongs to the Special Issue AI-Based Solutions for Cybersecurity)


Abstract

As the Industrial Internet of Things (IIoT) increasingly integrates with traditional networks, advanced persistent threats (APTs) pose significant risks to critical infrastructure. Traditional Intrusion Detection Systems (IDSs) and Anomaly Detection Systems (ADSs) are often inadequate in countering sophisticated multi-step APT attacks. This highlights the necessity of studying attacker strategies and developing predictive models to mitigate potential threats. To address these challenges, we propose DeepOP, a hybrid framework for attack sequence prediction that combines deep learning and ontological reasoning. DeepOP leverages the MITRE ATT&CK framework to standardize attacker behavior and predict future attacks with fine-grained precision. Our framework’s core is a novel causal window self-attention mechanism embedded within a transformer-based architecture. This mechanism effectively captures local causal relationships and global dependencies within attack sequences, enabling accurate multi-step attack predictions. In addition, we construct a comprehensive dataset by extracting causally connected attack events from cyber threat intelligence (CTI) reports using ontological reasoning, mapping them to the ATT&CK framework. This approach addresses the challenge of insufficient data for fine-grained attack prediction and enhances the model’s ability to generalize across diverse scenarios. Experimental results demonstrate that the proposed model effectively predicts attacker behavior, achieving competitive performance in multi-step attack prediction tasks. Furthermore, DeepOP bridges the gap between theoretical modeling and practical security applications, providing a robust solution for countering complex APT threats.

Keywords:

attack prediction; ATT&CK framework; ontology; transformer

1. Introduction

In recent years, advanced persistent threat (APT) attacks targeting the Industrial Internet of Things (IIoT) have increased significantly [1]. Driven by Industry 4.0, the growing interconnection between the IIoT and traditional networks has made critical infrastructure more vulnerable to attacks; the attack on the Ukrainian power grid [2], for example, highlights the real risks posed by APTs. APTs achieve the attacker’s ultimate goals through multi-stage intrusions. Unlike traditional cyberattacks, APT attacks can effectively bypass intrusion detection systems. Traditional threat defense systems deployed in the IIoT, such as Intrusion Detection Systems (IDSs) and Anomaly Detection Systems (ADSs), suffer from high false positive rates and response delays, which makes them insufficient to prevent damage from advanced attacks [3,4,5]. Given the current asymmetry between attack and defense, we must move beyond the traditional paradigm of real-time detection and defense. Only by predicting attackers’ behaviors in advance can defenders effectively prevent them from achieving their ultimate goals.

Multiple studies have explored attack prediction. For example, the study in [6] introduced an uncertainty-aware attack graph that incorporates the dynamic characteristics of exploitation probability to build a predictive model for assessing future risks. Ref. [7] used maritime supply chain infrastructure data to identify potential attack paths in the network topology and applied multi-stage collaborative filtering, a recommendation-system technique, to predict future attack paths. As the volume of data on cybersecurity incidents grows, attack prediction research has increasingly adopted data-driven approaches [8]. For instance, [9] used DDoS traffic data to predict the timing of future attacks, while [10] defined multiple stages of APT attacks and used a Hidden Markov Model (HMM) to compute the optimal state sequence of an ongoing multi-step attack, reflecting the attacker’s strategies and goals. Studies such as [11] mapped system logs to the tactical phases of the ATT&CK framework to predict malicious behavior paths, while [12] constructed a dataset from real-world APT cases to predict specific attack techniques.

However, these studies on attack prediction still have several limitations. Methods based on attack graphs [6,7] rely on modeling specific scenarios and become ineffective when the attack scenario changes. The study in [9] predicts the potential timing of future attacks from the occurrence times of DDoS attacks; however, real-world attack behavior often lacks a fixed temporal pattern, which makes this approach difficult to adapt to complex attack scenarios. Although [10] uses a Hidden Markov Model to predict multi-step attack state sequences, its prediction process is not grounded in a universal, standardized framework for modeling attacker behavior, which limits further use of the prediction results.

In addition, the method of [11], which generates temporal attack knowledge graphs (TAKGs) from logs, maps malicious system logs to the tactical phases of the ATT&CK framework and predicts attack paths by analyzing the tactical route. However, this method models only tactic-level paths and does not capture causal relationships at the level of specific attack techniques. Furthermore, it relies on malicious log data generated in specific scenarios, which limits its real-world applicability. Another approach [12] extracts ATT&CK attack techniques from real APT cases to construct a dataset, providing a foundation for predicting specific attack techniques. However, when constructing the dataset, this method simply orders each attack sequence by the tactical phases to which its techniques belong, ignoring the causal relationships among different techniques. It may also incorrectly merge multiple parallel attack paths from a threat report into a single sequence, leading to inaccurate prediction results.

To overcome the aforementioned issues, we propose DeepOP: a hybrid framework for MITRE ATT&CK sequence prediction via deep learning and ontology. This study integrates the MITRE ATT&CK framework [13] and utilizes deep learning and ontology reasoning to model and predict APT attack behaviors in a generalized and standardized manner. Unlike previous methods, DeepOP can refine causal relationship modeling at the level of specific attack techniques and effectively address multi-step dependencies in complex attack sequences through multi-scale dependency analysis.

First, this study maps the cyber threat intelligence (CTI) from real APT attack cases to the Tactics, Techniques, and Procedures (TTPs) defined by the ATT&CK framework and uses ontology reasoning to extract causal relationships between attack techniques, as well as possible parallel attack sequences, to construct a dataset for predicting specific attack techniques. This dataset removes the dependence on specific network environments found in existing methods and accurately represents causal relationships and parallel paths in real APT cases. Second, this study proposes an attack prediction model based on causal window attention (CWA). The model captures both local causal dependencies and global pattern features in APT attacks, and it models attacker behavior patterns comprehensively by integrating dependency features extracted with different window sizes.

The contributions of this paper are as follows:

(1): We developed the Cybersecurity Ontology for Attack Sequence Extraction (COASE) and an ontology-based attack sequence knowledge reasoning method, enabling the automated extraction of attack sequences with causal relationships, thereby providing a theoretical foundation for attack behavior modeling.
(2): By extracting data from real APT attack cases and integrating ontology reasoning techniques, we generated a dataset that reflects the causal relationships and parallel attack paths among different attack techniques in cyber threat intelligence, providing standardized data support for fine-grained attack prediction research.
(3): We designed an attack prediction model incorporating causal window attention (CWA), capable of capturing local causal relationships and global features in APT attack sequences, offering an efficient and accurate solution for complex multi-step attack prediction tasks.
(4): We conducted extensive experiments on the constructed dataset, and the results demonstrate that the proposed model significantly outperforms existing methods in multi-step attack prediction tasks, showcasing substantial advantages in prediction accuracy and robustness.

2. Related Work and Background

2.1. Modeling APT Attacks

In recent years, numerous methods for modeling cyberattacks have emerged [14]. Early on, the Cyber Kill Chain model proposed by Lockheed Martin [15] gained widespread application in both academia and industry. This model decomposes APT attacks into seven stages: Reconnaissance, Weaponization, Delivery, Exploitation, Installation, Command and Control, and Actions on Objectives. By modeling the attack lifecycle, the Kill Chain model analyzes how attackers adjust subsequent attacks based on prior results and suggests that defenders can disrupt an attack by breaking any link in the chain.

The MITRE ATT&CK framework adopts the TTP (Tactics, Techniques, and Procedures) methodology defined by the U.S. NIST to establish a universal cybersecurity language. Compared with the Cyber Kill Chain model, ATT&CK expands attack stages into multidimensional “tactics” and refines them into specific “techniques”. Each attack technique is accompanied by a unique identifier, detailed description, and tool support, clearly expressing the attacker’s behavior. The ATT&CK framework addresses the lack of detailed descriptions in traditional methods by mapping attack events to specific tactics and techniques, aiding defenders in designing targeted defense strategies to disrupt attack chains or increase attack difficulty.

Figure 1 illustrates how the MITRE ATT&CK framework models an attack by an Iranian APT group. First, the attacker established an initial foothold on a U.S. government VMware Horizon server using the “Exploit Public-Facing Application” (T1190) technique under the Initial Access (TA0001) tactic. Next, the attacker used “Command and Scripting Interpreter” (T1059) under the Execution (TA0002) tactic, running PowerShell commands to add exclusion rules to Windows Defender and gain command execution. To achieve Persistence (TA0003), the attacker used the “Scheduled Task/Job” (T1053) technique to create a scheduled task, RuntimeBrokerService.exe, which executed RuntimeBroker.exe daily with system privileges, maintaining long-term access to the target system.

To evade detection, the attacker employed the “Impair Defenses” (T1562) technique under the Defense Evasion (TA0005) tactic, adding exclusion rules to Windows Defender so that the defense system would ignore tool downloads in a specific folder. The attacker then used the “Credentials from Password Stores” (T1555) technique under the Credential Access (TA0006) tactic, extracting OS credentials with a password cracking tool to further escalate privileges. During the Discovery (TA0007) phase, the attacker used the “System Network Configuration Discovery” (T1016) technique to gather internal network information in preparation for Lateral Movement (TA0008). Subsequently, the attacker moved laterally via RDP using the “Remote Services” (T1021) technique, gaining further privileges within the internal network, and ultimately controlled the target network through the “Proxy” (T1090) technique under the Command and Control (TA0011) tactic.

The ATT&CK framework provides a visual representation of attacker actions, allowing standardized predictions to assist defenders in making more informed defense decisions.
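The Figure 1 walkthrough is, in effect, an ordered list of tactic and technique steps. As a minimal sketch (the tuple layout and the prefix-to-next-step framing are illustrative assumptions, not the paper's stated data format), the campaign could be encoded and turned into next-technique prediction examples as follows:

```python
from typing import List, Tuple

# Figure 1 campaign as (tactic ID, technique ID, technique name), in the order described in the text.
CAMPAIGN: List[Tuple[str, str, str]] = [
    ("TA0001", "T1190", "Exploit Public-Facing Application"),
    ("TA0002", "T1059", "Command and Scripting Interpreter"),
    ("TA0003", "T1053", "Scheduled Task/Job"),
    ("TA0005", "T1562", "Impair Defenses"),
    ("TA0006", "T1555", "Credentials from Password Stores"),
    ("TA0007", "T1016", "System Network Configuration Discovery"),
    ("TA0008", "T1021", "Remote Services"),
    ("TA0011", "T1090", "Proxy"),
]


def technique_sequence(steps: List[Tuple[str, str, str]]) -> List[str]:
    """Keep only the technique IDs, preserving step order."""
    return [technique for _, technique, _ in steps]


def next_step_examples(sequence: List[str], min_prefix: int = 1) -> List[Tuple[List[str], str]]:
    """Hypothetical framing: pair every prefix (length >= min_prefix) with the technique that follows it."""
    return [(sequence[:k], sequence[k]) for k in range(min_prefix, len(sequence))]


if __name__ == "__main__":
    seq = technique_sequence(CAMPAIGN)
    for prefix, target in next_step_examples(seq, min_prefix=3):
        print(" -> ".join(prefix), "=>", target)
```

Framed this way, a multi-step predictor sees the observed prefix of a campaign and is asked for the technique that comes next.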

2.2. Attack Prediction

Attack prediction has always been an important topic in cybersecurity research, aiming to forecast attackers’ behaviors before they achieve their objectives.

Early studies [6,7] introduced attack graph-based methods to evaluate attack paths and predict future risks. For example, the uncertainty-aware attack graph [6] integrated IDS alerts and intrusion response to dynamically update attack probabilities, thereby improving prediction accuracy. However, these methods often rely on specific scenarios and are difficult to adapt to evolving attack patterns. Study [9] designed an SVM-based DDoS attack prediction system to predict future attacks; study [10] utilized Hidden Markov Models (HMMs) for multi-step attack prediction. Although these methods show some potential in capturing continuous attack behaviors, their applicability is limited due to assumptions based on time or probability.

Some studies train attack prediction models using raw alert data or traffic data. For example, study [16] proposed the concept of the Network Entity Reputation Database System (NERDS), which evaluates the likelihood of future attacks for each IP address based on real alert datasets and creates predictive blacklists to issue warnings before attacks occur. However, this method can only predict the IP addresses of malicious entities and cannot predict other alert fields. Study [17] proposed a GRU-based deep learning approach that extends the prediction scope by predicting the multi-class attributes (e.g., protocol type, attack time, and attack category) of subsequent alerts from previous alert sequences. Study [18] developed a similarity-based aggregation algorithm to correlate and aggregate alerts, addressing the model training problems caused by excessive duplicate alerts in earlier studies, and trained a Transformer-based model that handles variable-length sequences for attack prediction. Study [18] also proposed a threat estimation method to assess the level of threat a system might face in the future.

In more complex scenarios, study [19] proposed the GNN-AP framework to predict potential attack targets. The framework uses an encoder–decoder model to construct an attack scenario graph from alarm log data, extracts attack sequences, and generates attack target graphs by incorporating Communication-Based Train Control (CBTC) device topology information. It then uses Graph Neural Networks to recast attack prediction as a link prediction problem, thereby identifying attackers’ intentions. However, this method relies on scenario-specific network topology models and lacks cross-scenario generalizability. Study [20] proposed the DeepAG framework, which first uses Transformer models to capture semantic information in system logs and detect potential APT attack sequences, and then combines LSTM networks and OOV text processors to predict attack paths, achieving high performance. However, this study overlooked the possibility that benign activities may be flagged as anomalies in system logs, which affects the reliability of the prediction results.

Although the aforementioned studies predicted attack behaviors from various perspectives, their results often lack a universal standardized framework, with attack representations being overly abstract. For example, some studies focus on the timing of attacks [9], while others concentrate on predicting malicious entities [16]. Study [11] proposed the CL-AP2 framework, which generates Temporal Attack Knowledge Graphs (TAKGs) and combines Transformer and reinforcement learning methods to predict the next attack tactic path, providing a more comprehensive depiction of attack paths from malicious logs. However, this method still requires substantial manual effort for specific attack technique predictions. Study [21] collected datasets from the MITRE ATT&CK framework regarding APT attack organizations and software, inferred technical correlations between attack techniques using hierarchical clustering methods, and utilized these correlations to predict subsequent attacker behaviors. Additionally, study [12] constructed a Bayesian network attack prediction model by combining structural and parameter learning, extracting attack sequence datasets from historical APT attack cases for cross-scenario attack technique predictions. Although these studies laid a data foundation for specific attack technique predictions, their dataset construction did not consider causal relationships between attack techniques or potential parallel attack paths, which may lead to erroneous attack descriptions.

To address these challenges, this paper integrates the MITRE ATT&CK framework with deep learning techniques to propose a standardized and generalized attack prediction method that effectively models causal relationships and parallel paths between attack techniques, providing strong support for more accurate multi-step attack prediction.

2.3. Threat Intelligence Extraction and Ontology Reasoning

The importance of cyber threat intelligence (CTI) is increasingly recognized, as it aims to help organizations effectively respond to cyber threats by analyzing attackers’ behavior patterns and objectives. However, since CTI data are often presented in unstructured forms (e.g., threat reports), manually parsing these reports is time-consuming and prone to errors. Therefore, researchers have developed various methods to extract valuable information from CTI and structure it for analysis.

Natural language processing (NLP) has been widely applied in threat intelligence extraction. SECCMiner [22], proposed by Niakanlahiji et al., combines tokenization and part-of-speech tagging to extract key phrases describing APT techniques from threat reports. Tools such as EXTRACTOR [23] use text summarization and semantic role labeling to extract attack behavior graphs from CTI reports and generate provenance graphs for threat hunting.

Ontologies, as a formal knowledge representation method, play a crucial role in threat intelligence extraction. For example, Husari et al. proposed TTPDrill [24], which uses a MITRE ATT&CK-based threat action ontology to extract threat behaviors from CTI reports and map them to specific techniques. However, this method’s reliance on term similarity may lead to ambiguities, such as “encoded files” and “obfuscated files” being mapped to multiple techniques. AttacKG [25], developed by Li et al., uses a graph alignment algorithm to match attack graphs in CTI reports with MITRE templates, thereby generating Technique Knowledge Graphs (TKGs). By aggregating multi-source threat intelligence, these methods significantly improve the accuracy and coverage of threat behavior analysis.

Existing research on threat intelligence extraction and ontology reasoning provides significant inspiration for this study. However, most methods still exhibit deficiencies in addressing causal relationships between techniques. This paper constructs a new ontology for attack sequence extraction and combines it with reasoning rules to extract different attack sequences from cyber threat intelligence.

3. Materials and Methods

In this paper, we aim to extract real attack sequences with causal relationships from CTI reports to construct a dataset and train an attack prediction model to achieve predictions at the level of specific attack techniques. As shown in Figure 2, our work involves the following processes:

(1): Data Collection: the initial stage involves using web crawlers to collect CTI from various sources as the data foundation.
(2): Knowledge Extraction: CTI is processed to identify specific cybersecurity attack techniques and related sentences, and extract key “query nodes” of the ontology.
(3): Causal Reasoning: ontology reasoning is used to connect different causally related attack techniques to form complete attack event representations, which are stored in a graph database.
(4): Attack Prediction: in the final stage, we extract all attack sequences to construct the dataset, which provides the necessary data foundation for predicting specific attack techniques. We designed an attack prediction module within the DeepOP framework, which leverages a novel window-based causal attention mechanism to extract multi-scale information from attack sequences, enhancing performance in multi-step attack prediction.

Further details will be provided in the following sections.

3.1. Data Collection

In our preliminary research, we found no dataset that maps real-world APT attacks to attack sequences in ATT&CK tactics and techniques. Although there are some public datasets [26,27] generated through simulated APT attacks, these datasets are created in virtual environments set up by researchers, reflecting only a limited number of APT attack scenarios and relying on specific network environments. Refs. [12,21] built ATT&CK TTP sequence datasets from different sources; however, these studies simply ordered TTPs by ATT&CK tactical stages to form attack sequences, without considering the causality between attack techniques or potential parallel attack paths, which might lead to misrepresentation of real APT attacks. Our research goal is to predict attackers’ actions at the technical level. Therefore, we collected cyber threat intelligence from multiple sources and processed it in conjunction with publicly available resources from the MITRE organization to construct a dataset that truly reflects APT attack sequences. To achieve this, we developed web crawlers tailored to different websites and forums to meet the varying requirements for resource scraping.

We first analyzed resources reviewed by security experts, including 1500 threat reports annotated with ATT&CK attack techniques by threat intelligence platforms such as OSINT and AlienVault, as well as public resources from the MITRE ATT&CK knowledge base. These reviewed contents usually have higher data quality and reliability compared to other sources. They provide a mapping relationship between attack descriptive statements and the ATT&CK information they contain, as shown in Figure 3, from which we extracted 17,302 samples of attack descriptions corresponding to attack techniques, covering almost all MITRE ATT&CK attack techniques with a coverage rate of 93.2%. These data provide the foundational basis for retraining the TTP extraction model.

Additionally, we collected approximately 2200 original threat reports, dated from 2006 to 2024, from the APT CyberCriminal Campaign Collections [28] repository and various blogs. Including these reports was intended to capture more unreviewed attack scenarios and thereby enhance the diversity and applicability of the dataset. We removed duplicate references within the threat report content. Moreover, the original threat reports often contained a large amount of information unrelated to TTP extraction (e.g., advertisements, images, and HTML tags), so we preprocessed them to remove this irrelevant content and improve data quality.
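The cleaning procedure is not specified in detail. A minimal sketch using only Python's standard library, assuming each report is stored as raw HTML: it drops the contents of `<script>` and `<style>` elements, strips the remaining tags, turns block-level tags into paragraph breaks, and removes repeated paragraphs case-insensitively. The tag sets are illustrative choices, and filtering advertisements would need extra site-specific rules that this sketch does not attempt.

```python
import re
from html.parser import HTMLParser
from typing import List

# Tags whose boundaries become paragraph breaks (an illustrative, non-exhaustive set).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose text content is never visible report text.
SKIP_TAGS = {"script", "style"}


class ReportTextExtractor(HTMLParser):
    """Collects visible text from an HTML report, ignoring script and style bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_report(html: str) -> List[str]:
    """Return the report's paragraphs with whitespace normalized and duplicates removed."""
    parser = ReportTextExtractor()
    parser.feed(html)
    parser.close()
    seen = set()
    paragraphs = []
    for line in "".join(parser.chunks).splitlines():
        paragraph = re.sub(r"\s+", " ", line).strip()
        key = paragraph.lower()
        if paragraph and key not in seen:
            seen.add(key)
            paragraphs.append(paragraph)
    return paragraphs
```

Inline tags such as `<b>` are deliberately not treated as breaks, so a sentence split across formatting tags stays in one paragraph.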

3.2. Information Extraction from CTI

After preliminary cleaning, the original threat reports still contain many descriptions unrelated to attack techniques, which makes it difficult to extract attack sequences from them accurately. Some open-source tools (e.g., TRAM [29] and rcATT [30]) can extract ATT&CK TTPs from text. However, these tools have practical limitations: TRAM supports the extraction of only about fifty ATT&CK attack techniques, giving it limited coverage, and rcATT performs poorly on content related to the attacker’s goal phases (e.g., Exfiltration, TA0010, and Impact, TA0040). Furthermore, the TTP extraction models of these tools were primarily trained on older versions of MITRE ATT&CK and may therefore miss critical techniques that appear in recent threat reports.

To address these issues, we collected attack-description-to-technique mapping data from expert-reviewed sources and retrained the rcATT model. Similar to the objectives of [31], our approach integrates the latest version of the MITRE ATT&CK framework and uses a more comprehensive dataset to broaden the model’s coverage, enabling it to keep pace with the framework’s continuous updates and to recognize a wider range of techniques. The distribution of the dataset, which covers all attack stages, is presented in Table 1.

After retraining, the rcATT model can identify over 200 attack techniques, a significant increase from the original 100 or so, including previously unsupported tactics such as TA0010 and TA0040. Using this improved rcATT model, we extracted TTPs and their corresponding attack description statements more efficiently from the original threat reports. Once the correspondence between these statements and the relevant techniques was confirmed, we further extracted valid entities and relationships from the threat reports and connected them to the COASE ontology. This step provides critical data support for the task of constructing causal attack sequences.

Notably, attack description statements often contain numerous IoCs (Indicators of Compromise) related to the cybersecurity event at hand, such as registry keys, IP addresses, and email addresses. We designed regular expression patterns, shown in Table 2, to extract these IoC entity types efficiently. In addition, we compared existing natural language processing techniques for entity–relationship extraction [23,25] and ultimately chose EXTRACTOR [23], a CTI entity–relationship extraction tool based on Semantic Role Labeling (SRL), to extract entities and relationships from threat reports. EXTRACTOR demonstrated high precision and strong relevance when parsing cybersecurity-specific entities and relationships, better meeting the study’s requirements for data accuracy and contextual consistency.
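The patterns in Table 2 are not reproduced here. The following is a hypothetical set in the same spirit (IPv4 addresses, email addresses, Windows registry keys, and MD5/SHA-256 hashes), written with Python's `re` module; it is a sketch, not the study's pattern set. Real reports would also need handling for defanged indicators such as `203.0.113[.]7`, which these patterns do not match.

```python
import re
from typing import Dict, List

# One decimal octet in the range 0-255, without leading zeros.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Hypothetical IoC patterns for illustration; these are not the patterns listed in Table 2.
IOC_PATTERNS = {
    "ipv4": re.compile(rf"\b{_OCTET}(?:\.{_OCTET}){{3}}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # A hive name followed by a backslash-separated path, ending at whitespace, comma, or semicolon.
    "registry_key": re.compile(
        r"\b(?:HKEY_LOCAL_MACHINE|HKEY_CURRENT_USER|HKEY_CLASSES_ROOT|HKEY_USERS|HKLM|HKCU|HKCR|HKU)\\[^\s,;]+"
    ),
    # Word boundaries keep the 32-character MD5 pattern from matching inside a longer hash.
    "md5": re.compile(r"\b[A-Fa-f0-9]{32}\b"),
    "sha256": re.compile(r"\b[A-Fa-f0-9]{64}\b"),
}


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Return every match of each IoC pattern, keyed by IoC type, in order of appearance."""
    return {ioc_type: pattern.findall(text) for ioc_type, pattern in IOC_PATTERNS.items()}
```

The extracted values can then be attached to the technique whose description statement they came from, which is the link that rules such as R2 and R4 rely on.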

3.3. Ontology Inference

COASE is an ontology developed in this study to extract meaningful sequences of attack technique usage from threat reports; it represents the various entities and relationships in threat reports using formally described concepts [32]. Although [24,33] attempted to create cybersecurity ontologies linked to ATT&CK knowledge, they failed to capture the essential relationships between TTPs and the interactions among different threat entities. COASE comprises seven top-level classes: tactic, technique, IoC, attacker, asset, artifact, and tool. Figure 4 illustrates the relationships between the top-level classes of the ontology.

Ontology reasoning allows us to infer latent relationships from known cybersecurity information, helping to uncover causal relationships between attack techniques in threat reports and to extract attack sequences that represent real attacks. We designed reasoning rules in the Semantic Web Rule Language (SWRL) [34] that describe the direct or indirect relationships between attack techniques in threat reports, so that attack sequences with causal relationships can be extracted and parallel attack paths can be distinguished.

The reasoning rules and their explanations are as follows:

\begin{matrix} R_{1}: & \mathit{Technique}(tech_{1}) \land \mathit{affects}(tech_{1}, asset) \land \mathit{contains}(asset, artifact) \\ & \land\ \mathit{involves}(tech_{2}, artifact) \rightarrow \mathit{cause}(tech_{1}, tech_{2}) \end{matrix}

If attack technique A affects an asset, and another attack technique B involves an artifact contained within that asset, then technique A leads to technique B. For example, in one security incident, the attacker first used T1566 (Phishing) to compromise an endpoint. The compromise resulted in the installation of malware (the artifact), which was subsequently used with T1021 (Remote Services) to move laterally to other parts of the network. The malware, installed as a result of the phishing attack, enabled the lateral movement.
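The SWRL rules are meant to be evaluated by a SWRL-capable reasoner over the COASE ontology. To make the semantics of R1 concrete, the sketch below evaluates it in plain Python over a set of (predicate, subject, object) triples, with class membership encoded as `("type", x, Class)`. The triple encoding and the entity names (`workstation-01`, `dropper.exe`) are hypothetical stand-ins for ontology individuals.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object); ("type", x, C) encodes C(x)

# Hypothetical facts mirroring the R1 example: phishing compromises a workstation
# that contains a dropped binary, which the lateral-movement technique later involves.
FACTS: Set[Triple] = {
    ("type", "T1566", "Technique"),
    ("type", "T1021", "Technique"),
    ("affects", "T1566", "workstation-01"),
    ("contains", "workstation-01", "dropper.exe"),
    ("involves", "T1021", "dropper.exe"),
}


def objects_of(facts: Set[Triple], predicate: str, subject: str) -> Set[str]:
    """All o such that predicate(subject, o) holds."""
    return {o for p, s, o in facts if p == predicate and s == subject}


def apply_r1(facts: Set[Triple]) -> Set[Triple]:
    """R1: Technique(t1) ^ affects(t1, a) ^ contains(a, x) ^ involves(t2, x) -> cause(t1, t2)."""
    techniques = {s for p, s, o in facts if p == "type" and o == "Technique"}
    derived: Set[Triple] = set()
    for t1 in techniques:
        for asset in objects_of(facts, "affects", t1):
            for artifact in objects_of(facts, "contains", asset):
                for p, t2, o in facts:
                    if p == "involves" and o == artifact:
                        derived.add(("cause", t1, t2))
    return derived


print(apply_r1(FACTS))  # {('cause', 'T1566', 'T1021')}
```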

\begin{matrix} R_{2}: & \mathit{Technique}(tech_{1}) \land \mathit{related}(tech_{1}, ioc) \land \mathit{uses}(tech_{2}, ioc) \\ & \rightarrow \mathit{cause}(tech_{1}, tech_{2}) \end{matrix}

If attack technique A is related to an IoC and attack technique B subsequently uses this IoC, then technique A may lead to technique B. For example, in one attack analysis, deploying malware (T1204) was found to generate specific network signals (IoCs). A subsequent defense evasion technique (T1070, Indicator Removal), which modifies or deletes logs, used these IoCs to avoid detection, directly linking the malware deployment to the log manipulation in a causal chain.
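R2 is a join on a shared IoC. A minimal sketch in the same hypothetical triple encoding, which indexes `uses` facts by IoC so the join does not rescan every fact for each `related` fact; the IoC value is a documentation-range IP address chosen for illustration, not data from the study. Like the rule as written, it does not exclude the case where tech1 and tech2 are the same technique.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object)

# Hypothetical facts: T1204 is related to an IoC that T1070 later uses.
FACTS: Set[Triple] = {
    ("type", "T1204", "Technique"),
    ("type", "T1070", "Technique"),
    ("related", "T1204", "203.0.113.7"),
    ("uses", "T1070", "203.0.113.7"),
}


def apply_r2(facts: Set[Triple]) -> Set[Triple]:
    """R2: Technique(t1) ^ related(t1, ioc) ^ uses(t2, ioc) -> cause(t1, t2)."""
    techniques = {s for p, s, o in facts if p == "type" and o == "Technique"}
    # Index: IoC -> every subject that uses it.
    users_by_ioc: Dict[str, Set[str]] = defaultdict(set)
    for p, s, o in facts:
        if p == "uses":
            users_by_ioc[o].add(s)
    return {
        ("cause", t1, t2)
        for p, t1, ioc in facts
        if p == "related" and t1 in techniques
        for t2 in users_by_ioc.get(ioc, set())
    }
```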

\begin{matrix} R_{3}: & \mathit{Technique}(tech_{1}) \land \mathit{usesTool}(tech_{1}, tool) \land \mathit{Tool}(tool) \\ & \land\ \mathit{interacts}(tool, artifact) \land \mathit{Artifact}(artifact) \land \mathit{involves}(tech_{2}, artifact) \\ & \land\ \mathit{Technique}(tech_{2}) \rightarrow \mathit{cause}(tech_{1}, tech_{2}) \end{matrix}

If attack technique A uses a tool, the tool interacts with an artifact on an asset, and attack technique B involves that artifact, then technique A may lead to technique B. For example, an attacker used T1203 (Exploitation for Client Execution) to exploit a vulnerability, creating an abnormal system process (the artifact). This artifact was subsequently exploited by T1068 (Exploitation for Privilege Escalation) to gain higher system privileges, showing how the first technique indirectly enabled the second through the use of a tool.
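R3 chains three hops (technique to tool, tool to artifact, artifact back to a second technique) and, unlike R1, checks the class of every individual along the chain. A sketch in the same hypothetical triple encoding; `exploit-kit` and `rogue-process` are invented names used only to mirror the example above.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object); ("type", x, C) encodes C(x)

# Hypothetical facts mirroring the R3 example.
FACTS: Set[Triple] = {
    ("type", "T1203", "Technique"),
    ("type", "T1068", "Technique"),
    ("type", "exploit-kit", "Tool"),
    ("type", "rogue-process", "Artifact"),
    ("usesTool", "T1203", "exploit-kit"),
    ("interacts", "exploit-kit", "rogue-process"),
    ("involves", "T1068", "rogue-process"),
}


def instances(facts: Set[Triple], cls: str) -> Set[str]:
    """All individuals asserted to belong to class cls."""
    return {s for p, s, o in facts if p == "type" and o == cls}


def apply_r3(facts: Set[Triple]) -> Set[Triple]:
    """R3: usesTool(t1, tool) ^ interacts(tool, x) ^ involves(t2, x), all typed -> cause(t1, t2)."""
    techniques = instances(facts, "Technique")
    tools = instances(facts, "Tool")
    artifacts = instances(facts, "Artifact")
    derived: Set[Triple] = set()
    for p1, t1, tool in facts:
        if p1 != "usesTool" or t1 not in techniques or tool not in tools:
            continue
        for p2, src, artifact in facts:
            if p2 != "interacts" or src != tool or artifact not in artifacts:
                continue
            for p3, t2, obj in facts:
                if p3 == "involves" and obj == artifact and t2 in techniques:
                    derived.add(("cause", t1, t2))
    return derived
```

Removing any single class assertion (for example, `Tool(exploit-kit)`) breaks the chain, which is the practical difference from the less strictly typed R1.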

\begin{matrix} R_{4}: & \mathit{Technique}(tech_{1}) \land \mathit{Technique}(tech_{2}) \land \mathit{related}(tech_{1}, ioc) \\ & \land\ \mathit{related}(tech_{2}, ioc) \land \mathit{affects}(tech_{1}, asset_{1}) \land \mathit{affects}(tech_{2}, asset_{2}) \\ & \land\ \mathit{swrlb{:}notEqual}(asset_{1}, asset_{2}) \rightarrow \mathit{parallelPaths}(tech_{1}, tech_{2}) \end{matrix}

If different techniques affect different assets through the same IoC, these techniques belong to parallel attack paths. For example, in a system, T1498 (DDoS attack) affects online services (Asset A1), while T1566 (phishing) targets the internal email system (Asset A2). These two attack techniques do not belong to the same attack sequence.

\begin{matrix} R_{5}: & \mathit{Technique}(tech_{1}) \land \mathit{Technique}(tech_{2}) \land \mathit{affects}(tech_{1}, asset_{1}) \\ & \land\ \mathit{affects}(tech_{2}, asset_{2}) \land \mathit{swrlb{:}notEqual}(asset_{1}, asset_{2}) \\ & \rightarrow \mathit{parallelPaths}(tech_{1}, tech_{2}) \end{matrix}

If two techniques affect different assets, these two techniques may belong to parallel attack paths. For example, an APT group used T1498 (denial-of-service attack) targeting external network services, while simultaneously employing T1485 (data destruction) against internal database systems. These two attacks affected different asset environments.
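As written, R5's antecedent is a subset of R4's, so every pair R4 derives is also derived by R5; R4 only adds the shared-IoC condition. The sketch below makes that relationship explicit by computing R4 as R5's result filtered to pairs of techniques that share an IoC. It adds one assumption the rules do not state: a technique is never paired with itself. The asset and IoC names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object)

# Hypothetical facts: three techniques hitting three different assets;
# T1498 and T1485 are additionally related to the same IoC.
FACTS: Set[Triple] = {
    ("type", "T1498", "Technique"),
    ("type", "T1566", "Technique"),
    ("type", "T1485", "Technique"),
    ("affects", "T1498", "web-frontend"),
    ("affects", "T1566", "mail-server"),
    ("affects", "T1485", "customer-db"),
    ("related", "T1498", "ioc-1"),
    ("related", "T1485", "ioc-1"),
}


def _techniques(facts: Set[Triple]) -> Set[str]:
    return {s for p, s, o in facts if p == "type" and o == "Technique"}


def apply_r5(facts: Set[Triple]) -> Set[Triple]:
    """R5: two techniques affecting different assets -> parallelPaths(t1, t2)."""
    techs = _techniques(facts)
    hits = [(t, a) for p, t, a in facts if p == "affects" and t in techs]
    return {
        ("parallelPaths", t1, t2)
        for t1, a1 in hits
        for t2, a2 in hits
        if t1 != t2 and a1 != a2  # t1 != t2 is an added assumption
    }


def apply_r4(facts: Set[Triple]) -> Set[Triple]:
    """R4: R5's condition plus an IoC related to both techniques."""
    techs = _techniques(facts)
    related = [(t, ioc) for p, t, ioc in facts if p == "related" and t in techs]
    share_ioc = {(t1, t2) for t1, i1 in related for t2, i2 in related if i1 == i2}
    return {fact for fact in apply_r5(facts) if (fact[1], fact[2]) in share_ioc}
```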

\begin{matrix} R_{6}: & \mathit{Attacker}(attacker) \land \mathit{uses}(attacker, tech_{1}) \land \mathit{affects}(tech_{1}, asset) \\ & \land\ \mathit{Technique}(tech_{2}) \land \mathit{affects}(tech_{2}, asset) \land \mathit{earlier}(tech_{1}, tech_{2}) \\ & \rightarrow \mathit{partOfAttackPath}(tech_{1}, tech_{2}) \end{matrix}

If an attacker employs a certain technique, and this technique directly affects an asset while having a sequential relationship with another technique that also affects the same asset, it indicates that the first technique may lead to the second. For example, an attacker may first use T1040 (network sniffing) to capture credentials on a network device, and then the same attacker may use these credentials to execute T1195 (supply chain compromise), thereby impacting the same network system and establishing a direct sequential attack path.
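R6 is the only rule that involves the attacker and an explicit temporal relation (`earlier`). A sketch in the same hypothetical triple encoding; how `earlier` facts are obtained from a report (for example, from narrative order or timestamps) is not specified in this section, so here they are simply asserted.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object); ("type", x, C) encodes C(x)

# Hypothetical facts mirroring the R6 example in the text.
FACTS: Set[Triple] = {
    ("type", "APT-X", "Attacker"),
    ("type", "T1040", "Technique"),
    ("type", "T1195", "Technique"),
    ("uses", "APT-X", "T1040"),
    ("affects", "T1040", "core-router"),
    ("affects", "T1195", "core-router"),
    ("earlier", "T1040", "T1195"),
}


def instances(facts: Set[Triple], cls: str) -> Set[str]:
    """All individuals asserted to belong to class cls."""
    return {s for p, s, o in facts if p == "type" and o == cls}


def apply_r6(facts: Set[Triple]) -> Set[Triple]:
    """R6: Attacker(a) ^ uses(a, t1) ^ affects(t1, x) ^ Technique(t2) ^ affects(t2, x)
    ^ earlier(t1, t2) -> partOfAttackPath(t1, t2)."""
    attackers = instances(facts, "Attacker")
    techniques = instances(facts, "Technique")
    derived: Set[Triple] = set()
    for p, who, t1 in facts:
        if p != "uses" or who not in attackers:
            continue
        assets = {o for q, s, o in facts if q == "affects" and s == t1}
        for t2 in techniques:
            # Both techniques must touch a common asset, and t1 must precede t2.
            if ("earlier", t1, t2) in facts and any(("affects", t2, x) in facts for x in assets):
                derived.add(("partOfAttackPath", t1, t2))
    return derived
```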

We map the inference results of threat reports onto a knowledge graph and extract the multiple attack sequences that attackers use to achieve different attack objectives. Based on these causally related real-world APT attack sequences, we construct a dataset for prediction at the level of specific attack techniques.
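The extraction procedure is not specified in detail. A minimal sketch, assuming the inferred `partOfAttackPath(tech1, tech2)` facts form a directed acyclic graph, enumerates every maximal path from a technique with no predecessor to one with no successor; `extract_sequences` and the example technique chain are hypothetical.

```python
from collections import defaultdict


def extract_sequences(edges):
    """Enumerate maximal technique paths from inferred partOfAttackPath edges."""
    successors = defaultdict(list)
    nodes, has_predecessor = set(), set()
    for src, dst in edges:
        successors[src].append(dst)
        nodes.update((src, dst))
        has_predecessor.add(dst)

    sequences = []

    def walk(path):
        # Only extend to techniques not already on the path (guards against cycles).
        nexts = [n for n in successors[path[-1]] if n not in path]
        if not nexts:                 # no further step: record the maximal path
            sequences.append(path)
        for n in nexts:
            walk(path + [n])

    # Start from techniques that no inferred edge points to.
    for start in sorted(nodes - has_predecessor):
        walk([start])
    return sequences
```

A branching point, such as one technique that leads to two different follow-up techniques, yields one sequence per branch, which matches the idea of several attack sequences serving different objectives.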

3.4. Attack Prediction Module

In attack prediction tasks, the multi-step behavior of attackers often exhibits multi-scale dependency and complex causal relationships. To address this, this paper introduces a novel attack prediction mechanism within the DeepOP framework, leveraging causal window attention to model attack behavior sequences and enhance multi-step attack prediction performance.

The overall architecture of the proposed attack prediction model is shown in Figure 5a. It follows a general encoder–decoder framework consisting of an attack sequence embedding layer and multiple stacked encoders and decoders [35]. To improve training efficiency and facilitate the construction of deeper networks, the architecture introduces residual connections [36] and layer normalization [37], and keeps the output dimensions of all layers consistent so that the residual connections can be applied.

An attack sequence is composed of multiple tactic and technique steps. Suppose the $i$-th attack sequence is represented as

s_{i} = \{t_{1}^{(i)}, t_{2}^{(i)}, \dots, t_{n}^{(i)}\},

where $t_{m}^{(i)}$ ($1 \le m \le n$) denotes the ATT&CK tactic and technique label of the $m$-th step of the $i$-th attack sequence. This paper uses Word2Vec [38] to perform hierarchical embedding of the sequence, encoding the two label levels of ATT&CK (tactics and techniques) separately.

Word2Vec is an embedding method widely used in natural language processing (NLP); compared with traditional one-hot encoding, it can effectively capture the semantic correlations between labels. Specifically, a two-level label set $L = \{l_{1}, l_{2}, \dots, l_{h}\}$ is constructed, consisting of the tactic labels $L_{TA} = \{l_{1}, l_{2}, \dots, l_{j}\}$ and the technique labels $L_{TE} = \{l_{1}, l_{2}, \dots, l_{k}\}$. Embedding the two-level labels with Word2Vec yields a dense matrix representation:

H = [h_{1}, h_{2}, \dots, h_{n}], \quad h_{i} \in \mathbb{R}^{d} \qquad (1)

where $h_{i}$ is the $d$-dimensional embedding of each label. To capture the positional information of the sequence, this paper introduces positional encoding in the embedding layer. The positional encoding vectors are constructed from sine and cosine functions of different frequencies:

PE_{(i, 2k)} = \sin\left(i / 10000^{2k/d}\right), \quad PE_{(i, 2k+1)} = \cos\left(i / 10000^{2k/d}\right) \qquad (2)

where $i$ is the position in the sequence and $k$ is the dimension index. Positional encoding allows the model to capture the sequential relationships between different positions in the sequence. Combining the hierarchical label embedding with the positional encoding gives the final embedding of the attack sequence:

E_{emb} = H + PE \qquad (3)
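Equations (2) and (3) can be sketched in NumPy as follows. The lookup table `embedding_table` is a hypothetical stand-in for the Word2Vec label vectors; since the text does not state how the tactic and technique embeddings are combined into a single $h_{i}$, the sketch simply assumes one row per label, and it also assumes an even dimension $d$.

```python
import numpy as np


def sinusoidal_positional_encoding(n_positions, d):
    """Eq. (2): PE[i, 2k] = sin(i / 10000^(2k/d)), PE[i, 2k+1] = cos(i / 10000^(2k/d))."""
    assert d % 2 == 0, "this sketch assumes an even embedding dimension"
    pos = np.arange(n_positions)[:, None]          # position index i
    k = np.arange(d // 2)[None, :]                 # dimension-pair index k
    angles = pos / np.power(10000.0, 2 * k / d)
    pe = np.zeros((n_positions, d))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe


def embed_sequence(label_ids, embedding_table):
    """Eq. (3): E_emb = H + PE, where H stacks the label embeddings h_i."""
    H = embedding_table[label_ids]                 # (n, d) lookup of label vectors
    return H + sinusoidal_positional_encoding(len(label_ids), H.shape[1])
```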

The attack prediction model consists of an encoder and a decoder, where the encoder part is used to capture high-level semantic representations of the attack sequence. This paper employs Temporal Multi-Head Attention to calculate the correlation between different positions within the sequence. The decoder generates the prediction sequence in an autoregressive manner. Unlike the traditional attention mechanism, this paper introduces causal window attention (CWA) in the decoder to ensure temporal causality during sequence generation. Each step of the decoder relies only on the previously generated historical sequence, preventing leakage of future information.

Multi-head attention integrates several self-attention heads. For the $j$-th head, separate weight matrices $W_{j}^{Q}$, $W_{j}^{K}$, and $W_{j}^{V}$ project the input vectors into the query vectors $Q$, key vectors $K$, and value vectors $V$ used in the attention computation:

\mathrm{head}_{j} = \mathrm{Attention}(Q W_{j}^{Q}, K W_{j}^{K}, V W_{j}^{V}) \qquad (4)

The outputs of the $h$ attention heads are then concatenated and projected to obtain the final representation:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \mathrm{head}_{2}, \dots, \mathrm{head}_{h}) W^{O} \qquad (5)

where $W^{O} \in \mathbb{R}^{d \times d}$ is the final projection matrix.

In the encoder's Temporal Multi-Head Attention, the scaled dot product is used to compute the attention scores:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{model}}}\right) V \qquad (6)
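A minimal NumPy sketch of Eqs. (4) to (6) for the self-attention case ($Q = K = V = X$) is shown below. The weight shapes are assumptions, and the scores are scaled by the per-head key dimension, as in the original Transformer, rather than by $d_{model}$ as written in Eq. (6).

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)        # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Eq. (6), scaled by the per-head key dimension."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V


def multi_head_attention(X, W_q, W_k, W_v, W_o):
    """Eqs. (4)-(5): per-head projections, attention, concatenation, output projection.

    W_q, W_k, W_v: arrays of shape (h, d, d // h); W_o: array of shape (d, d).
    """
    heads = [scaled_dot_product_attention(X @ W_q[j], X @ W_k[j], X @ W_v[j])
             for j in range(W_q.shape[0])]
    return np.concatenate(heads, axis=-1) @ W_o
```

With all query weights set to zero, every score is equal and each position attends uniformly, so each head returns the mean of its value vectors; this is a convenient sanity check for the wiring.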

The encoder output contains the contextual semantic representation of the attack sequence, $z = \{t_{z1}, t_{z2}, \dots, t_{zn}\}$, where $t_{zi}$ is the hidden representation of the attack technique used in the $i$-th step.

In the decoder, we modify the traditional attention mechanism. Because the attack techniques within an attack sequence are causally related, we replace the traditional decoder attention with causal window attention. Figure 5b shows the architecture of causal window attention with $h$ attention heads and $n_{cw}$ window scales. In this example, we set $h = 6$ and $n_{cw} = 3$, dividing the $h$ attention heads into $n_{cw}$ window groups and computing self-attention over windows of different scales to capture multi-scale information within the attack sequence.

Specifically, a feature mapping is first applied to the attack sequence features $x \in \mathbb{R}^{N \times d}$. The input sequence $x$ is then divided into sub-sequences according to the window size $cw_{i}$ of each window group:

\mathrm{Split}(x, cw_{i}) = \{X_{cw_{i}}^{(1)}, X_{cw_{i}}^{(2)}, \dots\} \qquad (7)
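The Split operation of Eq. (7) can be sketched as below, assuming consecutive, non-overlapping windows. The text does not say how a final partial window is handled, so this sketch keeps it as a shorter window rather than padding it; `split_windows` is a hypothetical name.

```python
import numpy as np


def split_windows(x, cw):
    """Eq. (7): cut x (N x d) into consecutive, non-overlapping windows of cw rows.

    The last window is shorter when N is not a multiple of cw (an assumption).
    """
    return [x[start:start + cw] for start in range(0, x.shape[0], cw)]
```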

Window attention is computed for each window $X_{cw_{i}} \in \mathbb{R}^{cw_{i} \times d}$. Finally, the features of all window scales are concatenated to form the output of window attention:

Y_{WA} = \mathrm{Concat}\left(\mathrm{Attention}_{cw_{1}}(X_{cw_{1}}), \dots, \mathrm{Attention}_{cw_{n_{cw}}}(X_{cw_{n_{cw}}})\right) \qquad (8)

To capture the dependencies of the attack sequence at different scales while preserving causality, causal attention is applied within each window, so that only earlier steps are used to predict the next attack step. To achieve this, we add a causal mask matrix to the attention scores before the Softmax function:

M_{l} = \begin{cases} 1, & \text{if } t' < t \\ -\infty, & \text{if } t' \geq t \end{cases} \qquad (9)

This ensures that an attack phase $t$ is only connected with earlier attack phases $t'$. Applying the mask to the attention score matrix and normalizing the result with Softmax gives the expression for causal window attention:

\mathrm{CausalAttention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{model}}} + M_{l}\right) V \qquad (10)
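Equations (9) and (10) can be sketched as follows. Two deviations are assumptions of this sketch: unmasked entries receive 0 instead of 1 (adding the same constant to every unmasked score in a row does not change the softmax output), and each step may also attend to itself ($t' \le t$), since a strictly-past mask leaves the first row with no unmasked entry and makes its softmax undefined.

```python
import numpy as np


def causal_mask(n):
    """Additive mask: 0 where position t' <= t may be attended, -inf for future positions."""
    future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True strictly above the diagonal
    return np.where(future, -np.inf, 0.0)


def causal_attention(Q, K, V):
    """Eq. (10): softmax(QK^T / sqrt(d_k) + M) V, with the mask added before the softmax."""
    scores = Q @ K.T / np.sqrt(K.shape[-1]) + causal_mask(Q.shape[0])
    scores = scores - scores.max(axis=-1, keepdims=True)  # diagonal keeps every row max finite
    weights = np.exp(scores)                              # exp(-inf) == 0 removes future steps
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

A quick property check: the first output row equals the first value vector, and perturbing a later value vector leaves all earlier outputs unchanged.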

The multi-scale dependencies of the attack sequence are extracted through causal window attention with different window sizes. Small windows capture local detailed features, while large windows model global long-term dependencies. Combining both allows the model to simultaneously focus on local causal features and global patterns.
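Putting Eqs. (7) to (10) together, one plausible reading of multi-scale causal window attention is sketched below: the heads are split evenly into $n_{cw}$ groups, and each group runs causal attention inside non-overlapping windows of its own size. The $h = 6$, $n_{cw} = 3$ setting follows the example above, but the window sizes, the even split of heads, the per-head weight shapes, and the final output projection `W_o` are assumptions rather than details given in the text.

```python
import numpy as np


def _causal_attn(q, k, v):
    """Causal scaled dot-product attention inside one window (t' <= t allowed)."""
    n = q.shape[0]
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(np.triu(np.ones((n, n), dtype=bool), 1), -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v


def causal_window_attention(x, W_q, W_k, W_v, W_o, window_sizes):
    """Multi-scale causal window attention sketch.

    x: (N, d); W_q/W_k/W_v: (h, d, d_head); W_o: (h * d_head, d).
    Heads are split evenly into len(window_sizes) groups; heads in group g
    attend causally inside consecutive windows of size window_sizes[g].
    """
    h, n_cw = W_q.shape[0], len(window_sizes)
    assert h % n_cw == 0, "heads must divide evenly into window groups"
    heads_per_group = h // n_cw
    N = x.shape[0]
    head_outputs = []
    for j in range(h):
        cw = window_sizes[j // heads_per_group]
        q, k, v = x @ W_q[j], x @ W_k[j], x @ W_v[j]
        out = np.zeros((N, v.shape[1]))
        for s in range(0, N, cw):                    # Eq. (7): split into windows
            out[s:s + cw] = _causal_attn(q[s:s + cw], k[s:s + cw], v[s:s + cw])
        head_outputs.append(out)
    return np.concatenate(head_outputs, axis=-1) @ W_o  # Eq. (8): concatenate scales
```

Because causal attention restarts at the first position of every window, heads with small windows see only a short recent history, while heads with the largest window can reach back across the whole sequence; this is the local/global trade-off described above.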

After passing through $L$ layers of encoders and decoders, the final outputs are mapped to the target category space by a linear transformation, producing predictions for the next likely attack technique or tactic label in the attack sequence. Cross-entropy loss is used as the training objective:

L_{CE} = -\sum_{t=1}^{T} y_{t} \log \hat{y}_{t} \qquad (11)

where $y_{t}$ is the true label and $\hat{y}_{t}$ is the probability predicted by the model. Minimizing the cross-entropy loss trains the model for the multi-step attack prediction task.
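For one-hot targets, each term $y_{t} \log \hat{y}_{t}$ in Eq. (11) reduces to the log-probability that the model assigns to the true label at step $t$. The sketch below computes this unweighted form (Algorithm 1 mentions a weighted variant whose weights are not specified); the function name and the clipping constant, added to avoid $\log 0$, are assumptions.

```python
import numpy as np


def sequence_cross_entropy(probs, targets):
    """Eq. (11): L_CE = -sum_t log p_t[y_t] for one-hot targets y_t.

    probs: (T, C) predicted distributions; targets: (T,) integer class ids.
    """
    probs = np.clip(probs, 1e-12, 1.0)                  # avoid log(0)
    return float(-np.sum(np.log(probs[np.arange(len(targets)), targets])))
```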

4. Experiments

In this section, we comprehensively evaluate the performance of the DeepOP framework by benchmarking it against several state-of-the-art attack prediction methodologies. The evaluation is aimed at assessing DeepOP’s attack prediction capabilities, computational efficiency, and robustness across different scenarios. Additionally, we conduct ablation studies to isolate and evaluate the contributions of the causal window attention (CWA) module and validate the effectiveness of the ontology reasoning module through a detailed case study.

To establish a robust comparative analysis, we select baseline models including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bayesian networks. To evaluate the impact of the CWA module on the attack prediction model, we perform ablation experiments with the original Transformer, obtained by removing the CWA module. This experimental setup allows us to compare DeepOP's performance across various attack prediction tasks and to assess the roles of its key components.

The following subsections detail the dataset construction, experimental settings, computational complexity analysis, comparative performance evaluation, and ontology module validation.

4.1. Dataset

To validate the DeepOP framework's effectiveness in multi-step attack prediction, this paper constructs a dataset of attack sequences based on real APT attack cases through ontological reasoning. The dataset is derived from publicly available APT attack case reports covering multiple real attack scenarios, and it maps unstructured threat descriptions to the tactics and techniques of the MITRE ATT&CK framework to generate standardized sequence data suitable for attack prediction.

Each attack sequence is structured according to the MITRE ATT&CK framework and carries explicit tactic and technique labels. The dataset covers all MITRE ATT&CK tactics. To ensure the validity of the sequences, data preprocessing applies a minimum-length filter that removes invalid short sequences: sequences that are too short cannot adequately express the correlations between attack techniques, so only instances containing five or more attack phases are retained. This filtering strategy helps the model learn the complex dependencies and temporal causal properties between attack techniques.
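The minimum-length filter amounts to a single comprehension. The function name is hypothetical, and each sequence is assumed to be a list of ATT&CK technique IDs.

```python
MIN_PHASES = 5  # minimum number of attack phases retained, as stated above


def filter_short_sequences(sequences, min_phases=MIN_PHASES):
    """Keep only attack sequences with at least `min_phases` steps."""
    return [seq for seq in sequences if len(seq) >= min_phases]
```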

4.2. Experiments Settings

The experiments were conducted on an NVIDIA (Santa Clara, CA, USA) GeForce RTX 4080 GPU with Python 3.9.2, and the model was trained with the Adam optimizer. The dataset is divided into training, test, and validation sets in an 8:1:1 ratio. Hyperparameters such as the initial learning rate and batch size are tuned by grid search, and the configuration with the best model performance over multiple rounds of testing is recorded. In addition, to avoid overfitting the training set, we adopt an early stopping strategy during training.
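An 8:1:1 split could be implemented as below. Shuffling at the level of whole attack sequences and the fixed seed are assumptions, since the text does not describe how sequences were assigned to the three sets; the function name is hypothetical.

```python
import random


def split_train_test_val(sequences, seed=0):
    """Shuffle, then split 8:1:1 into training, test, and validation sets."""
    items = list(sequences)
    random.Random(seed).shuffle(items)        # shuffling and seed are assumptions
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])          # validation set takes the remainder
```

Integer arithmetic keeps the split sizes exact and guarantees that every sequence lands in exactly one set.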

It should be noted that the attack sequence dataset used in this paper differs in format from those used in previous studies, and attack prediction performance differs significantly across dataset formats. For the performance comparison, this paper therefore replicates several prediction methods for similar tasks: a GRU-based deep learning model [17], a prediction method based on Bayesian networks [12], and the DeepAG framework [20], which uses bi-directional deep learning for attack graph construction. The GRU model is known for its ability to learn security dependencies in alert sequences, and the Bayesian network approach performs multi-step attack prediction through causal analysis. DeepAG, the current SOTA for sequence prediction tasks, combines Transformer and LSTM models for APT attack sequence detection and path prediction. To further verify the effectiveness of the causal window attention (CWA) mechanism, this paper also applies the original Transformer model [35] to the attack prediction task and conducts comparative experiments.

In this study, we compare DeepOP with the other methods using four classical evaluation metrics: accuracy, precision, recall, and F1-score. The F1-score, which combines precision [39] and recall, is better suited to measuring overall performance in the prediction task.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (12)

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (13)

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (14)

F1\text{-}\mathrm{score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \qquad (15)

where TP is the number of positive samples correctly predicted as positive, TN is the number of negative samples correctly predicted as negative, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative. The training procedure of the attack prediction model is summarized in Algorithm 1.
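The metrics in Eqs. (12) to (15) can be computed from confusion-matrix counts as below. Since next-technique prediction is multi-class, the counts are taken one technique at a time (one-vs-rest); how the per-class scores are averaged (macro, micro, or weighted) is not stated, so that step is left out of this sketch, and both function names are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """TP, TN, FP, FN for one technique label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Eqs. (12)-(15) from confusion-matrix counts (0.0 when a ratio is undefined)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```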

Algorithm 1 Training based on attack prediction model.

Input:

Training dataset ($D_{train}$), Validation dataset ($D_{val}$), Test dataset ($D_{test}$)
Number of layers ($L$), Number of attention heads ($h$), Number of window groups ($n_{cw}$), Window sizes ($cw_{1}, \dots, cw_{n_{cw}}$)
Embedding dimension ($d$), Learning rate ($\eta$), Number of epochs ($E$)

Output: Trained DeepOP attack prediction model

Process:

1: begin
2: Initialize model parameters
3: Set up the Adam optimizer and the weighted cross-entropy loss function
4: for epoch $= 1$ to $E$ do
5: for each attack sequence in $D_{train}$ do
6: Embed the attack sequence and add positional encoding
7: Pass the embeddings through the encoder to obtain contextual representations
8: Initialize the decoder input with a start token
9: Divide the decoder input sequence into windows
10: Perform self-attention within each window using causal masks
11: Concatenate the attention outputs from all windows to form decoder representations
12: Generate predictions and append them to the decoder input
13: Compute the cross-entropy loss between predictions and true labels
14: Backpropagate the loss and update model parameters using the optimizer
15: end for
16: Evaluate the model on $D_{val}$ by computing validation loss and accuracy
17: if validation loss has not improved for a predefined number of epochs then
18: Trigger early stopping
19: Exit the training loop
20: end if
21: end for
22: Select the best model based on validation performance
23: Evaluate the selected model on $D_{test}$
24: Compute final evaluation metrics and generate reports
25: end
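The early-stopping logic of Algorithm 1 (steps 4 to 22) can be sketched independently of the model. `train_one_epoch` and `validate` are hypothetical callables standing in for steps 5 to 15 and step 16, and the `patience` argument plays the role of the "predefined number of epochs".

```python
def train_with_early_stopping(train_one_epoch, validate, max_epochs, patience):
    """Outer training loop with early stopping on validation loss.

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss, best_epoch, epochs_without_improvement = float("inf"), None, 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()                      # steps 5-15
        val_loss = validate()                  # step 16
        if val_loss < best_loss:
            best_loss, best_epoch, epochs_without_improvement = val_loss, epoch, 0
            # a real implementation would checkpoint the model parameters here (step 22)
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # steps 17-19: early stopping
    return best_epoch, best_loss, epoch
```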

4.3. Complexity Analysis

In this study, the proposed causal window attention (CWA) module is employed to model multi-scale dependencies and complex causal relationships within attack behavior sequences. To evaluate the cost of the CWA module in terms of computational resources, we analyze its computational complexity. For an attack sequence represented as $x \in \mathbb{R}^{N \times d}$, the computational complexity is given by

\Omega(\mathrm{CWA}) = h \cdot \left(4 N d^{2} + 2 N \cdot \frac{d}{n_{cw}} \sum_{i=1}^{n_{cw}} cw_{i}^{2}\right) \qquad (16)

where $h$ denotes the number of attention heads, $d$ the feature dimension, $n_{cw}$ the number of window groups, and $cw_{i}$ the window size of the $i$-th window group. Since both $n_{cw}$ and $cw_{i}$ are constants, the computational complexity of the CWA module scales linearly with the sequence length $N$, giving an overall complexity of $O(N)$. Compared with the traditional Multi-Head Self-Attention (MSA) mechanism, whose computational complexity is $O(N^{2})$, the CWA module significantly reduces the demand for computational resources.
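Equation (16) can be evaluated directly to check the linear scaling in $N$. The function name is hypothetical, and the parameter values used below ($d = 64$, $h = 6$, window sizes 2, 4, and 8) are arbitrary illustrations, not the configuration used in the experiments.

```python
def cwa_cost(N, d, h, window_sizes):
    """Eq. (16): Omega(CWA) = h * (4 N d^2 + 2 N (d / n_cw) * sum_i cw_i^2)."""
    n_cw = len(window_sizes)
    return h * (4 * N * d ** 2 + 2 * N * (d / n_cw) * sum(cw ** 2 for cw in window_sizes))
```

For example, `cwa_cost(100, 64, 6, (2, 4, 8))` evaluates to 11,980,800 (up to floating-point rounding), and doubling $N$ doubles the cost, since every term in Eq. (16) is proportional to $N$.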

4.4. Comparison with Other Methods

To evaluate the efficacy of the DeepOP framework in next-step attack prediction, we compare it against several existing methods: the LSTM-based DeepAG framework [20] (denoted LSTM below; the current SOTA method), GRU [17], and Bayesian networks [12]. Furthermore, to assess the contribution of the causal window attention (CWA) mechanism, we perform an ablation study in which the CWA module is removed, which yields the original Transformer model [35]. Because attack sequences vary in length, we predict the subsequent attack technique at each time step of every attack sequence and average the prediction performance across all time steps. Table 3 presents the accuracy, precision, recall, and F1-score of each method.

The experimental results show that DeepOP outperforms the existing methods on all evaluation metrics. Specifically, DeepOP's F1-score is 0.894, which is 2.9 percentage points (3.35% relative) higher than that of the SOTA method LSTM (0.865) and 5.9 percentage points (7.1% relative) higher than that of the GRU model (0.835). We attribute these improvements primarily to the ability of DeepOP's causal window attention (CWA) mechanism to model multi-scale temporal dependencies and causal relationships within attack sequences. In addition, the traditional Bayesian network performs markedly worse on all metrics (F1-score of 0.614), primarily because of its limited ability to model intricate attack sequences.

In the ablation study, the Transformer model without the CWA module achieves an F1-score of 0.861, slightly lower than that of LSTM (0.865) and clearly below that of DeepOP (0.894). This finding highlights the role of the CWA module in improving model performance and its effectiveness in capturing the complex dependencies inherent in attack sequences.

Moreover, DeepOP achieves a recall of 0.978, slightly higher than that of LSTM (0.972) and well above that of GRU (0.913), indicating that it captures attack behaviors comprehensively. Its precision of 0.822 indicates that this high recall does not come at the cost of excessive false alarms, and together they yield the best overall F1-score. Collectively, these results support the superior performance of the DeepOP framework, particularly its predictive capability in complex and variable attack sequence scenarios.
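As a quick arithmetic check on these figures (a worked example, not an additional result), substituting the reported precision and recall into Eq. (15) gives

```latex
F1 = \frac{2 \times 0.822 \times 0.978}{0.822 + 0.978} = \frac{1.607832}{1.800} \approx 0.893
```

This is close to the reported F1-score of 0.894; a gap of this size is consistent with rounding of the reported precision and recall, or with F1-scores being averaged over time steps rather than recomputed from averaged precision and recall.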

To fully evaluate the models on multi-step prediction, we designed an experiment that tests each model with different numbers of prediction steps. Figure 6 shows how the F1-score of each model changes with the prediction step J (one to six) in the multi-step attack prediction task. As the prediction step length increases, the performance of all models degrades to different degrees, which indicates that multi-step prediction places higher demands on a model's ability to capture long-range dependencies and contextual information. DeepOP outperforms the other models and shows the smallest performance degradation at all prediction steps.

Specifically, DeepOP's F1-score remains at 0.767 at $J = 6$, whereas LSTM, Transformer, and GRU drop to 0.647, 0.663, and 0.617, respectively, and the Bayesian network performs worst at 0.393. DeepOP's advantage is more evident for $J \geq 4$. The ablation results show that removing the causal window attention mechanism substantially decreases multi-step attack prediction performance, which indicates that this mechanism improves the robustness of the model and reduces performance decay: it captures temporal causality in sequences and reduces errors in information transfer during multi-step prediction. Overall, DeepOP is more robust in long-step prediction, and its lower performance decay supports its usefulness in complex multi-step attack scenarios.

To test the performance of the models under imperfect detection systems, we designed experiments with different detection failure rates (0–60%) to compare the robustness of DeepOP with that of the other models (LSTM, GRU, the original Transformer, and Bayesian networks). We simulate detection system failures by removing data from the attack sequences. The detection failure rate is defined as the proportion of attack techniques in a sequence that the detection system misses; for example, a failure rate of 30% means that approximately 30% of the attack techniques in the sequence were not detected. In the experiments, attack techniques are randomly removed from the original attack sequences according to a predefined probability, thereby simulating varying degrees of detection failure. To minimize the bias introduced by randomness, multiple independent simulations are conducted for each detection failure rate, and the average of the results is used as the final performance metric.
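The detection-failure simulation can be sketched as below, assuming each technique is dropped independently with probability equal to the failure rate (so a 30% rate removes about 30% of the techniques on average rather than exactly 30%). `evaluate` is a hypothetical callable that scores a model on the degraded sequences, and the number of runs and the seeds are assumptions.

```python
import random


def simulate_detection_failure(sequence, failure_rate, rng):
    """Drop each technique independently with probability failure_rate (a missed detection)."""
    return [tech for tech in sequence if rng.random() >= failure_rate]


def averaged_score(sequences, failure_rate, evaluate, n_runs=5, seed=0):
    """Average a score over several independent simulations at one failure rate."""
    scores = []
    for run in range(n_runs):
        rng = random.Random(seed + run)        # independent, reproducible run
        degraded = [simulate_detection_failure(s, failure_rate, rng) for s in sequences]
        scores.append(evaluate(degraded))
    return sum(scores) / n_runs
```

Surviving techniques keep their original order, so the degraded input is still a valid, if incomplete, attack sequence.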

Figure 7 shows the results of these experiments, using the F1-score to measure each model's predictive ability when the input data are incomplete. The F1-score of all models decreases as the detection failure rate increases, but DeepOP consistently performs best: its F1-score decreases from 0.894 (0% detection failure rate) to 0.526 (60% detection failure rate), which remains well above the other models. In comparison, the F1-scores of the original Transformer and LSTM drop to 0.421 and 0.452, respectively, while GRU and Bayesian networks perform even worse, dropping to 0.397 and 0.272. DeepOP therefore handles missing inputs more robustly and still clearly outperforms the other methods under the extreme conditions of high detection failure rates (50–60%). This result further supports the effectiveness of the causal window attention mechanism in modeling multi-step attack sequences.

4.5. Ontology Module Validation

To validate the contribution of the ontology reasoning module, we selected a real-world advanced persistent threat (APT) attack report, titled Frankenstein [40], as an experimental case study. Through the application of ontology reasoning and knowledge graph mapping on this security report, we aim to demonstrate the effectiveness of the ontology module in parsing and reconstructing complex, multi-step attack sequences. The Frankenstein attack report provides a detailed account of a sophisticated APT operation, segmented into two primary phases.

Phase One outlines the attacker’s strategy of sending phishing emails to entice victims into downloading and executing malicious files, thereby implanting malicious code for persistent control. This phase culminates in the completion of data exfiltration through Command and Control (C2) communications. Phase Two details a series of actions carried out by malicious scripts to acquire additional payloads, enhancing the attack’s persistence and effectiveness.

Figure 8 illustrates the results of extracting information from the Frankenstein security report and mapping it to a knowledge graph. The mapped results indicate that the blue path represents the first phase of the attack process, encompassing the entire attack chain from the initial phishing email to data exfiltration. Conversely, the pink path depicts the second phase, showcasing the series of actions undertaken by the malicious script to obtain supplementary attack payloads.

The analysis presented in Figure 8 demonstrates that ontology reasoning and knowledge graph mapping enable the accurate reconstruction of the complete attack path, thereby elucidating the attacker’s strategies and tactics.

Table 4 compares the extraction results of our attack sequence extraction method with existing methods [12]. Existing methods form attack sequences by ordering the extracted attack techniques according to tactical phases.

As shown in the table, in a complex scenario such as Frankenstein [40], which involves multiple attack paths, the existing methods are unable to extract effective attack sequences and are limited in handling multi-path attacks. In contrast, our ontology reasoning and knowledge graph mapping approach successfully extracts multiple attack paths from the security report, demonstrating a stronger capability for attack sequence extraction and complex path processing. This case study supports the effectiveness of the ontology module in parsing complex attack activities.

5. Conclusions

In this study, we proposed DeepOP, a hybrid framework that integrates deep learning and ontology for predicting multi-step attack sequences. By leveraging the MITRE ATT&CK framework, we provided a structured and standardized approach for modeling attacker behavior at a fine-grained level, addressing limitations in existing prediction methods. The contributions of this research include (1) the development of a cybersecurity ontology for extracting causally connected attack sequences; (2) the construction of a dataset reflecting real-world APT attack scenarios, enriched with causal relationships and parallel paths between attack techniques; and (3) the introduction of a causal window attention mechanism in a transformer-based architecture to capture both local and global dependencies in attack sequences effectively.

Extensive experiments demonstrated the superior performance of DeepOP in multi-step attack prediction tasks. Compared with the baseline models, including LSTM, GRU, Bayesian networks, and the original Transformer, DeepOP consistently achieved higher accuracy, recall, precision, and F1-scores across various scenarios. Furthermore, robustness experiments with increasing detection failure rates validated the stability and reliability of DeepOP when input data were incomplete. These results highlight the practical applicability of DeepOP in addressing the challenges posed by advanced persistent threats (APTs) in the industrial IoT and beyond.

Future work will focus on extending the dataset to include more diverse attack scenarios and integrating real-time streaming data for online attack prediction. Additionally, exploring advanced causal inference methods and domain adaptation techniques will enhance the framework’s generalizability and performance in dynamic, real-world environments.

Author Contributions

S.Z. oversaw the overall progress of the project. X.X. designed the research methodology, developed the model, and analyzed the experimental results. X.S. conducted literature reviews, explored relevant issues, and refined the language. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by The Young Scientists Fund of the National Natural Science Foundation of China (62302540) and The Henan Province Science and Technology Research Project (242102210133).

Data Availability Statement

Researchers can obtain the data by contacting the corresponding author via email.

Acknowledgments

The authors would like to thank all those who have contributed in this area and the anonymous reviewers for their valuable comments and suggestions, which have improved the presentation of this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

Alladi, T.; Chamola, V.; Zeadally, S. Industrial control systems: Cyberattack trends and countermeasures. Comput. Commun. 2020, 155, 1–8. [Google Scholar] [CrossRef]
Khan, R.; Maynard, P.; McLaughlin, K.; Laverty, D.; Sezer, S. Threat analysis of blackenergy malware for synchrophasor based real-time control and monitoring in smart grid. In Proceedings of the 4th International Symposium for ICS & SCADA Cyber Security Research 2016, ICS-CSR, Belfast, UK, 23–25 August 2016; pp. 53–63. [Google Scholar]
Friedberg, I.; Skopik, F.; Settanni, G.; Fiedler, R. Combating advanced persistent threats: From network event correlation to incident detection. Comput. Secur. 2015, 48, 35–57. [Google Scholar] [CrossRef]
Zohrevand, Z.; Glässer, U. Should i raise the red flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms. arXiv 2019, arXiv:1904.06646. [Google Scholar]
Khan, I.A.; Pi, D.; Khan, Z.U.; Hussain, Y.; Nawaz, A. HML-IDS: A hybrid-multilevel anomaly prediction approach for intrusion detection in SCADA systems. IEEE Access 2019, 7, 89507–89521. [Google Scholar] [CrossRef]
GhasemiGol, M.; Ghaemi-Bafghi, A.; Takabi, H. A comprehensive approach for network attack forecasting. Comput. Secur. 2016, 58, 83–105. [Google Scholar] [CrossRef]
Polatidis, N.; Pimenidis, E.; Pavlidis, M.; Papastergiou, S.; Mouratidis, H. From product recommendation to cyber-attack prediction: Generating attack graphs and predicting future attacks. Evol. Syst. 2020, 11, 479–490. [Google Scholar] [CrossRef]
Sun, N.; Zhang, J.; Rimba, P.; Gao, S.; Zhang, L.Y.; Xiang, Y. Data-driven cybersecurity incident prediction: A survey. IEEE Commun. Surv. Tutor. 2018, 21, 1744–1772. [Google Scholar] [CrossRef]
Huang, L. Design of an IoT DDoS attack prediction system based on data mining technology. J. Supercomput. 2022, 78, 4601–4623. [Google Scholar] [CrossRef]
Holgado, P.; Villagrá, V.A.; Vazquez, L. Real-time multistep attack prediction based on hidden markov models. IEEE Trans. Dependable Secur. Comput. 2017, 17, 134–147. [Google Scholar] [CrossRef]
Liu, Y.; Guo, Y. CL-AP2: A composite learning approach to attack prediction via attack portraying. J. Netw. Comput. Appl. 2024, 230, 103963. [Google Scholar] [CrossRef]
Kim, Y.; Lee, I.; Kwon, H.; Lee, K.; Yoon, J. BAN: Predicting APT Attack Based on Bayesian Network With MITRE ATT&CK Framework. IEEE Access 2023, 11, 91949–91968. [Google Scholar] [CrossRef]
Strom, B.E.; Applebaum, A.; Miller, D.P.; Nickels, K.C.; Pennington, A.G.; Thomas, C.B. MITRE ATT&CK: Design and Philosophy; Technical Report; The MITRE Corporation: McLean, VA, USA, 2018.
Al-Mohannadi, H.; Mirza, Q.; Namanya, A.; Awan, I.; Cullen, A.; Disso, J. Cyber-attack modeling analysis techniques: An overview. In Proceedings of the 2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), Vienna, Austria, 22–24 August 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 69–76.
Naik, N.; Jenkins, P.; Grace, P.; Song, J. Comparing attack models for IT systems: Lockheed Martin’s Cyber Kill Chain, MITRE ATT&CK Framework and Diamond Model. In Proceedings of the 2022 IEEE International Symposium on Systems Engineering (ISSE), Vienna, Austria, 24–26 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–7.
Bartos, V.; Zadnik, M.; Habib, S.M.; Vasilomanolakis, E. Network entity characterization and attack prediction. Future Gener. Comput. Syst. 2019, 97, 674–686.
Ansari, M.S.; Bartoš, V.; Lee, B. GRU-based deep learning approach for network intrusion alert prediction. Future Gener. Comput. Syst. 2022, 128, 235–247.
Wang, W.; Yi, P.; Jiang, J.; Zhang, P.; Chen, X. Transformer-based framework for alert aggregation and attack prediction in a multi-stage attack. Comput. Secur. 2024, 136, 103533.
Zhao, J.; Tang, T.; Bu, B.; Li, Q. Graph neural network-based attack prediction for communication-based train control systems. CAAI Trans. Intell. Technol. 2024, 1–13.
Li, T.; Jiang, Y.; Lin, C.; Obaidat, M.S.; Shen, Y.; Ma, J. DeepAG: Attack graph construction and threats prediction with bi-directional deep learning. IEEE Trans. Dependable Secur. Comput. 2022, 20, 740–757.
Al-Shaer, R.; Spring, J.M.; Christou, E. Learning the associations of MITRE ATT&CK adversarial techniques. In Proceedings of the 2020 IEEE Conference on Communications and Network Security (CNS), Held Virtually, 29 June–1 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–9.
Niakanlahiji, A.; Wei, J.; Chu, B.T. A natural language processing based trend analysis of advanced persistent threat techniques. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2995–3000.
Satvat, K.; Gjomemo, R.; Venkatakrishnan, V. EXTRACTOR: Extracting attack behavior from threat reports. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P), Held Virtually, 6–10 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 598–615.
Husari, G.; Al-Shaer, E.; Ahmed, M.; Chu, B.; Niu, X. TTPDrill: Automatic and accurate extraction of threat actions from unstructured text of CTI sources. In Proceedings of the 33rd Annual Computer Security Applications Conference, Orlando, FL, USA, 4–8 December 2017; pp. 103–115.
Li, Z.; Zeng, J.; Chen, Y.; Liang, Z. AttacKG: Constructing technique knowledge graph from cyber threat intelligence reports. In Proceedings of the European Symposium on Research in Computer Security; Springer: Berlin/Heidelberg, Germany, 2022; pp. 589–609.
Myneni, S.; Jha, K.; Sabur, A.; Agrawal, G.; Deng, Y.; Chowdhary, A.; Huang, D. Unraveled—A semi-synthetic dataset for Advanced Persistent Threats. Comput. Netw. 2023, 227, 109688.
Bagui, S.S.; Mink, D.; Bagui, S.C.; Ghosh, T.; Plenkers, R.; McElroy, T.; Dulaney, S.; Shabanali, S. Introducing UWF-ZeekData22: A comprehensive network traffic dataset based on the MITRE ATT&CK framework. Data 2023, 8, 18.
CyberMonitor. APT Cyber Criminal Campaign Collections. 2024. Available online: https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections (accessed on 30 November 2024).
Orbinato, V.; Barbaraci, M.; Natella, R.; Cotroneo, D. Automatic mapping of unstructured cyber threat intelligence: An experimental study (practical experience report). In Proceedings of the 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), Charlotte, NC, USA, 31 October–3 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 181–192.
Legoy, V.; Caselli, M.; Seifert, C.; Peter, A. Automated retrieval of ATT&CK tactics and techniques for cyber threat reports. arXiv 2020, arXiv:2004.14322.
Marchiori, F.; Conti, M.; Verde, N.V. STIXnet: A novel and modular solution for extracting all STIX objects in CTI reports. In Proceedings of the 18th International Conference on Availability, Reliability and Security (ARES), Benevento, Italy, 29 August–1 September 2023; ACM: New York, NY, USA, 2023; pp. 1–11.
Lei, G.; Ruibin, S.; Yu, T. Research on key technologies of ontology based threat modeling for cyber range. J. CAEIT 2020, 15, 1139–1144.
Rastogi, N.; Dutta, S.; Zaki, M.J.; Gittens, A.; Aggarwal, C. MALOnt: An ontology for malware threat intelligence. In Proceedings of the International Workshop on Deployable Machine Learning for Security Defense; Springer: Berlin/Heidelberg, Germany, 2020; pp. 28–44.
Wang, F.; Zhang, Y.; Luo, X. Semantic query of ontology knowledge base based on SQWRL. Comput. Technol. Dev. 2017, 2, 24–29.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Lei Ba, J.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
Church, K.W. Word2Vec. Nat. Lang. Eng. 2017, 23, 155–162.
Ahn, G.; Kim, K.; Park, W.; Shin, D. Malicious file detection method using machine learning and interworking with MITRE ATT&CK framework. Appl. Sci. 2022, 12, 10761.
The MITRE Corporation. Frankenstein Campaign (C0001), Online Resource. 2023. Available online: https://attack.mitre.org/campaigns/C0001/ (accessed on 27 April 2024).

Figure 1. Modeling APT attacks with ATT&CK.

Figure 2. An overview of DeepOP.

Figure 3. Mapping attack description statements to attack techniques.

Figure 4. COASE ontology and relations.

Figure 5. Overall architecture of attack prediction model.

Figure 6. Performance comparison at different stages of an attack.

Figure 7. Performance comparison across different missing rates.

Figure 8. Security report mapping to knowledge graph.

Table 1. Dataset composition.

Tactic	TA0001	TA0002	TA0003	TA0004	TA0005	TA0006	TA0007	TA0008	TA0009	TA0011	TA0010	TA0040	Total
Samples	583	1776	2471	2432	4759	1562	2517	572	1257	1659	357	425	17,302

Note: The total is smaller than the row sum because a single sample may be labeled with more than one tactic.
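As a quick check of the note under Table 1, the sketch below sums the per-tactic counts and compares the result with the reported total. It assumes that Total counts distinct samples while each row entry counts tactic labels; under that reading the gap gives the average number of tactic labels per sample. The tactic names in the comments are the standard ATT&CK names for these IDs and are not part of the table.

```python
# Per-tactic counts transcribed from Table 1, keyed by MITRE ATT&CK tactic ID.
TACTIC_COUNTS = {
    "TA0001": 583,   # Initial Access
    "TA0002": 1776,  # Execution
    "TA0003": 2471,  # Persistence
    "TA0004": 2432,  # Privilege Escalation
    "TA0005": 4759,  # Defense Evasion
    "TA0006": 1562,  # Credential Access
    "TA0007": 2517,  # Discovery
    "TA0008": 572,   # Lateral Movement
    "TA0009": 1257,  # Collection
    "TA0011": 1659,  # Command and Control
    "TA0010": 357,   # Exfiltration
    "TA0040": 425,   # Impact
}
REPORTED_TOTAL = 17_302  # assumed to count distinct samples

label_count = sum(TACTIC_COUNTS.values())      # one entry per (sample, tactic) label
extra_labels = label_count - REPORTED_TOTAL    # labels beyond one per sample
labels_per_sample = label_count / REPORTED_TOTAL
most_common = max(TACTIC_COUNTS, key=TACTIC_COUNTS.get)

if __name__ == "__main__":
    print(f"row sum = {label_count}, reported total = {REPORTED_TOTAL}")
    print(f"extra labels = {extra_labels}, mean labels per sample = {labels_per_sample:.2f}")
    share = TACTIC_COUNTS[most_common] / REPORTED_TOTAL
    print(f"most frequent tactic: {most_common} ({share:.1%} of samples)")
```

Under these assumptions the row sums to 20,370 labels over 17,302 samples, about 1.18 tactic labels per sample, and Defense Evasion (TA0005) is the most frequent tactic, present in roughly 27.5% of samples.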

Table 2. Regular expressions used in IoC protection.

Class	Regular Expression
IP	(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?:\/[0-9]{1,2})?
URL	h[tx]{2}ps?:\/\/(?:[a-zA-Z0-9\-._~%!$&'()*+,;=:@\/\[\]]+)
Domain	((?:[a-zA-Z0-9-]+\.)+(?!exe|dll)[a-zA-Z]{2,4})
Email	[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}
Registry	(HKLM|HKCU|HKCR|HKU|HKCC)\\[\\A-Za-z0-9-_]+
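Table 2 lists the patterns but not how they are applied. One plausible reading of "IoC protection", sketched below, is to replace each indicator with a type placeholder before the report text is tokenized and mapped to techniques, so that addresses, URLs, and registry paths are not split into meaningless fragments. The patterns are a reading of Table 2 with the quantifier braces (such as {1,3} and {2,4}) restored and the pipes read as plain alternation. The application order, the placeholder tokens, the sample sentence, and the function name mask_iocs are illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations

import re

# Assumed reading of Table 2 (braces and plain "|" restored). Order matters here:
# URLs and e-mail addresses contain domain-like substrings, so they are masked first.
IOC_PATTERNS = {
    "URL": r"h[tx]{2}ps?://[a-zA-Z0-9\-._~%!$&'()*+,;=:@/\[\]]+",
    "Email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}",
    "IP": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?:/[0-9]{1,2})?",
    "Registry": r"(?:HKLM|HKCU|HKCR|HKU|HKCC)\\[\\A-Za-z0-9_-]+",
    # The negative lookahead keeps file names such as "update.exe" from being
    # mistaken for domains.
    "Domain": r"(?:[a-zA-Z0-9-]+\.)+(?!exe|dll)[a-zA-Z]{2,4}",
}


def mask_iocs(text: str) -> tuple[str, dict[str, list[str]]]:
    """Replace each IoC with a placeholder such as <URL>; return masked text and IoCs found."""
    found: dict[str, list[str]] = {}
    for label, pattern in IOC_PATTERNS.items():  # dicts keep insertion order
        found[label] = re.findall(pattern, text)
        text = re.sub(pattern, f"<{label.upper()}>", text)
    return text, found


# Hypothetical report sentence (defanged "hxxp" URL, CIDR address, registry key).
SAMPLE = (
    "Beacon to hxxp://evil-cdn.example.com/gate.php and 10.0.0.5/24, "
    "resolved evil-cdn.example.com then mailed ops@mail.example.net "
    "after writing HKCU\\Software\\Run and dropping update.exe."
)

if __name__ == "__main__":
    masked, iocs = mask_iocs(SAMPLE)
    print(masked)
    print(iocs)
```

On the sample sentence this yields "Beacon to <URL> and <IP>, resolved <DOMAIN> then mailed <EMAIL> after writing <REGISTRY> and dropping update.exe.", with the file name left intact by the Domain lookahead. Masking URLs and e-mail addresses before domains is a design choice made here; applying the patterns in the table's order would let the Domain pattern consume parts of URLs.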

Table 3. Prediction performance comparison.

Method	Accuracy	Precision	Recall	F1-Score
LSTM [20]	0.867	0.779	0.972	0.865
GRU [17]	0.841	0.769	0.913	0.835
Bayesian network [12]	0.608	0.549	0.695	0.614
Transformer [35]	0.857	0.783	0.956	0.861
Proposed model	0.898	0.822	0.978	0.894
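The F1 column of Table 3 can be cross-checked against the precision and recall columns, since F1 is their harmonic mean, F1 = 2PR/(P + R). The sketch below recomputes it for each row; the dictionary only transcribes the table, and the helper name f1_score is introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (accuracy, precision, recall, reported F1) transcribed from Table 3.
TABLE_3 = {
    "LSTM [20]": (0.867, 0.779, 0.972, 0.865),
    "GRU [17]": (0.841, 0.769, 0.913, 0.835),
    "Bayesian network [12]": (0.608, 0.549, 0.695, 0.614),
    "Transformer [35]": (0.857, 0.783, 0.956, 0.861),
    "Proposed model": (0.898, 0.822, 0.978, 0.894),
}

# Largest gap between the reported F1 and the value recomputed from P and R.
max_gap = max(abs(f1_score(p, r) - reported) for _, p, r, reported in TABLE_3.values())

if __name__ == "__main__":
    for method, (_, p, r, reported) in TABLE_3.items():
        print(f"{method:<22} recomputed={f1_score(p, r):.3f} reported={reported:.3f}")
    print(f"max gap = {max_gap:.4f}")
```

All five recomputed values agree with the reported F1 to within 0.001 (the largest gap is for the proposed model, 0.893 versus 0.894), which could be explained by P and R having been rounded to three decimals. Because recall exceeds precision for every method, each F1 sits closer to the precision value.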

Table 4. Comparison of attack sequence extraction methods.

Method	Extracted attack sequence(s)
Proposed model	AS1: TA0001.T1566-TA0002.T1204-TA0002.T1203-TA0003.T1547-TA0011.T1071-TA0010.T1041
	AS2: TA0005.T1218-TA0005.T1027-TA0007.T1082-TA0011.T1573-TA0010.T1041
BAN [12]	TA0001.T1566-TA0002.T1204-TA0002.T1203-TA0003.T1547-TA0005.T1218-TA0005.T1027-TA0007.T1082-TA0011.T1573-TA0011.T1071-TA0010.T1041
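Each step in Table 4 is written as a tactic ID, a dot, and a technique ID, with steps joined by hyphens. The sketch below parses that notation, rejects malformed IDs (ATT&CK tactic and technique IDs are assumed to have four digits; sub-techniques are not handled), and compares the two chains found by the proposed model with the single chain produced by BAN. The sequences are transcribed under the reading that AS1, like AS2, ends with TA0010.T1041; the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed step notation: "<tactic ID>.<technique ID>", steps joined by "-".
STEP_RE = re.compile(r"TA\d{4}\.T\d{4}")

AS1 = "TA0001.T1566-TA0002.T1204-TA0002.T1203-TA0003.T1547-TA0011.T1071-TA0010.T1041"
AS2 = "TA0005.T1218-TA0005.T1027-TA0007.T1082-TA0011.T1573-TA0010.T1041"
BAN = ("TA0001.T1566-TA0002.T1204-TA0002.T1203-TA0003.T1547-TA0005.T1218-"
       "TA0005.T1027-TA0007.T1082-TA0011.T1573-TA0011.T1071-TA0010.T1041")


def is_valid_step(step: str) -> bool:
    """True if the step is a well-formed tactic.technique pair."""
    return STEP_RE.fullmatch(step) is not None


def parse_sequence(seq: str) -> list[tuple[str, str]]:
    """Split a sequence into (tactic, technique) pairs, rejecting malformed steps."""
    pairs = []
    for step in seq.split("-"):
        if not is_valid_step(step):
            raise ValueError(f"malformed step: {step!r}")
        tactic, technique = step.split(".")
        pairs.append((tactic, technique))
    return pairs


as1, as2, ban = (parse_sequence(s) for s in (AS1, AS2, BAN))
same_coverage = (set(as1) | set(as2)) == set(ban)  # same steps overall?
shared_steps = set(as1) & set(as2)                  # steps common to both chains

if __name__ == "__main__":
    print(len(as1), len(as2), len(ban), same_coverage, sorted(shared_steps))
```

Under this reading the two chains together contain exactly the ten steps of the BAN chain and share only the final exfiltration step, TA0010.T1041, which suggests the methods differ in how steps are grouped into causally connected chains rather than in which techniques they recover.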

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, S.; Xue, X.; Su, X. DeepOP: A Hybrid Framework for MITRE ATT&CK Sequence Prediction via Deep Learning and Ontology. Electronics 2025, 14, 257. https://doi.org/10.3390/electronics14020257

AMA Style

Zhang S, Xue X, Su X. DeepOP: A Hybrid Framework for MITRE ATT&CK Sequence Prediction via Deep Learning and Ontology. Electronics. 2025; 14(2):257. https://doi.org/10.3390/electronics14020257

Chicago/Turabian Style

Zhang, Shuqin, Xiaohang Xue, and Xinyu Su. 2025. "DeepOP: A Hybrid Framework for MITRE ATT&CK Sequence Prediction via Deep Learning and Ontology" Electronics 14, no. 2: 257. https://doi.org/10.3390/electronics14020257

APA Style

Zhang, S., Xue, X., & Su, X. (2025). DeepOP: A Hybrid Framework for MITRE ATT&CK Sequence Prediction via Deep Learning and Ontology. Electronics, 14(2), 257. https://doi.org/10.3390/electronics14020257

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
