Advanced Persistent Threat Group Correlation Analysis via Attack Behavior Patterns and Rough Sets

Abstract: In recent years, advanced persistent threat (APT) attacks have become a significant network security threat due to their concealment and persistence. Correlation analysis of APT groups is vital for understanding the global network security landscape and accurately attributing threats. Current studies on threat attribution rely on experts or advanced technology to identify evidence linking attack incidents to known APT groups. However, there is a lack of research focused on automatically discovering potential correlations between APT groups. This paper proposes a method using attack behavior patterns and rough set theory to quantify APT group relevance. It extracts two types of features from threat intelligence: APT attack objects and behavior features. To address the issues of inconsistency and limitations in threat intelligence, this method uses rough set theory to model APT group behavior and designs a link prediction method to infer correlations among APT groups. Experimental results on publicly available APT analysis reports show a correlation precision of 90.90%. The similarity coefficient accurately reflects the correlation strength, validating the method’s efficacy and accuracy.


Introduction
APT attacks refer to the persistent and targeted intrusions of an advanced group against specific targets. Unlike generic hacker attacks, APT attacks aim to attack infrastructure and steal sensitive intelligence. They exhibit a strong national strategic intention, which seriously threatens a nation's cyberspace security [1]. In recent years, APT attacks have demonstrated a clear trend towards cyberwarfare. Some APT groups have started collaborating and sharing intelligence to pursue common strategic goals. In 2010, the "Stuxnet" incident [2] set a precedent for nation-state APT attacks. U.S. and Israeli intelligence agencies planned and developed this operation [3]. Germany and France contributed supply chain intelligence, and Dutch agents facilitated payload delivery [4]. In December 2016, the U.S. Department of Homeland Security attributed attacks on the Democratic National Committee and U.S. election campaigns to coordinated efforts by APT28 and APT29, linked to Russian intelligence agencies [5]. In 2019, Symantec reported the use of APT34 infrastructure during Operation Waterbug [6]. Understanding the correlation between APT groups can help to discover potential collaboration and information sharing between them and provide direction for APT attack discovery and traceability.
Existing research [7][8][9][10][11][12] mainly focuses on APT attack profiling, detection, and attribution. These studies typically discover and trace attack behaviors by analyzing the characteristics of specific APT groups. Few studies have focused on analyzing the correlations between APT groups. Some methods [13][14][15][16] explore correlations between attack behaviors to uncover associations between incidents. However, individual incident relevance is subject to change and does not fully represent APT group attack patterns. More accurate measurement requires assessing multiple attack characteristics comprehensively.
In industry, APT group correlation analysis is largely a manual process carried out by security analysts, heavily reliant on their expertise. Analysts establish correlations by examining attack resources such as IP addresses, domains, and vulnerabilities, or by comparing similarities in signatures and code strings. However, expert-based methods can be biased and may not handle the growing scale of attacks effectively. Analyzing APT group correlations poses several challenges. Firstly, fragmented information makes effective organization difficult, leading to errors in correlation analyses. Secondly, knowledge-based methods relying on expert experience require constant knowledge base updates, consuming time and resources and causing delays in relation disclosures. Lastly, there is a lack of an effective calculation method to measure APT group correlations.
In this paper, we introduce an APT group correlation analysis method using attack behavior patterns and rough set theory. According to reference [17], attackers are defined based on key indicators such as tradecraft, infrastructure, malware, intention, and external information. This paper synthesizes fragmented information from multiple dimensions to create a sample set of APT groups. We extract dynamic and static behavior features from APT malware to gather information about tradecraft, infrastructure, and malware characteristics. We also gather APT security events to obtain information about intentions. To address the issue of inconsistent and inaccurate information in raw data, we employ rough set theory, a mathematical tool used to handle imprecise and incomplete information [18], to model APT group behavior patterns. To uncover potential correlations, we design a link prediction method, which is crucial for revealing hidden relations [19].
The main contributions of this paper are as follows:
(1) This paper presents an APT group knowledge model, generated semi-automatically from threat intelligence. It integrates the features of attack objects and attack behaviors for more comprehensive APT group profiling.
(2) This paper proposes an innovative method for APT group correlation analysis, leveraging rough set theory. To our knowledge, this is the first academic study automating the analysis of APT group correlations. The method dynamically creates APT group behavior patterns by approximating upper and lower bounds and considers fuzzy behavior to design a link prediction method for correlation inference. The correlation precision is up to 90.90%.
(3) This paper analyzes the development and evolution trends of APT attacks by observing the changes in correlations among different groups in a dataset covering 14 years from 2008 to 2022.
The remainder of the paper is structured as follows. Section 2 discusses related work on attack attribution and cyber security incident correlation analyses. Section 3 presents our APT group correlation analysis approach. In Section 4, we provide a detailed summary of our approach's experimental results. Section 5 showcases the evolution patterns of APT group correlations through case studies. Finally, Section 6 concludes the paper and outlines future work and prospects.

Related Work
This section reviews related work on attack attribution and existing methods for cyber security incident correlation analysis. The limitations of these existing methods are further analyzed and discussed.

Attack Attribution
Attribution refers to the process of attributing cyber attack actions to a specific entity, attacker, or group. Researchers typically utilize various attack-related data to model attackers, such as malicious code, indicators of compromise (IOCs), and tactics, techniques, and procedures (TTPs). They then employ association rules or classification algorithms to identify attackers.
Malicious code serves as a crucial tool in APT attacks. Using malicious code to identify threat actors holds significant importance in attack tracing. Several studies have proposed feature dimensions for malicious code attribution from different perspectives. Son et al. [20] proposed nine factors for relation analysis of malware, considering perspectives such as attack propagation, malware, and attack sources. Hamed et al. [21] developed twelve views from four perspectives: opcode, bytecode, system calls, and titles. Parunak [22] defined malware behaviors as event sequences generated by a specific grammar and estimated the similarity between different ransomware based on the generated strings. Kida et al. [23] used the NATO phonetic alphabet as embedded fuzzy hashes to identify similarities. Liras et al. [24] extracted dynamic and static features from malware for identifying APT attacks. These features are only for specific APT groups and cannot cover all APT groups. Several studies have improved malware attribution models. Li et al. [25] proposed a model that integrates SMOTE and random forest algorithms for dealing with imbalanced data from different groups. Dib et al. [26] designed a deep learning model that integrates multi-layer features from both strings and images. Wang et al. [27] used semantic analysis to identify attack groups. Black et al. [28] proposed a contextual similarity technique that strengthens the results of function similarity. However, malicious code only exists in specific stages of the APT attack process, and a single malware instance cannot provide a comprehensive view of the entire attack.
IOCs serve as forensic evidence of potential intrusions of a host system or network. Different groups have their own unique IOC libraries. Zhao et al. [29] developed an automated IOC extraction method based on word embedding and a syntactic dependency analysis of threat texts for identifying IOC domains. However, IOCs are numerous and prone to frequent change, with attackers often altering their IOCs within a short timeframe.
TTPs are the most important feature to distinguish APT groups as they reflect the behavior patterns of attackers. Compared to malicious code and indicators, patterns are more resistant to change. FireEye [30] used the existing knowledge of TTPs to cluster unknown groups using cosine similarity. Noor et al. [31] utilized latent semantic analysis methods to index threat intelligence into TTPs and trained machine learning classifiers for APT group identification. Kim et al. [32] extracted TTPs from sandbox reports and calculated their correlation with threat actors using vector similarity. They also introduced IOCs to refine the correlation results. While TTPs are valuable for attacker profiling, they cannot be directly derived from data and require expert judgment. This process consumes significant manpower and time, leading to delays in threat intelligence.
APT attacks typically involve multiple views and encompass various correlations, such as those between attack targets, attack resources, and attack behaviors. However, existing attack attribution methods often focus on describing the attacker from a single dimension. This limited perspective hinders a comprehensive understanding of the attacker's profile and can lead to deviations or even errors in correlation analysis results.

APT Group Correlation Analysis
Some studies have explored correlation analysis methods for cyber security incidents. In 2017, Perl [13] introduced multi-type attributes for attack correlation and suggested different similarity calculation methods based on attribute types. Rezapour et al. [14] proposed combining blacklist and victim similarity measures to assess attacker similarity. Karafili et al. [15] developed an automated argumentation-based reasoner using both technical and social evidence. Xu et al. [16] proposed a dependence probability algorithm to analyze and quantify the correlation between network attack traces. However, these methods heavily rely on pre-defined feature sets or association rules based on expert knowledge, which may not adapt well to the dynamic changes in the network environment.
The utilization of rough set theory in this study enables the definition of behavior patterns for APT groups, facilitating the dynamic expansion of new attack knowledge. Furthermore, this paper goes beyond the limitations of fixed association rules by introducing a link prediction method to construct an APT group relation network.

Methodology
This section describes an approach for modeling the behavior patterns of APT groups and a method for analyzing the correlation between groups. The model is structured into three principal components: APT knowledge representation generation, construction of behavior patterns of APT groups, and APT group correlation analysis. The model framework is shown in Figure 1. The APT knowledge representation generation module collects security events and malware samples from available threat intelligence sources. This module extracts features related to attack objects and attack behaviors exhibited by APT groups from the collected samples and conducts knowledge representation to facilitate the comprehensive profiling of APT groups.
The module for the construction of the behavior patterns of APT groups utilizes rough set operators to establish the behavior pattern of APT groups. This module employs the concept of upper and lower approximations and constructs precise domains, fuzzy domains, and irrelevant domains to effectively describe the behavior patterns of APT groups.
The APT group correlation analysis module introduces a measurement method based on link prediction to quantify the similarity between different APT groups. This module enables the detection of potential hidden correlations and facilitates the construction of the APT group relation network.

Knowledge Representation of APT Groups
This paper combines attack objects and behavior features to profile APT groups, defining the knowledge representation of APT groups.
Definition 1. Knowledge representation of APT groups (Group) refers to the knowledge representation system used to describe the behavior pattern of APT groups, including the feature representation of an attack object, which is composed of security event feature sequences, and the feature representation of attack behavior, which is composed of malware feature sequences.

$$\mathit{Group} = \mathit{Object} \cup \mathit{Behavior} \tag{1}$$

Definition 2. Feature representation of attack objects (Object) refers to a knowledge representation system composed of a set of security event feature sequences. Each feature sequence consists of a set of security event attributes.
The feature representation of an attack object is defined as the following four-tuple [33]:

$$\mathit{Object} = (U_o, R_o, V_o, f_o)$$

Here, $U_o$ is the set of security event feature sequences and $R_o$ is the set of attack object attributes. Let $D = \{g_1, g_2, \ldots, g_K\}$ denote the set of APT groups that launched these security events. $V_o = \bigcup_{a \in R_o} V_a$, where $V_a$ is the set of values of attribute $a$, and $f_o : R_o \to V_o$ is an information (description) function.
Take the example of a reported security event in threat intelligence: "In 2017, researchers from ESET uncovered Gazer, a new malware tool used by the infamous threat actor Turla to spy on embassies and consulates in Europe". In this case, the feature sequence of this security event can be represented as $e_1$ = [id, Organization, Diplomacy, Europe, Malware, Gazer, None, None, 2017, Turla].
Definition 3. Feature representation of attack behavior (Behavior) refers to a knowledge representation system composed of a set of malware feature sequences. Each feature sequence consists of a set of malware attributes.
The feature representation of attack behavior is defined as the following four-tuple:

$$\mathit{Behavior} = (U_b, R_b, V_b, f_b)$$

$R_b$ is a finite nonempty set of attack behavior attributes, and the subset $C_b$ is the condition attribute set. $C_b = \{time, Static, Dynamic, Vulnerability\}$ is the attribute set of malware, where:
• Static denotes the static feature set of malware, including attributes such as the import and export functions of executable files, APIs, language types, resource items, etc. These features provide insights into the coding intentions and language preferences exhibited by malicious code.
• Dynamic denotes the dynamic feature set of malware, including attributes such as commands, files, registry entries, processes, and dynamic link libraries observed during the execution of malware samples. These features aid in refining the understanding of the attack process.
• Vulnerability denotes the vulnerability feature set of malware.
• time denotes the date on which the malware was initially detected in VirusTotal (www.virustotal.com, accessed on 15 August 2023).
$V_b = \bigcup_{a \in R_b} V_a$, where $V_a$ is the set of values of attribute $a$. Each value in $V_a$ is defined as follows: dynamic feature values are defined as the occurrence frequency divided by 100, enabling the differentiation of high-, medium-, and low-frequency features. The static and vulnerability features are encoded as either 0 or 1, depending on their presence. The time feature is recorded as the year in which the malware was initially detected.
$f_b : R_b \to V_b$ is the information (description) function.
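The value encoding described above can be sketched in a few lines. This is only an illustration: the function name `encode_sample`, the dictionary-based layout, and the example feature names are assumptions, not part of the paper's implementation.

```python
def encode_sample(dynamic_counts, static_present, vuln_present,
                  first_seen_year, static_vocab, vuln_vocab):
    """Encode one malware sample into attribute values (hypothetical layout).

    dynamic_counts: dict of dynamic feature name -> occurrence count
    static_present / vuln_present: sets of feature names seen in the sample
    static_vocab / vuln_vocab: lists of all known static / vulnerability features
    """
    # time: the year in which the sample was first detected
    values = {"time": first_seen_year}
    # Dynamic features: occurrence frequency divided by 100
    for name, count in dynamic_counts.items():
        values[name] = count / 100
    # Static and vulnerability features: 1 if present, 0 otherwise
    for name in static_vocab:
        values[name] = 1 if name in static_present else 0
    for name in vuln_vocab:
        values[name] = 1 if name in vuln_present else 0
    return values
```

For instance, a dynamic feature observed 250 times would be stored as 2.5, while a static feature absent from the sample would be stored as 0.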
The malware behavior features collected from VirusTotal often contain significant redundancy. To address this issue, this paper utilizes a feature selection algorithm based on mutual information [34] to reduce the size of the conditional attribute set. Mutual information provides a measure of the dependency between condition attributes and decision attributes. For any condition attribute $c_i \in C_b$ and decision attribute set $D$, the mutual information of $c_i$ and $D$ is defined as

$$I(c_i, D) = \sum_{c \in c_i} \sum_{g_j \in D} p(c, g_j) \log \frac{p(c, g_j)}{p(c)\, p(g_j)}$$

Here, $c$ denotes an attribute value in $c_i$, $g_j$ denotes an APT group category in $D$, and $p(c, g_j)$ represents the joint probability distribution of condition attribute $c_i$ and the decision attribute set $D$, while $p(c)$ and $p(g_j)$ denote the marginal probability distributions of condition attribute $c_i$ and the decision attribute set $D$, respectively. When $I(c_i, D) = 0$, $c_i$ and $D$ are independent of each other. The larger the mutual information, the stronger the dependency between the two variables. The mutual information values between each condition attribute $c_i$ and the decision attribute $D$ are calculated and sorted. The top $\kappa$ condition attributes with the highest mutual information values are selected to form the condition attribute subset, thus achieving a reduction in the condition attribute set size. The effectiveness of attribute reduction is examined in Section 4.2.
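A minimal sketch of this selection step follows, assuming discrete attribute values and empirical (frequency-based) probability estimates. The function names and the choice of a base-2 logarithm are assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(values, labels):
    """Empirical I(c_i, D) for one discrete attribute column and group labels."""
    n = len(values)
    joint = Counter(zip(values, labels))   # counts of (c, g_j)
    count_c = Counter(values)              # counts of c
    count_g = Counter(labels)              # counts of g_j
    mi = 0.0
    for (c, g), count in joint.items():
        p_cg = count / n
        # log base 2 is an assumption; the ranking does not depend on the base
        mi += p_cg * math.log2(p_cg / ((count_c[c] / n) * (count_g[g] / n)))
    return mi

def select_top_k(columns, labels, k):
    """Return the names of the k attributes with the highest mutual information."""
    scores = {name: mutual_information(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

An attribute whose values split the samples exactly along group boundaries scores highest, while an attribute distributed identically across groups scores zero and is dropped first.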

Behavior Pattern Construction of APT Groups Based on Rough Sets
In threat intelligence, experts typically attribute APT security events and malware based on their experience, resulting in raw data inaccuracies and inconsistencies. APT attacks involve various techniques and tactics, and thus it is hard to create an accurate formula to describe APT group behavior. This paper takes inspiration from Loia et al.'s work [35] and uses rough set theory to define APT group behavior patterns. One of the basic ideas of rough set theory is to discover knowledge through classification by an equivalence relation and approximation of the target set [33]. For an attribute subset $A$, the indiscernibility relation $ind(A)$ relates objects that take the same value on every attribute in $A$. The equivalence class of an object $x$ is denoted by $[x]_{ind(A)}$, or simply $[x]_A$ and $[x]$, if no confusion arises. The pair $(U, [x]_{ind(A)})$ is called an approximation space.
Further, we define the upper and lower approximate sets of the sample set of group $g_k$:

$$\underline{apr}(S(g_k)) = \{x \in U \mid [x] \subseteq S(g_k)\}, \qquad \overline{apr}(S(g_k)) = \{x \in U \mid [x] \cap S(g_k) \neq \emptyset\}$$

$S(g_k) = \{x \in U \mid D(x) = g_k\}$ denotes the sample set of the group $g_k$. $\underline{apr}(S(g_k))$ represents the lower approximation set of $S(g_k)$, which encompasses the unique feature sequences specific to APT group $g_k$. $\overline{apr}(S(g_k))$ represents the upper approximation set of $S(g_k)$, which encompasses all feature sequences that can be associated with APT group $g_k$. When $\underline{apr}(S(g_k)) \neq \overline{apr}(S(g_k))$, we refer to $S(g_k)$ as a rough set.
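The approximations can be illustrated with a short sketch that follows the standard Pawlak definitions: the lower approximation collects equivalence classes lying entirely inside the target set, and the upper approximation collects every class that intersects it. The data layout (a dict from sample id to attribute values) and the function names are assumptions.

```python
from collections import defaultdict

def equivalence_classes(samples, attrs):
    """Group sample ids that share the same values on all attributes in attrs."""
    classes = defaultdict(set)
    for sid, row in samples.items():
        classes[tuple(row[a] for a in attrs)].add(sid)
    return list(classes.values())

def approximations(samples, attrs, target):
    """Return (lower, upper) approximations of the sample-id set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(samples, attrs):
        if cls <= target:      # class lies entirely inside the target set
            lower |= cls
        if cls & target:       # class overlaps the target set
            upper |= cls
    return lower, upper
```

With samples 1 and 2 indiscernible but attributed to different groups, sample 1 appears only in the upper approximation of its group, whereas a sample with a unique feature sequence appears in both.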
The rough membership [36] of each feature sequence in the sample set $S(g_k)$ is calculated as shown in Equation (9). According to this membership, the APT group sample sets are divided into three distinct regions, forming the behavior patterns of the APT groups.
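For reference, the standard rough membership function from rough set theory, which we assume Equation (9) follows, can be written as below, where $|\cdot|$ denotes set cardinality and the subscript on $\mu$ is notation introduced here for illustration.

```latex
\mu_{g_k}(x) = \frac{\left| [x]_{ind(A)} \cap S(g_k) \right|}{\left| [x]_{ind(A)} \right|},
\qquad 0 \le \mu_{g_k}(x) \le 1
```

A value of 1 means that every sample indiscernible from $x$ belongs to $g_k$, a value of 0 means that none does, and intermediate values indicate feature sequences shared with other groups.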
Definition 5. Behavior pattern of APT groups (Pattern) refers to the approximate description of the behavior of APT groups, which is determined by the rough membership of the feature sequence.It consists of three components: the precise domain, the fuzzy domain, and the irrelevant domain.
Prec denotes the precise domain, which consists of the unique feature sequences belonging to a specific APT group.
Fuzz represents the fuzzy domain, containing feature sequences associated with multiple APT groups.These sequences do not only represent a single APT group but also overlap with others.Their presence in the fuzzy domain indicates behavior similarities between APT groups.
Irr denotes the irrelevant domain, which consists of feature sequences that are entirely unrelated to the behavior patterns of group $g_k$.
Based on Equation (9) and Equations (11)-(13), the precise domain, fuzzy domain, and irrelevant domain are calculated as follows:
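A minimal sketch of this partition, assuming the usual rough-set thresholds (membership equal to 1 for the precise domain, strictly between 0 and 1 for the fuzzy domain, and equal to 0 for the irrelevant domain). The function names and data layout are assumptions, not the paper's implementation.

```python
from collections import defaultdict

def rough_membership(samples, attrs, labels, group):
    """Map each sample id x to |[x] ∩ S(group)| / |[x]|."""
    classes = defaultdict(list)
    for sid, row in samples.items():
        classes[tuple(row[a] for a in attrs)].append(sid)
    mu = {}
    for members in classes.values():
        share = sum(labels[s] == group for s in members) / len(members)
        for s in members:
            mu[s] = share
    return mu

def behavior_pattern(samples, attrs, labels, group):
    """Return the (Prec, Fuzz, Irr) domains of `group` (assumed thresholds)."""
    mu = rough_membership(samples, attrs, labels, group)
    prec = {s for s, m in mu.items() if m == 1}      # unique to the group
    fuzz = {s for s, m in mu.items() if 0 < m < 1}   # shared with other groups
    irr = {s for s, m in mu.items() if m == 0}       # unrelated to the group
    return prec, fuzz, irr
```

The three sets are disjoint and together cover the whole sample set, so every feature sequence falls into exactly one domain for a given group.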

Correlation Measurement Method Based on Link Prediction
The analysis of APT group correlations relies on an effective similarity calculation function. In reference [37], a general form of the rough set similarity measurement function is proposed, as shown in Equation (17). Considering the proportion of similar feature sequences within the fuzzy domains of different groups, we use the Jaccard coefficient as a direct correlation measure, which is calculated as follows:

$$DIR(g_i, g_j) = \frac{|Fuzz(g_i) \cap Fuzz(g_j)|}{|Fuzz(g_i) \cup Fuzz(g_j)|}$$

Given the limitations of threat intelligence feeds, it is possible that some correlations remain undiscovered through direct behavior analysis. Link prediction addresses the challenge of predicting the existence of unknown connections based on uncertain structural information within a network [38]; it can estimate the likelihood of a connection between two nodes that are not currently linked within the known network [39]. Consider an undirected network $G(V, E)$, where $V$ is the node set and $E$ is the edge set, with $n = |V|$ and $m = |E|$ representing the number of nodes and edges, respectively. The goal is to identify missing links or predict the emergence of future links from the set of non-existing ones.
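The direct correlation measure can be sketched as a Jaccard coefficient over two fuzzy domains, each represented here as a set of feature sequences (tuples). The function name and the convention of returning 0.0 when both domains are empty are assumptions.

```python
def direct_correlation(fuzz_i, fuzz_j):
    """Jaccard coefficient between the fuzzy domains of two groups."""
    union = fuzz_i | fuzz_j
    if not union:
        # Assumed convention: two empty fuzzy domains are treated as uncorrelated
        return 0.0
    return len(fuzz_i & fuzz_j) / len(union)
```

Two groups that share two of four distinct fuzzy feature sequences would receive a direct correlation of 0.5.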
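Lu et al. [40] have summarized different node similarity indices in link prediction. Table 1 shows a partial list of these indices.

Name                         Equation
Common Neighbors (CN)        $s_{xy} = |\Gamma(x) \cap \Gamma(y)|$
Hub-Promoted Index (HPI)     $s_{xy} = |\Gamma(x) \cap \Gamma(y)| / \min(k_x, k_y)$
Hub-Depressed Index (HDI)    $s_{xy} = |\Gamma(x) \cap \Gamma(y)| / \max(k_x, k_y)$
Katz Index                   $s_{xy} = \sum_{l=1}^{\infty} \beta^{l} \, |path^{\langle l \rangle}_{xy}|$
1 $\Gamma(x)$ denotes the set of neighbors of $x$. 2 $k_x$ is the degree of node $x$. 3 $path^{\langle l \rangle}_{xy}$ is the set of all paths with length $l$ connecting $x$ and $y$. $\beta$ is a free parameter controlling the path weights.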
Considering that attackers who share common collaborators are more inclined to cooperate, this paper incorporates the network structure of APT groups, using the number of neighboring nodes and path weights to devise a new node similarity index. Figure 2 shows the connected sub-network graph, with a diameter of 3, obtained after expanding the pair of groups $(g_i, g_j)$ by one layer. Let us define the set of all paths passing through nodes $g_i$, $g_k$, and $g_j$ as $path(g_i, g_k, g_j)$. Each path in this set consists of a series of adjacent node pairs, denoted as $p = \{\langle u, v \rangle, \ldots, \langle x, y \rangle, \langle y, z \rangle \mid u, v, x, y, z \in D\}$. The weight of the entire path is determined by dividing the minimum distance in the path by the number of hops. For instance, considering Figure 2, the weight of the path $(g_i \to g_5 \to g_j)$ is calculated as 0.3/2 = 0.15.
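The path weight rule can be written as a one-line function. In this sketch the "minimum distance in the path" is interpreted as the smallest edge value along the path, which is an assumption; the second edge value of 0.6 in the usage note is hypothetical, since the text only states the minimum (0.3) and the hop count (2).

```python
def path_weight(edge_values):
    """Weight of a path: smallest edge value divided by the number of hops.

    edge_values: list of the correlation values on consecutive edges of the
    path (assumed interpretation of "minimum distance in the path").
    """
    if not edge_values:
        raise ValueError("a path needs at least one edge")
    return min(edge_values) / len(edge_values)
```

Calling `path_weight([0.3, 0.6])` reproduces the worked example for the path $g_i \to g_5 \to g_j$, giving 0.3/2 = 0.15.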
The common neighbor correlation coefficient $CN(g_i, g_j)$ between two group nodes is defined for all $g_k \in \Gamma(g_i) \cap \Gamma(g_j)$. It is calculated by summing the weights of all paths passing through their common neighbor nodes, considering the node pair $\langle g_i, g_j \rangle$ as the start and end point. Since the scales of the local networks where different nodes are located may vary, this method normalizes the result by dividing it by the total number of common neighbors, $|\Gamma(g_i) \cap \Gamma(g_j)|$.
The similarity between APT groups is determined by $CN(g_i, g_j)$ and $DIR(g_i, g_j)$:

$$Sim(g_i, g_j) = \frac{CN(g_i, g_j) + DIR(g_i, g_j)}{2}$$

In order to better show the correlation between different APT groups, we define the APT group relation network as $N = (D, E)$, where $D$ represents the node set consisting of APT groups, and $E = \{Sim(g_i, g_j) \mid g_i, g_j \in D, Sim(g_i, g_j) > \gamma\}$ represents the edge set. The weight of each edge corresponds to the similarity between the related groups. This network provides insights into the level of correlation among different APT groups and reveals clusters of related groups.
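The following sketch puts the pieces together under simplifying assumptions: neighbors are groups with a positive direct correlation, and only two-hop paths $g_i \to g_k \to g_j$ through each common neighbor are considered (the paper's sub-network may contain longer paths). All function names and the dictionary layout of `dir_scores` are hypothetical.

```python
from itertools import combinations

def dir_of(dir_scores, a, b):
    """Direct correlation of an unordered pair; 0.0 if the pair is unknown."""
    return dir_scores.get((a, b), dir_scores.get((b, a), 0.0))

def neighbors(dir_scores, node):
    """Groups with a positive direct correlation to `node` (assumption)."""
    return {b if a == node else a
            for (a, b), w in dir_scores.items()
            if w > 0 and node in (a, b)}

def common_neighbor_coefficient(dir_scores, gi, gj):
    """Average weight of two-hop paths gi -> gk -> gj over common neighbors."""
    common = (neighbors(dir_scores, gi) & neighbors(dir_scores, gj)) - {gi, gj}
    if not common:
        return 0.0
    total = sum(min(dir_of(dir_scores, gi, gk), dir_of(dir_scores, gk, gj)) / 2
                for gk in common)
    return total / len(common)

def similarity(dir_scores, gi, gj):
    """Sim = (CN + DIR) / 2."""
    return (common_neighbor_coefficient(dir_scores, gi, gj)
            + dir_of(dir_scores, gi, gj)) / 2

def relation_network(dir_scores, groups, gamma):
    """Edges (gi, gj) -> Sim for all pairs whose similarity exceeds gamma."""
    edges = {}
    for gi, gj in combinations(sorted(groups), 2):
        s = similarity(dir_scores, gi, gj)
        if s > gamma:
            edges[(gi, gj)] = s
    return edges
```

Because the common neighbor term is added to the direct term, a pair with no shared fuzzy behavior can still receive a nonzero similarity when both groups are correlated with the same third group, which is the link prediction effect described above.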

Data Gathering and Preprocessing
To validate this method, a large dataset of samples from 76 well-known APT groups was gathered, covering the period from 2008 to 2022. This dataset was composed of two sub-datasets: the security event sample set and the malware sample set.
Regarding security event samples, 432 of them were collected from the HACKMAGEDDON website (https://www.hackmageddon.com, accessed on 15 August 2023). Furthermore, we gathered 1164 additional security event samples from APT technical reports. These reports were gathered from an open-source technical report platform (https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections, accessed on 15 August 2023). Hence, a total of 1596 APT security event samples were considered. Based on the feature representation of the attack object, we manually extracted security event samples and transformed them into structured data.
The malware sample set was composed of two sub-datasets that were later joined: a first dataset including a set of APT malware samples, and a second dataset with a collection of "general" malware samples. The samples of these two datasets were analyzed from static, dynamic, and vulnerability points of view using VirusTotal. VirusTotal is a renowned and trusted virus scanner that analyzes nearly one million distinct files daily using over 50 different tools [41]. APT malware samples were collected from the APT technical reports mentioned above, and this dataset consists of 3219 samples from 76 well-known APT groups. General malware samples are defined as those that, up to this point, are not known to belong to any APT group. A total of 12,510 general malware samples were crawled from VirusShare (https://www.virusshare.com/, accessed on 15 August 2023), and they can be classified as follows: 8025 samples were mainly considered trojans, 2261 were considered adware, 1231 were worms, 246 were downloaders, 673 were viruses, 23 were hacktools, 15 were ransomware, and 25 were bankers. In this classification, the names given to each sample by VirusTotal were used. Considering all the information obtained from static and dynamic analyses, we obtained a total of 162,750 features from the corresponding analysis reports. However, this includes some repetitive and redundant features. We performed a data preprocessing procedure as follows: (1) Feature normalization was applied to standardize the format and merge features that represent the same operation. (2) Features irrelevant to the attack were deleted. The preprocessed feature set forms the attribute set of attack behavior feature representation. It consists of 42,090 features distributed among 47 categories, which include 6 static feature categories, 40 dynamic feature categories, and vulnerability features.

Feature Verification and Analysis
To mitigate the curse of dimensionality caused by high-dimensional data spaces in the attribute set of attack behavior feature representation, we employed a feature selection algorithm based on mutual information to reduce the attribute set.
The goal of these experiments is to assess the selected features' ability to represent various APT group behavior patterns. All instances from the malware sample set were used in these experiments. We designed four classification tasks to compare feature selection algorithms based on TF-IDF, high-frequency words, and mutual information in order to evaluate their attribution accuracy and correlation results. Classification tasks are shown in Table 2. The dataset was randomly divided into a training set and a testing set, with four-fifths of the dataset used for training and one-fifth for testing. We trained a K-nearest neighbor (KNN) model for sample attribution. The number of neighbors was varied between 1 and 10 (we experimented with 3, 5, and 10 neighbors), and the best accuracy during cross-validation was obtained for k = 3. In the correlation experiment, we focused solely on the direct correlation coefficient to measure the similarity between APT groups. The correlation threshold was $\gamma = 0.2$. The results are shown in Table 3. The method based on high-frequency words only considers the frequency of features and does not consider the correlation between features and decision attributes, resulting in poor results. In experiments A and B, the TF-IDF method achieves the best results, with an F1 value as high as 96.41%. This is because when the number of categories is small and the data are relatively balanced, the TF-IDF method can filter out features with a low frequency in the entire dataset by calculating the feature frequency and weight, which improves the generalization ability of the classifier. However, after adding the APT group classification, a sample imbalance problem occurs. The TF-IDF method may delete important features in some small sample categories, affecting the performance of the classifier and causing confusion between different APT groups, which reduces the correlation precision. By contrast, the method based on mutual information can more accurately select the features highly related to the decision attributes, so it can better distinguish different classes and achieves the best results in both the attribution experiments and the correlation experiments.
After performing attribute reduction based on mutual information, the attribute set retained a total of 172 features. The correlation among these attributes is displayed in Figure 3. Certain attributes exhibit strong correlations. This is often because they are frequently combined to facilitate the execution of an attack process. The following example illustrates how a strongly correlated attribute group forms. One strongly correlated group of attributes corresponds to an Office macro virus attack. The attack process is as follows: (1) The attacker executes winword.exe to launch Microsoft Word software. (2) They invoke vbe.dll, vbeintl.dll, riched20.dll, x86_microsoft.vc.crt, and other libraries to execute the macro code. (3) Subsequently, dwmapi.dll and desktop.ini are utilized to modify the registry settings in hku\\software\\policies\\microsoft\\control panel\\desktop, thereby hijacking desktop shortcuts. (4) Additionally, malicious payloads are downloaded to the %AppData% directory. (5) Upon clicking the shortcut, the malware initiates execution, employing rpcrtremote.dll for remote connection and communication. (6) Finally, registry entries such as hku\\software\\microsoft\\office\\word\\resiliency\\startupitems are modified to eliminate traces and clear the history records of unsuccessful Office opening attempts.

The relation network between different APT groups and various types of general malware is shown in Figure 4. General malware exhibits fewer correlations with APT groups. The correlation between viruses and certain APT groups may be due to virus attacks being a common penetration method of APT groups; these groups have previously employed similar viruses. Since this paper primarily focuses on the correlation among APT groups, the subsequent experiments do not consider general malware samples.

Correlation Analysis
To assess the effectiveness of the correlation measurement method based on rough set theory, this study compiled a dataset consisting of 79 pairs of APT group relationships that were disclosed in public reports by reputable security firms. These relationships encompassed various aspects, including, but not limited to, homogeneity, cooperation, and affiliation with the same intelligence organization. However, it is important to clarify that we do not determine the veracity of these relationships. Malicious actors may deliberately mislead attribution, and investigators may intentionally manipulate the information. This paper does not express opinions about this. The paper's objective is to explore the key technologies used in APT group correlation analysis using publicly available data and to demonstrate their effectiveness against publicly reported findings.
Drawing upon the APT group correlation analysis concepts presented in previous relevant studies and various widely used inter-class distance measurement methods, we conducted three comparative experiments.
1. Average distance method: the Euclidean distance is used to assess the similarity between two samples.The distance between two groups is calculated as which represents the sum of the squares of the distances between different samples within each pair of groups, and this value is then averaged.The similarity between two groups is determined as 2. Center distance method: We utilize k-means to cluster the sample space into 100 clusters.We select the top three clusters in each group, apply the central theorem to determine the centers of the different clusters, and subsequently employ the average distance method to calculate the similarity between the groups.
3. Set similarity method: without applying rough sets, we calculate a similarity measure based on link prediction for the entire APT sample set.
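The average distance baseline in item 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the toy feature vectors are hypothetical, and the mapping from distance to similarity (`similarity_from_distance`) is an assumption, since the exact formula is not reproduced in the text.

```python
from itertools import product
from math import fsum


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def average_group_distance(group_a, group_b):
    """Mean squared distance over every cross-group pair of samples."""
    pairs = list(product(group_a, group_b))
    return fsum(squared_euclidean(a, b) for a, b in pairs) / len(pairs)


def similarity_from_distance(distance):
    """Assumed mapping from a distance to a similarity in (0, 1]; not from the paper."""
    return 1.0 / (1.0 + distance)


# Hypothetical toy feature vectors, two samples per group.
group_x = [(0.0, 0.0), (1.0, 0.0)]
group_y = [(0.0, 1.0), (1.0, 1.0)]

distance = average_group_distance(group_x, group_y)
print(distance, similarity_from_distance(distance))
```

Because every cross-group pair contributes to the mean, a few atypical samples in either group raise the distance, which is the weakness of this baseline discussed in the comparison of results.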
It is important to emphasize that the dataset in this paper represents only a subset of the data within the realm of threat intelligence. As a result, this experiment evaluates precision as the primary performance metric. We set the threshold range for the correlation coefficient to [0.1, 0.9], selecting the association scheme with the highest precision for each method. The results are shown in Table 4. Line graphs depicting the association precision and the number of relationships were plotted based on the threshold variations, as illustrated in Figure 5.
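The threshold selection described above can be sketched as a simple sweep. The group names, scores, and known pairs below are hypothetical, the step size of 0.1 is an assumption, and ties are broken toward the lowest threshold (an assumed choice that keeps more relationships).

```python
def precision_at(threshold, scores, known_pairs):
    """Return (precision, number of predicted pairs) at a similarity threshold."""
    predicted = {pair for pair, score in scores.items() if score >= threshold}
    if not predicted:
        return None, 0
    true_positives = len(predicted & known_pairs)
    return true_positives / len(predicted), len(predicted)


def best_threshold(scores, known_pairs):
    """Sweep thresholds 0.1, 0.2, ..., 0.9 and keep the most precise one."""
    best = None
    for step in range(1, 10):
        threshold = step / 10
        precision, count = precision_at(threshold, scores, known_pairs)
        if precision is None:
            continue  # nothing predicted at this threshold
        if best is None or precision > best[1]:
            best = (threshold, precision, count)
    return best


# Hypothetical similarity scores between four placeholder groups.
scores = {
    frozenset({"A", "B"}): 0.85,
    frozenset({"A", "C"}): 0.55,
    frozenset({"B", "C"}): 0.35,
    frozenset({"C", "D"}): 0.15,
}
# Hypothetical publicly disclosed relationships.
known_pairs = {frozenset({"A", "B"}), frozenset({"B", "C"})}

print(best_threshold(scores, known_pairs))
```

Only precision is computed because the disclosed relationships are a subset of the true ones, so a predicted pair missing from the reference set is counted as a false positive even though it may simply be undisclosed.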
As can be seen from this figure, the method based on the center distance performs worst. This is because each APT group manifests multiple behavior patterns, and considering only the cluster center distance ignores many similar behaviors. The methods based on the average distance and the set similarity take into account every sample within the APT group dataset. However, the inclusion of unique or irrelevant samples can lead to increased variance, a lower group similarity, and a higher error. The method based on rough sets focuses only on the fuzzy behavior patterns of APT groups, which results in a higher similarity between related groups and maintains greater precision while identifying more relationships.
This paper compares the HPI and HDI in Table 1, as well as the correlation results obtained without the link prediction method. The correlation between the number of recognized relationships and precision based on different similarity indices is shown in Figure 6.
As shown in Figure 6, since the HPI and HDI only consider the number of common neighbors and do not take into account the correlation coefficient between groups, they tend to enhance the correlation between weakly related groups, which can increase false positives. Compared with the method that does not introduce the neighbor correlation coefficient, the new similarity coefficient proposed in this paper has higher precision when the number of identified correlated group pairs is the same, which indicates that it produces fewer false correlations. Moreover, as the similarity threshold decreases, the number of correlated group pairs that can be obtained increases. Compared to the method that does not introduce the neighbor correlation coefficient (the green line in Figure 6), the precision of the proposed method (the red line in Figure 6) decreases less, which further indicates that this method can eliminate some false predictions.
The APT group relation network generated from the behavior patterns of APT groups is shown in Figure 7. For clarity in the visualization, we use consistent colors to indicate APT groups belonging to the same country. It can be observed that the calculation results effectively quantify the degree of correlation between different APT groups. Related groups demonstrate a high intra-cluster similarity and a greater number of connections, while the inter-cluster similarity remains relatively low.
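The limitation of HPI and HDI noted for Figure 6 can be illustrated with a short sketch. It uses the standard Hub Promoted and Hub Depressed definitions, which are assumed to match those in Table 1. The graphs and node names are hypothetical, and every node is assumed to have at least one neighbor.

```python
def common_neighbors(graph, x, y):
    """Nodes adjacent to both x and y in an undirected graph stored as nested dicts."""
    return set(graph[x]) & set(graph[y])


def hpi(graph, x, y):
    """Hub Promoted Index: common neighbors divided by the smaller degree."""
    return len(common_neighbors(graph, x, y)) / min(len(graph[x]), len(graph[y]))


def hdi(graph, x, y):
    """Hub Depressed Index: common neighbors divided by the larger degree."""
    return len(common_neighbors(graph, x, y)) / max(len(graph[x]), len(graph[y]))


def make_graph(weight):
    """Hypothetical network: x and y share neighbor n; every edge has the same weight."""
    return {
        "x": {"n": weight, "m": weight},
        "y": {"n": weight},
        "n": {"x": weight, "y": weight},
        "m": {"x": weight},
    }


strong = make_graph(0.9)  # edges carry high correlation coefficients
weak = make_graph(0.1)    # same topology, low correlation coefficients

# Both indices depend only on topology, so the edge weights make no difference.
print(hpi(strong, "x", "y"), hpi(weak, "x", "y"))
print(hdi(strong, "x", "y"), hdi(weak, "x", "y"))
```

Since the scores are identical for the strongly and weakly weighted networks, a pair connected only through weak neighbors is rated as highly as one connected through strong neighbors, which is the source of the false positives described above.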
The experimental results are presented in Table 6. When comparing the correlation results based on single-dimensional and multi-dimensional features, we observe that the precision based on security event samples is 66.66%, the precision based on malware samples is 86.66%, and the fusion of features achieves a precision of 90.90%. This method successfully identifies 20 true positive group pairs, surpassing the sum of the correlation results from the two individual dimensions. These findings indicate that multi-dimensional features effectively complement APT group characteristics from various perspectives. Upon careful examination of the experimental results presented in Table 6, it is evident that certain group pairs are classified as false positives due to the lag in threat intelligence. For instance, in our experiment, the group pair <APT33, MUDDYWATER> was identified during the 2016-2018 period. However, during this timeframe, no security company reported any correlation between these groups. It was only in 2019 that a security blog disclosed that their infrastructures overlapped [42]. Manual analysis by security experts often requires a significant amount of time, making it challenging to identify correlations immediately. Further investigation revealed that, of the nine false positive group pairs identified during 2016-2018, six were subsequently disclosed to have potential correlations several years later, while only three pairs did not exhibit any apparent correlation. This suggests that our method can detect APT group correlations more promptly than expert analysis, even without an extensive knowledge base.

Case Study
To further illustrate the proposed method, we present a case study that exemplifies the process of APT group correlation and evolution analyses.

Correlation Analysis
Using APT29 and CosmicDuke as examples, we conducted a study of their attack processes and extracted a similar attack process to create a TTP attack graph, as depicted in Figure 8. Applying the method proposed in this paper to identify the common fuzzy domain between CosmicDuke and APT29, we constructed a TTP attack graph based on their feature sequences, as illustrated in Figure 9.
Figure 9 illustrates several key aspects of their attack behavior. First, the attackers utilize ntdll.dll to load shared modules and execute malicious payloads (T1129). They also employ software restriction policies to repeatedly execute malicious payloads for persistence (T1543.003). Furthermore, they employ masquerading techniques (T1036), sleep commands (T1497.003), and hidden files (T1564.001) to evade defenses. Finally, they gather information about system devices (T1120) and time (T1124). After comparing with Figure 8, we can conclude that Figure 9 is a subset of Figure 8. This demonstrates that the simplified attack behavior feature representation in this paper can still preserve the original behavior patterns and their correlations with other groups.

Temporal Evolution Analysis
The temporal evolution of the relation network reflects the potential changes in the behavior patterns of APT groups over time. Groups that consistently exhibit similar behavior patterns over an extended period are more closely related. Figure 10 illustrates the evolutionary graph of the APT group relation network, using a set of prominent APT groups as an example.
By analyzing the changes in the APT group relation network, several observations can be made. From 2008 to 2011, which represents the initial emergence and development of APT groups, there were isolated instances of APT-related malicious samples and attack incidents. However, no clear correlations were observed among the various groups during this period. Subsequently, from 2012 to 2015, specific APT groups began to form. There were exchanges and cooperation among APT groups sharing the same national background. From 2016 to 2018, the relationships between APT groups became more chaotic. Security companies conducted extensive analyses and disclosed numerous APT attacks, which resulted in the exposure of significant attack tools. The similarity of attack behaviors among APT groups sharply increased. Using coordinated attacks to launch cyber warfare gradually became a trend in great-power competition. Finally, from 2019 to 2022, the majority of APT groups had developed their own distinctive attack patterns and tendencies, and the relationships tended to become more stable.
Take the Russian and Iranian APT groups as an example. Russian APT groups appeared earlier; they have been operating since at least 2008. Being a branch of APT29, CosmicDuke exhibits a strong correlation with APT29, while both of them have weaker correlations with APT28. Iranian APT groups emerged after 2014, and Insikt Group stated that there was overlap between APT33, Charming Kitten, and MUDDYWATER before 2019 [42]. In 2019, the NSA [43] reported that Russian APT groups had a substantial number of attack tools of Iranian origin. Concurrently, a significant volume of APT34's tool code was leaked, and a considerable amount of MuddyWater's tool code was offered for sale [44]. The Russian APT group Turla also targeted some of APT34's infrastructure and leveraged it for its own attack activities [6]. This allowed the APT groups in these two countries to establish certain ties over a long period of time.
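The idea that groups with consistently similar behavior across time stages are more closely related can be sketched as a persistence count over per-stage networks. The stage labels follow the periods named in this section, while the group names, scores, threshold, and `min_periods` value are hypothetical and not taken from the paper.

```python
def persistent_pairs(networks, threshold, min_periods):
    """Pairs whose similarity meets the threshold in at least min_periods stages."""
    counts = {}
    for stage_scores in networks.values():
        for pair, score in stage_scores.items():
            if score >= threshold:
                counts[pair] = counts.get(pair, 0) + 1
    return {pair: n for pair, n in counts.items() if n >= min_periods}


# Hypothetical per-stage similarity scores between placeholder groups G1..G3.
g1_g2 = frozenset({"G1", "G2"})
g1_g3 = frozenset({"G1", "G3"})
g2_g3 = frozenset({"G2", "G3"})
networks = {
    "2012-2015": {g1_g2: 0.70, g1_g3: 0.20},
    "2016-2018": {g1_g2: 0.80, g1_g3: 0.60},
    "2019-2022": {g1_g2: 0.75, g2_g3: 0.30},
}

# G1 and G2 stay similar in all three stages; G1 and G3 only in one.
print(persistent_pairs(networks, threshold=0.5, min_periods=2))
```

A pair that exceeds the threshold in a single stage, as can happen when leaked tools are widely reused, is filtered out, while a long-lived tie is retained.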

Conclusions
This paper presents a novel approach to defining the behavior patterns of APT groups using rough set theory, thereby enabling the measurement and analysis of correlations between APT groups. By extracting the feature representation of attack objects and attack behaviors from threat intelligence, we construct a behavior pattern model for APT groups. Furthermore, we propose a similarity calculation method based on link prediction to quantify the correlations between different attack groups. The validity and precision of our method are verified by comparing the obtained correlations with those disclosed by security companies. Through a case study, we explore correlations from various perspectives and examine the temporal evolution of APT group correlations. In future research, we plan to address the following aspects: (1) Incorporating malware behavior sequences as a component of the attack behavior features and exploring the context between different attack steps. This will capture potential correlations that may be overlooked by discrete features. (2) Analyzing the semantic-level similarity of security events and aggregating similar events to provide a comprehensive understanding of the security landscape. (3) Designing a graph evolution algorithm to automate the analysis of the evolving APT group relation network. By addressing these future research directions, we aim to further enhance our understanding of APT group behaviors, correlations, and evolution, ultimately contributing to the advancement of cybersecurity knowledge and practice.

Definition 4. Indiscernible relation. The equivalence relation is an indiscernible relation. Given a subset of the attribute set A ⊆ R, an indiscernible relation ind(A) can be defined on the universe U.
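For reference, the standard rough set form of the indiscernibility relation is given below. The notation is an assumption rather than the paper's own: f(x, a) is taken here to denote the value of attribute a for object x.

```latex
\mathrm{ind}(A) = \{\, (x, y) \in U \times U \;:\; \forall a \in A,\ f(x, a) = f(y, a) \,\}
```

Two objects are indiscernible under A exactly when they take the same value on every attribute in A, so ind(A) partitions U into equivalence classes.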

Figure 2. Analysis diagram of node correlation.

Figure 3. Heat map of the conditional attribute set of attack behavior feature representation.

Figure 4. The relation network between different APT groups and various types of general malware.

Figure 5. The correlation between the number of recognized relationships and precision based on different correlation analysis methods.

Figure 6. The correlation between the number of recognized relationships and precision based on different similarity indices.

Figure 7. APT group relation network generated from behavior patterns of APT groups.

Figure 8. TTP attack graph generated from similar attack processes of CosmicDuke and APT29.

Figure 9. TTP attack graph generated from common fuzzy behavior feature sequences of CosmicDuke and APT29.

Figure 10. Evolution graph of the APT group relation network.
{e_1, e_2, ..., e_M} is the set of security event feature sequence objects, also called a universe. R_o is a finite nonempty set of attack object attributes, and the subsets C_o and D are called the condition attribute set and the decision attribute set, respectively. C_o = {time, target, target_class, location, method, tool, carrier, vulnerability}.
• carrier denotes the means utilized by attackers to propagate their attacks, including the vectors for malware delivery, vulnerability exploitation, and so on.
• vulnerability denotes the CVE number of the vulnerability exploited by the attacker.
• time denotes the disclosure time of security events. In cases where a specific time is not available, the time stated in the technical report takes precedence.
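To make the role of the condition attributes concrete, the sketch below partitions a few security events into classes that are indiscernible under a chosen attribute subset. The attribute names come from the condition attribute set above. The events, their values, and the function name are hypothetical.

```python
from collections import defaultdict

# Condition attributes of a security event, as listed in the text.
CONDITION_ATTRIBUTES = (
    "time", "target", "target_class", "location",
    "method", "tool", "carrier", "vulnerability",
)


def indiscernibility_classes(events, attributes):
    """Group event names that agree on every attribute in `attributes`."""
    unknown = set(attributes) - set(CONDITION_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    classes = defaultdict(list)
    for name, event in events.items():
        # Events with the same value tuple cannot be told apart by these attributes.
        key = tuple(event.get(attr) for attr in attributes)
        classes[key].append(name)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical events described by two of the condition attributes.
events = {
    "e1": {"target_class": "government", "carrier": "email"},
    "e2": {"target_class": "government", "carrier": "email"},
    "e3": {"target_class": "energy", "carrier": "email"},
}

print(indiscernibility_classes(events, ("target_class", "carrier")))
print(indiscernibility_classes(events, ("carrier",)))
```

Dropping an attribute can only merge classes, never split them, which is why a smaller attribute subset gives a coarser view of the same events.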

Table 2. Setting of the malware attribution experiment for APT groups.

Table 3. Attribution and correlation results for different feature selection methods.

Table 4. Experimental results based on different correlation analysis methods.

Table 5. Data distribution at different time stages.

Table 6. Evaluation analysis results.