1. Introduction
Cyber Threat Intelligence (CTI) focuses on the collection and analysis of information concerning current and potential attacks that threaten the security of an organization or its assets. It is a relatively new approach to securing information systems that aims to address the inefficiencies of traditional defence mechanisms by contributing to the provision of managed security services. As such, CTI is an essential component that offers both proactive and reactive security (the latter also known as incident response and forensics) to protect systems against a plethora of attacks, even against initially undetected compromises, while shortening the window between compromise and detection if protection mechanisms fail [1].
Despite the robustness of CTI security implementations, the proliferation and sophistication of new threats, such as malware and advanced persistent threats (APTs), challenge their effectiveness. For example, Symantec found that “zero-day” exploits circulate, most commonly on the dark web, for 300 days on average before being identified [2], while malware instances remain undetected on compromised systems for 101 days on average [3]. Mandiant’s report [3] also indicates that four out of ten organisations do not detect such compromises themselves, which can further increase the complexity of incident discovery and response.
At the same time, in the new era of data protection legislation, such as the European Union’s General Data Protection Regulation (GDPR), which is already in force, incident response and investigation time become even more critical, as undetected compromises and the subsequent incident response may have an adverse financial effect on organisations. Moreover, the complexity, heterogeneity, correlation and sheer volume of data are some of the challenges digital forensics now faces [4]. To this extent, the emergence of Digital Forensic Readiness (DFR) is considered a promising approach towards addressing several of these challenges.
Driven by the need to further improve the effectiveness and applicability of DFR, this paper demonstrates that coupling DFR practices with CTI could successfully improve Key Performance Indicators (KPIs) related to the volume, correlation and complexity problems that traditional digital forensics approaches face. The contribution and focus of this paper are illustrated in Figure 1. Considering an investigation scope and all digital evidence (S), the evidence related to a given case would be subset E, whereas subset T represents the evidence collected during the triage phase. In an ideal scenario, the triage set will contain information only related to the case, filtering out all the “noise”. However, in a real case scenario, following the execution of the triage process, the following outcomes are expected:
A subset of digital evidence items identified by triage (true positives or “hits”)
A subset of items not relevant to the case, but included in the triage subset (false positives)
A subset of items relevant to the case, but not identified by the triage (false negatives or “misses”).
Due to the nature and purpose of the triage, there is a bias and preference towards reducing false negatives at the expense of increasing false positives. In other words, a forensic analyst would favour broadening the triage criteria to include as much digital evidence as possible rather than risk missing relevant items. In this paper, we argue that this trade-off can be improved by bringing correlation forward to the triage phase.
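As a minimal illustration of these set relationships (with purely hypothetical evidence identifiers, not taken from the paper), the decomposition of the triage subset T against the case-relevant evidence E can be expressed as follows:

```python
# Hypothetical illustration of the triage set relationships described above.
# S: all digital evidence in scope, E: case-relevant evidence, T: triage output.
S = {f"item{i}" for i in range(1, 11)}          # investigation scope (all evidence)
E = {"item1", "item2", "item3", "item4"}        # evidence actually relevant to the case
T = {"item1", "item2", "item3", "item7"}        # evidence selected by the triage process

hits = T & E              # true positives: relevant items the triage caught
false_positives = T - E   # irrelevant items swept in by the triage criteria
false_negatives = E - T   # relevant items the triage missed ("misses")

print(f"hits={len(hits)}, FP={len(false_positives)}, FN={len(false_negatives)}")
```

Loosening the triage criteria grows T, which typically shrinks the misses while inflating the false positives; the correlation-informed triage proposed here aims to rebalance exactly this trade-off.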
Building on our proposed threat intelligence informed DFR model [5,6], this paper empirically evaluates its effectiveness through scenarios based on real attack data. The results of the experiments indicate that the enhanced DFR model is capable of identifying and limiting the key causes of security incidents within an efficient period of time.
The rest of this paper is structured as follows: Section 2 summarizes previous research about DFR. Section 3 provides an overview of the research conducted by the authors and the model they proposed. Section 4 describes the evaluation methodology, including the setup of the model in a real-world setting. Section 5 presents, evaluates and discusses the results of the experiments. Finally, Section 6 summarizes the benefits of this research and explores future research directions.
5. Results
As previously mentioned, all the datasets used during the experiments contained network data. Therefore, the first challenging task was to accurately identify the malware instances contained in the network data through a series of correlations that existed in the LIDB. The findings of the experiments were verified against the Malware Capture Facility Project, since the project’s creators provided information about the malware executable files inside that network traffic. Experiment outputs containing multiple possible outcomes were subject to a ranking procedure: the ranking algorithm consulted the LIDB, evaluated the relationships among IoCs and assigned each candidate a score, ordering the candidates so as to point out the IoCs most likely to be included in the dataset.
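The ranking procedure itself is not listed in the paper; a minimal Python sketch of such a relationship-count scoring step, assuming the LIDB is exposed as a simple mapping from candidate malware hashes to their known IoCs (all identifiers are hypothetical), could look as follows:

```python
def rank_candidates(observed_iocs, lidb_relations):
    """Score candidate malware by how many of their known IoCs were observed.

    observed_iocs: set of IoC strings extracted from the captured network data.
    lidb_relations: dict mapping a malware hash to the set of IoCs the LIDB
                    associates with it (a simplified stand-in for the LIDB).
    """
    scores = {
        malware_hash: len(observed_iocs & related_iocs)
        for malware_hash, related_iocs in lidb_relations.items()
    }
    # Highest-scoring candidates first; these are the most likely root causes.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Example usage with toy data.
lidb = {
    "hash_A": {"evil.example.com", "203.0.113.7", "mutex_X"},
    "hash_B": {"203.0.113.7"},
}
print(rank_candidates({"evil.example.com", "203.0.113.7"}, lidb))
```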
The experiments revealed that in 176 of the 205 experiments, the proposed model accurately identified the malicious files related to the corresponding dataset; these results are considered True Positives (TP). Thus, the algorithm initially reached a positive rate of 85.85% and a negative rate of 14.15% (Table 1). In most cases, the results yielded an IoC subset of relatively small cardinality. Having a limited number of possible IoC candidates to evaluate as malware is highly desirable, because it helps digital forensic analysts find the root cause of an incident more easily.
Figure 7 shows that only a few experiments returned an increased number of malevolent alternatives; however, in most cases, the scoring algorithm identified the most relevant candidate and prioritised it correctly.
The detection success rate can be considered adequate, as it was achieved by a lightweight triaging algorithm; hence, the trade-off between computational cost and performance is acceptable, as presented in the performance analysis below. However, the failures to identify malware instances deserved further investigation, since a triage process should, by design, favour false positives over false negatives.
Acknowledging that time is an essential parameter for evaluating whether a forensic readiness scheme is efficient [7], a performance analysis was conducted. Of all the experiments the authors ran, the most time-consuming one lasted 251.2 s, which can be considered an acceptable timeframe given the number of records checked and the computing resources used. The average execution time of the experiments was 95.6 s. Having more entries in both the LIDB and ALDB databases can slightly increase the investigation time; however, such an increase is trivial compared to traditional digital forensic approaches, which often require a great deal of time and involve processes with high computational costs to reach similar results [41].
Figure 8 shows how the amount of data within the ALDB and the levels of their repeatability shaped the speed of the proposed model.
The remaining 29 malware instances that were not correctly identified by the TIM were further examined to establish the reason for the mismatches. The study revealed the following interesting cases:
TIPs did not report any malicious network IoCs in 10 of these 29 experiments.
The TIM did not return any results (no hash value) in 12 of these 29 experiments.
The TIM returned false results (different malware hash value) in seven of these 29 experiments.
The evaluation of the first case indicated that the algorithm behind the TIM did not identify any files as malicious because the malware included in the dataset did not produce any network IoCs, so there was no potential connection with the IoCs in the LIDB. It is likely that the malware targeted something other than the network, for example the hosts themselves, and therefore generated no network activity. To this extent, these results can be considered True Negatives (TN).
In the second case, the TIM returned zero results because, even though the malware in question did produce potential IoCs, these IoCs were missing from the ALDB. Thus, these results can be considered False Negatives (FN).
In the remaining third case, the TIM managed to find correlations between the LIDB and the ALDB; however, it failed to identify the malware instances correctly. The reason is that it correlated the produced IoCs to a different malware than the one actually included in the dataset, thus pointing to a broader malware family rather than the specific instance the dataset contained. This category can be characterised as False Positives (FP).
Despite the misidentification, the correlations among IoCs that exist in the LIDB can reveal additional substantial evidence, such as the presence of droppers, downloaders, backdoors or similar types of malicious activity.
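Although the paper does not list the labelling logic itself, a minimal Python sketch of how the four outcome categories above could be assigned to an individual experiment (all parameter names are hypothetical) is:

```python
def classify_outcome(expected_hash, returned_hashes, produced_network_iocs):
    """Label a single experiment outcome, mirroring the four cases above.

    expected_hash: hash of the malware actually present in the dataset.
    returned_hashes: set of hashes the TIM reported (possibly empty).
    produced_network_iocs: whether the malware generated any network IoCs.
    """
    if expected_hash in returned_hashes:
        return "TP"                      # correct malware identified
    if not returned_hashes:
        # Nothing reported: correct silence (no network IoCs) vs. a genuine miss.
        return "TN" if not produced_network_iocs else "FN"
    return "FP"                          # a different (e.g. same-family) malware reported
```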
Figure 9 presents how the proposed model eventually managed to identify the root cause of an incident correctly, despite its initial misidentification, by employing host-based indicators (mutexes in this case) and their correlations. In this scenario, the proposed model initially flagged “Malware 2” as the cause of the incident; however, a deeper analysis revealed that “Malware 1” and “Malware 2” belong to the same family. This observation is particularly interesting, as it highlights the importance of studying “near misses”.
As stated earlier, the results of the experiments were assigned a category flag (TP, TN, FP, FN), which can be further expressed with the confusion matrix presented in Figure 10. Based on this information, the calculated accuracy, precision and recall, which are given by Equations (1)–(3), respectively, are as follows:
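Assuming the standard formulations of these metrics and using the counts reported above (TP = 176, TN = 10, FP = 7, FN = 12), the values are approximately:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{176 + 10}{205} \approx 90.7\% \]

\[ \text{Precision} = \frac{TP}{TP + FP} = \frac{176}{176 + 7} \approx 96.2\% \]

\[ \text{Recall} = \frac{TP}{TP + FN} = \frac{176}{176 + 12} \approx 93.6\% \]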
Another metric indicating the benefits of the proposed approach is the number of audit log entries that are forwarded to the IESS. Experiments showed that trials yielding true positive results (85.85% of all the experiments) reduced the number of audit log records forwarded to the IESS by 90.44%, meaning that fewer than 10% would be considered of interest for further analysis should an incident occur. If false positives, false negatives and true negatives are included in the calculation, meaning that none of their network data are filtered out, the share of entries that may need additional analysis rises to 31.38%. Despite this increase, the volume of data the proposed model filtered out remained high, indicating the usefulness it can provide to forensic analysts.
Finally, it is important to highlight that, in data analytics, a system typically uses training data to define the ground truth [42]. In our approach, however, the ground truth was already known, as any collected hash value was considered malicious. Thus, the challenging task of the proposed model is to maximise its true positive results, that is, to correctly identify the presence of a malicious file from IoCs that may not directly relate to it.
6. Discussion
A novel model leveraging CTI to improve the levels of operational DFR was presented in this paper. To achieve this, the model contained three independent modules, two of which worked proactively while the third was used during a forensic triage process. The primary function of the model was the establishment of an inventory containing a limited subset of log records matched against a high-precision list of cyber-threats, with the aim of minimising the time and, consequently, the cost of potential digital forensic investigations.
The applicability and effectiveness of the aforementioned model were evaluated through a series of 205 experiments that employed well-known published datasets. In particular, the results indicated that, in most cases, the model managed to accurately identify the malware instances that infected the target systems.
In terms of effectiveness and performance, the model managed to significantly limit the network data that may require further analysis, by at least 70%. From a digital investigation perspective, this is considered substantial, as it can help digital forensic analysts examine the contents of audit log files in considerably less time. For example, if an investigator has to identify critical information in a log file containing $n$ entries, then the time complexity of searching this file is $O(n)$. Limiting the entries by $k\%$, where $0 < k < 100$, yields a time complexity of $O\!\left((1 - \tfrac{k}{100})\,n\right)$, signifying that analysis time decreases in proportion to the amount of unrelated data removed. Moreover, the proactive analysis in the experiments lasted approximately 100 s on average, demonstrating the speed of the entire process and thus highlighting even more the benefits it can provide to post-mortem analysis.
It should be noted that the above time was achieved due to the architecture of the proposed model, which pre-processes the information it receives. Populating both the LIDB and the ALDB with trustworthy information is potentially time-consuming; however, such a process can run in the background at predefined intervals without interrupting systems’ operations and thus does not noticeably affect the overall efficiency of the proposed model.
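As a minimal sketch of this background population step, assuming hypothetical helper functions for pulling IoCs from TIPs into the LIDB and ingesting audit logs into the ALDB (none of which are defined in the paper), a periodic refresh could be scheduled as follows:

```python
import threading

REFRESH_INTERVAL_SECONDS = 6 * 60 * 60   # hypothetical refresh interval (every 6 hours)

def refresh_databases():
    """Placeholder for the proactive population step: pull IoCs from the
    configured TIPs into the LIDB and ingest new audit log entries into the ALDB."""
    # update_lidb_from_tips()   # hypothetical helper, not part of the paper
    # update_aldb_from_logs()   # hypothetical helper, not part of the paper
    pass

def background_refresh_loop(stop_event):
    """Run the refresh step at predefined intervals without blocking normal operation."""
    while not stop_event.is_set():
        refresh_databases()
        # Wait for the next interval, waking early if a shutdown is requested.
        stop_event.wait(REFRESH_INTERVAL_SECONDS)

stop = threading.Event()
threading.Thread(target=background_refresh_loop, args=(stop,), daemon=True).start()
```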
Last but not least, the authors considered the cost implications of deployment and forensic investigation. In particular, the cost of a forensic analysis project containing several activities can be calculated by Equation (4) [43], where $n$ is the number of activities, $t_i$ the time to complete the $i$th activity, $v_i$ the volume of the $i$th activity and $c_i$ the capacity cost of the $i$th activity. The capacity cost refers to the cost of the resources used to perform the activities, such as salaries of employees, equipment and technology costs, rental of office space and any other costs incurred [44].
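Based on these variable definitions, Equation (4) plausibly takes the time-driven activity-based costing form shown below; this form is an assumption inferred from the definitions, not a quotation of the original equation:

\[ \text{Total cost} = \sum_{i=1}^{n} t_i \, v_i \, c_i \]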
Given that the proposed approach did not significantly affect the capacity cost, as it did not require high-end technological resources for the average organisation, the total activity cost remains proportional to the time and volume of the incident investigation, both of which the model reduces, and thus decreases.
In conclusion, the proposed CTI-informed DFR model presented and evaluated in this paper can significantly enhance the DFR state of organizations, as it effectively improved the following KPIs:
Decrease the volume of information an analyst needs to examine
Minimise the time of a forensic investigation
Limit the cost of forensic analysis
Determine the root cause of an incident in a timely manner and with high precision
Identify relevant threats that may have affected the security posture of an organization.