4.1. Experimental Setup and Data Acquisition
This section details the experimental setup and data collection process. The benign and malware applications are executed on an Intel Xeon X5550 machine (4 HPC registers available) running Ubuntu 14.04 with the Linux 4.4 kernel, and HPC features are captured using the Perf tool available under Linux at a sampling interval of 10 ms.
Perf provides rich generalized abstractions over hardware-specific capabilities. HPC-based profilers are currently built into almost every popular operating system.
Linux Perf is a new implementation of performance counter support for Linux. It is built on the kernel's perf_events subsystem and provides users with a set of commands to analyze performance and trace data. Internally it relies on the perf_event_open system call, which can measure multiple events simultaneously. In our experiments, we executed more than 3500 benign and malware applications for data collection. Benign applications include real-world applications comprising MiBench [20] and SPEC2006 [62], Linux system programs, browsers, and text editors. Malware applications, collected from the VirusTotal and VirusShare online repositories, include Linux ELFs and scripts created to perform malicious activities, and comprise 850 Backdoor, 640 Rootkit, and 1460 Trojan samples. Backdoor applications provide remote access to a remote user (the attacker) and facilitate information leakage; Rootkits give attackers privileged access to modify registers and authorized programs; and Trojans phish confidential information from the system.
In our experiments, the HPC information is collected by running applications in an isolated environment provided by Linux Containers (LXC) [63]. LXC is chosen over other commonly available virtual platforms such as VMware or VirtualBox since it provides access to actual performance counter data instead of emulating HPCs. To address the non-determinism and overcounting issues of HPC registers in hardware-based security analysis discussed in recent works [43,64], we extracted the hardware events available under the Perf tool using a static performance monitoring approach [34], in which each application is profiled several times and a different set of events is measured each time. Furthermore, the container is destroyed after each run. This ensures that running malware inside the Linux container does not contaminate the system's environment and that the collected data is not contaminated by a previous run of the program.
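The static monitoring approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the collection script used in the experiments: the event list, the `./app` target, and the helper names `batch_events` and `perf_commands` are hypothetical. It only assumes the `perf stat` options `-I` (print interval in ms), `-x` (field separator), and `-e` (event list).

```python
from typing import List

NUM_HPC_REGISTERS = 4      # counters available on the test machine (from the text)
SAMPLING_INTERVAL_MS = 10  # sampling interval used in the text


def batch_events(events: List[str], batch_size: int = NUM_HPC_REGISTERS) -> List[List[str]]:
    """Split the event list into groups that fit the available counters."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


def perf_commands(events: List[str], target: List[str]) -> List[List[str]]:
    """Build one `perf stat` command per batch; each command is a separate run."""
    commands = []
    for batch in batch_events(events):
        commands.append(
            ["perf", "stat", "-I", str(SAMPLING_INTERVAL_MS), "-x", ",",
             "-e", ",".join(batch), "--"] + target
        )
    return commands


# Hypothetical event list; the text does not enumerate the profiled events.
EXAMPLE_EVENTS = ["branch-instructions", "branch-misses", "cache-references",
                  "cache-misses", "instructions", "cpu-cycles"]
```

With six events and four registers, `perf_commands(EXAMPLE_EVENTS, ["./app"])` yields two commands, so the application would be executed twice, each time in a fresh container.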
4.3. Stealthy Malware Threat Models
The hardware-assisted malware detection approach proposed in this work focuses on identifying a type of stealthy malware referred to as an embedded malware attack. This is a potential threat in today's computing systems in which the malware hides itself within a benign application running on the system.
To model embedded malware threats, we consider persistent malicious attacks that occur once within the benign application and last for a notable duration while attempting to infect the system. For a thorough analysis, we embedded malicious code from several malware types inside the benign application, including Backdoor, Rootkit, Trojan, and Hybrid (Blended) attacks. For per-class embedded malware analysis, traces from one category of malware are randomly embedded inside the benign applications and the proposed detection approach attempts to detect the malicious pattern.
Furthermore, the Hybrid threat combines the behavior of all classes of malware and hides them in the normal program. Persistent malicious code is primarily a subset of the Advanced Persistent Threat (APT), which comprises stealthy and continuous computer hacking processes, mostly crafted to perform specific malicious activities. The purpose of persistent attacks is to place custom malicious code in the benign application and remain undetected for the longest possible period. Persistent malware uses sophisticated techniques to persistently exploit vulnerabilities in systems, usually targeting private organizations, states, or both for business or political motives. The hybrid malware in our work represents a more harmful type of persistent threat in which the malicious samples are chosen from different classes of malware to achieve a more powerful attack that seeks to exploit more than one system vulnerability.
To create an embedded malware time series and model a real-world scenario, HPC features are captured at an interval of 10 ms while each infected application (a benign application infected by embedded malware) runs for 5 s. For this study, 10,000 test experiments were conducted in which malware appeared at a random time during the run of a benign program. Three datasets (training, validation, and testing) are created for a comprehensive evaluation of the StealthMiner approach. Each dataset contains 10,000 complete benign HPC time series and 10,000 embedded malware HPC time series. Because the attacker can deploy unseen malware programs to attack the system, we create these three datasets from three groups of recorded malware HPC time series: 33.3% for training, 33.3% for validation, and the remainder of the recorded data for testing.
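The construction of one embedded-malware time series can be sketched as below. This is an illustrative sketch only: the function name `embed_malware` is hypothetical, and it assumes that embedding overwrites a contiguous window of the benign trace with malware samples, which the text does not specify. The trace length follows from the stated figures (5 s sampled every 10 ms gives 500 samples).

```python
import random
from typing import List, Tuple

SAMPLES_PER_TRACE = 500  # 5 s of execution sampled every 10 ms


def embed_malware(benign: List[float], malware: List[float],
                  rng: random.Random) -> Tuple[List[float], int]:
    """Overwrite a randomly placed window of the benign trace with malware samples.

    Assumption: the malware samples replace the benign samples in the window;
    the text does not state how the two traces are combined.
    Returns the infected trace and the start index of the embedded window.
    """
    if len(malware) > len(benign):
        raise ValueError("malware segment is longer than the benign trace")
    start = rng.randint(0, len(benign) - len(malware))  # bounds are inclusive
    infected = list(benign)  # copy so the benign trace is left untouched
    infected[start:start + len(malware)] = malware
    return infected, start
```

Passing a seeded `random.Random` instance keeps the placement reproducible across dataset builds.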
4.4. Overview of StealthMiner
As discussed, prior works on HMD mainly assumed that the malware is executed as a separate thread when infecting the computer system. This essentially means that the HPC data captured at run-time and fed to the classifier belongs only to the malware program. In real-world applications, however, the malware can be embedded inside a benign application rather than spawned as a separate thread, producing a more harmful attack. Therefore, the HPC data captured at run-time in each interval could belong to both the malware and the benign application. As we will show in this work, this HPC data pollution can degrade the performance of traditional ML classifiers. In response to this challenge, we propose the StealthMiner malware detection framework, which is based on lightweight Fully Convolutional Neural Network (FCN)-based time series classification. The proposed FCN-based approach attempts to automatically identify potentially contaminated intervals in HPC-based time series at run-time and uses them to distinguish embedded malware from benign applications.
An overview of StealthMiner and its comparison with prior works is shown in Figure 4. The network is a simplified version of previous general convolutional neural network-based time series classification models [55,56]. As shown in Figure 4a, our proposed solution relies on the fewest HPC features and targets stealthy attacks that have been ignored in prior studies on hardware-based malware detection. Furthermore, as seen in Figure 4b, the proposed FCN-based malware detector is created by stacking two 1-D convolution layers with 16 and 2 kernels, respectively. The kernel sizes in these two convolution layers are 2 and 3, respectively. These convolution layers aim at selecting the subsequences of the HPC time series that identify the malware. Next, a global average pooling layer converts the output of the convolution layers into low-dimensional features. These features are then fed into a fully connected neural network to distinguish the embedded malware from benign applications.
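Concretely, given a time series of HPC features $X = (x_1, x_2, \ldots, x_N)$, where $N$ is the length of the time series, in the first 1-D convolution layer the output of the $k$th kernel is computed by:
$$h^{(1)}_{k,i} = w^{(1)}_{k,1}\, x_i + w^{(1)}_{k,2}\, x_{i+1}, \quad i = 1, \ldots, N$$
where the 2-d vector $w^{(1)}_k = (w^{(1)}_{k,1}, w^{(1)}_{k,2})$ is the weight of the $k$th kernel and $W^{(1)}$ is a $16 \times 2$ matrix that describes all weights of the first layer. Given $h^{(1)}_k$, a batch normalization function, $\mathrm{BN}(\cdot)$, and a ReLU activation function, $\mathrm{ReLU}(\cdot)$, are then applied. $\mathrm{BN}(\cdot)$ is a function which normalizes the mean and variance of $h^{(1)}_k$ to 0 and 1, respectively. Given an input vector $x$, $\mathrm{BN}(x)$ can be written as below:
$$\mathrm{BN}(x) = \frac{x - \mu}{\sqrt{\sigma^2}}$$
where $\mu$ and $\sigma^2$ are the mean and variance of the vector across the $k$th kernel. The ReLU activation function is a nonlinear activation function that sets any negative value in the input $x$ to 0:
$$\mathrm{ReLU}(x) = \max(0, x)$$
The first layer output $\hat{h}^{(1)}_k = \mathrm{ReLU}(\mathrm{BN}(h^{(1)}_k))$ is an $N$-dimensional feature map generated from the $k$th kernel. We denote $H^{(1)} = (\hat{h}^{(1)}_1, \ldots, \hat{h}^{(1)}_{16})$ as the output of the convolution layer. Intuitively, the convolution layer converts the original time series of length $N$ into 16 different $N$-dimensional feature maps capturing different potential local features that can be used to classify the input data [56].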
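The first layer described above (convolution, normalization, ReLU) can be sketched in plain Python as follows. This is a minimal sketch under several assumptions that the text does not confirm: the series is zero-padded on the right so each feature map keeps length N, the normalization statistics are taken over a single series rather than a training batch, no learnable scale or shift is applied, and a small `eps` is added for numerical stability. The function names and kernel values are hypothetical.

```python
import math
from typing import List


def conv1d_single_channel(x: List[float], w: List[float], b: float = 0.0) -> List[float]:
    """Slide one kernel over x; right zero-padding keeps the output at length N."""
    padded = x + [0.0] * (len(w) - 1)
    return [b + sum(w[j] * padded[i + j] for j in range(len(w)))
            for i in range(len(x))]


def normalize(h: List[float], eps: float = 1e-5) -> List[float]:
    """Shift and scale h to zero mean and approximately unit variance."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]


def relu(h: List[float]) -> List[float]:
    """Set every negative entry to zero."""
    return [max(0.0, v) for v in h]


def first_layer(x: List[float], kernels: List[List[float]]) -> List[List[float]]:
    """Return one feature map per kernel (16 kernels of size 2 in the text)."""
    return [relu(normalize(conv1d_single_channel(x, w))) for w in kernels]
```

For a series of length N and 16 kernels, `first_layer` returns 16 lists of length N, matching the shape of the first-layer output described in the text.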
The first-layer output $H^{(1)}$ is then fed into the next convolution layer, which has 2 kernels. This layer summarizes $H^{(1)}$ into two different feature maps which can be computed via:
$$h^{(2)}_{k,i} = \sum_{c=1}^{16} \sum_{j=1}^{3} W^{(2)}_{k,c,j}\, H^{(1)}_{c,\,i+j-1}, \quad k \in \{1, 2\}$$
where the weight of all kernels is a 3-d tensor $W^{(2)}$ of size $2 \times 16 \times 3$. For each $h^{(2)}_k$, the $\mathrm{BN}(\cdot)$ and $\mathrm{ReLU}(\cdot)$ functions are further applied and two feature maps (denoted as $H^{(2)} = (H^{(2)}_1, H^{(2)}_2)$) are generated. Intuitively, stacking two convolution layers can increase the accuracy of the framework and the ability of the model to detect complicated features that cannot be captured by a single convolution layer [56]. Note that any positive value inside $H^{(2)}$ indicates a potential HPC interval that can be used to determine whether the input HPC time series contains embedded malware. Next, we conduct a global average pooling step to convert the feature maps $H^{(2)}$ into low-dimensional features. In particular, given a feature map $H^{(2)}_k$, we use the average value of all elements inside $H^{(2)}_k$ as the low-dimensional feature. As a result, this step converts $H^{(2)}$ into a 2-d vector (denoted as $f = (f_1, f_2)$).
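The second convolution and the pooling step described above can be sketched as below. This is an illustrative sketch, not the authors' implementation: it assumes right zero-padding so the output keeps length N, leaves out the normalization and ReLU steps for brevity, and uses hypothetical function names.

```python
from typing import List


def conv1d_multi_channel(h: List[List[float]], w: List[List[float]],
                         b: float = 0.0) -> List[float]:
    """Compute one output feature map from C input maps.

    h holds C feature maps of length N; w holds the matching C kernel rows of
    length K (C = 16 and K = 3 in the text). Right zero-padding is an assumption.
    """
    n, k = len(h[0]), len(w[0])
    padded = [row + [0.0] * (k - 1) for row in h]
    return [b + sum(w[c][j] * padded[c][i + j]
                    for c in range(len(h)) for j in range(k))
            for i in range(n)]


def global_average_pool(maps: List[List[float]]) -> List[float]:
    """Reduce each feature map to the mean of its elements."""
    return [sum(m) / len(m) for m in maps]
```

Applying `conv1d_multi_channel` once per kernel gives the two second-layer maps, and `global_average_pool` then reduces them to a vector with one entry per map.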
Finally, the pooled feature vector $f$ is fed into a fully connected neural network layer with a softmax activation function, a standard layer designed for our target classification task of detecting embedded malware:
$$\hat{y} = \sigma(W f + b) \qquad (3)$$
where $\sigma(\cdot)$ is the softmax activation function. It can be written as follows:
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{l=1}^{2} e^{z_l}}$$
Equation (3) first converts $f$ into a new 2-d real-valued vector via the linear transformation $W f + b$, where $W$ is a $2 \times 2$ matrix and $b$ is a $2 \times 1$ vector. Next, all elements in the vector are mapped to [0,1] via the $\sigma(\cdot)$ function. The final output is a 2-d vector $\hat{y}$ which describes the probability that the time series is benign or infected by malware (see Figure 5).
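The final classification step described above (a linear transformation followed by softmax) can be sketched as follows. The weights and bias in the usage note are hypothetical placeholders, not trained values, so the resulting probabilities do not reproduce those reported for Figure 5.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """Map a real-valued vector to probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(f: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Apply the linear transformation W f + b, then softmax."""
    z = [sum(W[r][c] * f[c] for c in range(len(f))) + b[r]
         for r in range(len(W))]
    return softmax(z)
```

For example, `classify([0.26, 0.32], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` returns two probabilities that sum to 1, with the second slightly larger than the first.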
Suppose that we denote all the weights and the output of the network as $\Theta$ and $F_\Theta(\cdot)$, respectively. Given a training dataset $D$ and the network weights $\Theta$, we update $\Theta$ by minimizing the binary cross-entropy loss, which can be computed by
$$\mathcal{L}(\Theta) = -\frac{1}{|D|} \sum_{(X_i, y_i) \in D} \Big[ y_i \log F_\Theta(X_i) + (1 - y_i) \log\big(1 - F_\Theta(X_i)\big) \Big] \qquad (7)$$
where $X_i$ and $y_i$ are the HPC time series and the associated ground-truth label of the $i$th record in $D$, and $F_\Theta(X_i)$ is taken as the predicted probability of the malware class. The label $y_i \in \{0, 1\}$ indicates whether the time series is benign or contains malware. Equation (7) can be minimized via the standard backpropagation algorithm, a widely used method for training various types of neural networks [55,56]. It updates the weights in the neural network by propagating the loss function value from the output to the input layer and iteratively minimizes the loss function for each layer via gradient descent. In this work, for each layer, the weights are optimized via the Adam optimizer [65], a stochastic gradient descent method used to efficiently update the weights of a neural network.
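The binary cross-entropy loss described above can be computed as in the sketch below. It assumes the loss is averaged over the records, a scaling choice that does not change the minimizer, and it clips probabilities away from 0 and 1 to avoid taking the logarithm of zero. The function name and the `eps` value are illustrative choices.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], labels: List[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy.

    probs[i] is the predicted probability that record i contains malware and
    labels[i] is 1 for malware, 0 for benign.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

A confident correct prediction contributes a loss close to zero, while an uncertain prediction of 0.5 contributes log 2 regardless of the label.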
To demonstrate how StealthMiner identifies malware embedded inside a benign program, two illustrative case studies are presented in Figure 5. As shown in Figure 5a, the input to the classifier is an HPC-based time series that contains an embedded rootkit malware (the embedded malware is highlighted in red). To identify the hidden malicious pattern, StealthMiner generates two feature maps $H^{(2)}_1$, $H^{(2)}_2$ via the proposed fully convolutional neural network. $H^{(2)}_1$ and $H^{(2)}_2$ are then summarized into a 2-d feature vector $f$ by calculating the simple average of all the values in each feature map. In the given example, $f$ is equal to [0.26, 0.32]. This 2-d feature is then fed into a fully connected neural network layer, which determines whether the input trace contains embedded malware. In this case it successfully identifies the embedded malware with a high probability (0.999). Similarly, when a benign HPC trace is fed into StealthMiner (as shown in Figure 5b), following the same process as the first example, the time series is converted into the 2-d feature vector $f$. Then, the 2-d vector is fed into the fully connected neural network layer and the network successfully identifies it as a benign trace with a probability of 0.73.
StealthMiner Implementation and Overhead: We implemented the proposed embedded malware detection framework with the PyTorch deep learning library. For evaluating the StealthMiner framework using performance metrics such as accuracy and F-measure (described in Section 5), the proposed detector determines whether the input time series contains embedded malware by computing $\arg\max_j \hat{y}_j$. For measuring the Area Under the Curve (AUC), we directly use the $\hat{y}$ computed via Equation (3).
Unlike the neural network time series classification models proposed in prior works, the StealthMiner framework has a small total number of kernels and layers, which dramatically reduces the number of parameters and the cost of detecting malware in a new HPC time series. For instance, the latest neural network introduced by [55] needs more than 100,000 parameters to classify a time series. Applying such heavyweight classification models to our embedded malware detection problem would significantly increase the overhead and complexity of our design, making the solution impractical. In contrast, the StealthMiner framework contains only 200 parameters. This small parameter count keeps the proposed ML-based detector efficient and makes the neural network-based approach practical for identifying embedded malware.
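As a rough check on the model size, the parameter count of the described architecture can be tallied layer by layer. The sketch below rests on assumptions the text does not state: a single input channel, a bias term in every convolution and in the fully connected layer, and a learnable scale and shift per channel in each batch normalization step. The helper names are hypothetical.

```python
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int,
                  bias: bool = True) -> int:
    """Weights plus optional biases of a 1-D convolution layer."""
    return out_channels * in_channels * kernel_size + (out_channels if bias else 0)


def batchnorm_params(channels: int, affine: bool = True) -> int:
    """Learnable scale and shift per channel (assumed, not stated in the text)."""
    return 2 * channels if affine else 0


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights plus optional biases of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def stealthminer_params(in_channels: int = 1) -> int:
    """Total over the two convolution blocks and the final linear layer."""
    return (conv1d_params(in_channels, 16, 2) + batchnorm_params(16)
            + conv1d_params(16, 2, 3) + batchnorm_params(2)
            + linear_params(2, 2))
```

Under these assumptions `stealthminer_params()` evaluates to 188, the same order of magnitude as the 200 parameters quoted above. The exact figure depends on the bias and normalization choices.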