Malware Variants Detection Model Based on MFF–HDBA

Wang, Shuo; Wang, Jian; Song, Yafei; Li, Sicong; Huang, Wei

doi:10.3390/app12199593


by Shuo Wang, Jian Wang, Yafei Song *, Sicong Li and Wei Huang

Air Defense and Antimissile School, Air Force Engineering University, Xi’an 710051, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(19), 9593; https://doi.org/10.3390/app12199593

Submission received: 14 September 2022 / Revised: 20 September 2022 / Accepted: 21 September 2022 / Published: 24 September 2022

(This article belongs to the Special Issue AI for Cybersecurity)


Abstract:

A massive proliferation of malware variants has posed serious and evolving threats to cybersecurity. Developing intelligent methods to cope with the situation is highly necessary due to the inefficiency of traditional methods. In this paper, a highly efficient, intelligent vision-based malware variants detection method was proposed. Firstly, a bilinear interpolation algorithm was utilized for malware image normalization, and data augmentation was used to resolve the issue of imbalanced malware data sets. Moreover, the paper improved the convolutional neural network (CNN) model by combining multi-scale feature fusion (MFF) and channel attention mechanism for more discriminative and robust feature extraction. Finally, we proposed a hyperparameter optimization algorithm based on the bat algorithm, referred to as HDBA, in order to overcome the disadvantage of the traditional hyperparameter optimization method based on manual adjustment. Experimental results indicated that our model can effectively and efficiently identify malware variants from real and daily networks, with better performance than state-of-the-art solutions.

Keywords:

malware variant detection; convolutional neural network; multi-scale feature fusion; channel attention mechanism; discrete bat algorithm

1. Introduction

Malware is a general term for software programs that are designed by people with malicious intentions to launch security attacks. Malware includes viruses, worms, trojans, spyware, bots, and so on. According to the ninth weekly report released by the National Internet Emergency Response Center (CNCERT) [1], in 2022 the number of hosts infected with network viruses in the territory was more than 1,550,000 from just 21 February to 27 February; the number of malicious computer programs was as high as 72,119,000. A massive proliferation of malware not only poses a serious threat to internet users, but also has an enormous impact on the security of national networks.

Malware detection approaches are divided into dynamic analysis techniques and static analysis techniques, according to the file execution status. Dynamic analysis [2,3,4] requires actually running executables in monitored environments such as sandboxes, simulators, and virtual machines. In the process of file execution, application behavior is monitored and analyzed through system calls. This method consumes time and resources, due to the dynamic execution of malware files. In contrast, static analysis [5,6] disassembles malware code instead of executing it. Traditional static analysis techniques adopt the idea of template matching, based on feature codes [7,8]. This requires researchers with expert knowledge to manually extract the static features of malware, such as flow graphs, byte sequence n-grams, and opcodes [9], and then to compare the extracted features, one by one, with known static features in the database. Hence, traditional methods are inefficient and have difficulty detecting diverse malware. Moreover, driven by economic benefits, malware authors have rapidly evolved automated malware development toolkits [10] that employ techniques such as packing, polymorphism, and instruction virtualization in order to evade detection. Due to the wide availability of these toolkits, a large number of malware variants proliferate, which poses a significant challenge for traditional malware detection methods.

In order to solve the dilemma faced by traditional detection methods, vision-based malware analysis was introduced [11]. Vision-based malware analysis has been proven to be effective and efficient in malware variant detection, and has been reported to be 4000 times faster than dynamic detection techniques [12]. The method first maps malicious code to grayscale images, and then extracts texture features from these images, relying on the observation that images from the same malware family have similar textures while images from different families have different textures. Then, the malware samples under test are classified on the basis of those features. Vision-based malware analysis does not depend on expert knowledge or static decompilation. Furthermore, even if malware authors reuse the same malware segment in order to generate new malware variants, the reuse is easily observed when the malware binary files are visualized.

With the revolution of machine learning and deep learning techniques, computer vision is rapidly evolving, which enhances the development of malware detection based on visualization [13,14,15]. In 2011, Nataraj et al. [16] first proposed a malware detection method based on visualization. The researchers mapped malicious binary portable executable (PE) files to grayscale images, and utilized the GIST descriptor for feature extraction; a K-nearest neighbor (KNN) classifier was then trained for malware variant classification. The researchers achieved a high classification accuracy of 97.18% on their own data set. Kabanga et al. [17] designed a simple convolutional neural network (CNN) structure for recognizing malware samples, a sequential structure consisting of three convolutional layers and two fully connected layers. Experimental results showed that their method achieved good performance in malware classification. However, numerous malware variants are constantly generated and updated via powerful malware development technologies, placing increasing demands on the feature extraction capabilities of malware detection approaches. Moreover, a common problem with existing malware data sets is that the samples in different categories are unbalanced, which may reduce the classification accuracy of models that are based on deep learning technology. Furthermore, hyperparameter optimization is essential for deep learning. However, the hyperparameter settings in recent related studies relied on human experience and lacked a theoretical basis and interpretability; consequently, the models became easily trapped in local optima.

In order to solve these problems, this article proposed an effective and intelligent vision-based malware classification framework, referred to as MFF–HDBA; this framework can realize automatic feature extraction and classification without domain expert knowledge, and can also effectively and efficiently identify malware variants from real and daily networks.

The main contributions of our research can be summarized as follows:

(1): A malware preprocessing method was implemented, based on the bilinear interpolation algorithm and data augmentation. In preprocessing, the visualized malware is normalized via the bilinear interpolation algorithm, and data augmentation is utilized to solve the issue of malware data set imbalance.
(2): A novel malware detection method was developed by combining multi-scale feature fusion and channel attention mechanism for more abundant texture information capture and robust feature extraction.
(3): A hyperparameter optimization algorithm was proposed on the basis of the discrete bat algorithm, referred to as HDBA. HDBA was utilized to overcome the disadvantage of the traditional hyperparameter optimization method that lacks a theoretical basis and easily falls into local optima.
(4): The proposed method was evaluated through extensive experimentation. The results indicated that the proposed method achieves not only high classification accuracy, but also a low MTTD (Mean Time to Detect) overhead, and that both are better than those of state-of-the-art solutions.

The remainder of this paper is organized in the following manner. Section 2 reviews the related works of vision-based malware classification. Section 3 introduces the details of the proposed method, including multi-scale feature fusion, channel attention mechanism, and HDBA. Section 4 presents two malware data sets and statistical measures. Experiments are implemented to evaluate the proposed method in Section 5. Section 6 ends the paper with concluding remarks.

2. Related Research

Due to the progress of artificial intelligence technology, a growing number of researchers are applying machine learning and deep learning to vision-based malware detection. These studies focused on improving recognition accuracy and reducing identification time. This section introduces related research on vision-based malware detection methods. Table 1 summarizes recent related studies.

2.1. Vision-Based Malware Recognition Using Machine Learning

Malware detection methods based on machine learning contain two main parts: feature extraction and classification. Texture feature extraction algorithms are used to implement feature extraction, such as histogram of oriented gradient (HOG), PCA, scale-invariant feature transform (SIFT), and local binary pattern (LBP). Moreover, machine learning classifiers are used for classification, such as KNN, support vector machine (SVM), and naive Bayes. Liu et al. [18] designed a novel feature descriptor which was more discriminating and robust. The descriptor was composed of a multi-layer LBP and dense SIFT descriptor for extracting more effective features of malware variants. Compared with the performance of state-of-the-art classification, their descriptors were outstanding. Naeem et al. [19] developed a novel feature descriptor named LGMP which could extract more abundant features. They extracted the local features of malware images using D-SIFT (dense scale-invariant feature transform) descriptor and the global features of the malware images via GIST descriptor. The LGMP feature vectors were generated by fusing the local feature vectors and the global feature vectors. They used LGMP descriptors for feature extraction and the KNN classifier for malware identification, achieving lower response times and better performance for malware classification. Nataraj et al. [20] presented a novel malware classification method named SPAM (signal processing for analyzing malware). SPAM is a type of fusion model that combines vision-based and signal-based features. The method achieved good performance in both response time and classification accuracy. Roseline et al. [21] provided a robust, vision-based anti-malware solution against similar characteristics in malware variants. They made use of layer ensemble technology in deep random forest to improve the classification accuracy, and ultimately achieved high efficiency with low complexity.

In the above vision-based malware detection methods using machine learning, feature extraction is separate from classification, which results in low efficiency when faced with a large number of malware variants. Moreover, the extraction of malware image texture features relies on feature engineering, which consumes significant computational resources.

2.2. Vision-Based Malware Recognition Using Deep Learning

Compared with machine learning, deep learning is an end-to-end framework which can implement automatic feature extraction and classification, thus avoiding the influence of man-made factors. Many researchers combined deep learning techniques and vision-based malware detection and achieved good results. Yue [22] proposed a weighted loss function to optimize CNN for malware classification. The method controlled the weighting loss by introducing a parameter. Compared with the model without the weighted loss function, their method obtained better results. Catal et al. [23,29] reviewed many malware detection approaches based on deep learning for a systematic analysis. In addition, they also developed a novel malware detection model by combining graph attention networks (GAN) and Node2Vec in order to improve the performance of malware detection methods, and achieved better classification accuracy. Gibert et al. [24] analyzed the characteristics and shortcomings of manual feature extraction, and designed a deep neural network structure for extracting texture features of malware images. The method achieved good classification performance on several malware data sets, with good generalization ability. However, the structure of the network was shallow, and it was difficult to capture deep feature information of malware. Thus, it was less effective in identifying challenging malware variants that were encrypted by packing or obfuscation. Venkatraman et al. [25] proposed a hybrid deep learning framework for malware detection. They used a cost-sensitive CNN to extract texture features of malware images, and a bidirectional gated recurrent unit (BiGRU) to capture byte sequence information, which increased feature diversity and greatly improved detection accuracy.

Some literature [30,31,32] noted that attention mechanisms enhance the feature extraction ability of deep neural networks. Thus, many scholars tried to utilize attention mechanisms to improve malware detection. Wang et al. [33] developed a depthwise efficient attention module (DEAM), which was an improvement of the convolutional block attention module (CBAM), in order to enhance feature representation and improve the model’s performance. They combined DEAM and Densenet [34] to identify malware samples, and obtained a higher classification accuracy. However, the model structure was complex, and the time cost and computational resource consumption were high. Li et al. [35] implemented a self-attention operation on the malware images before they were input to the first convolution layer. In addition, they applied a spatial pyramid pooling layer before the fully connected layer to adapt to different sizes of malware input. Experimental results indicated that a combination of self-attention and spatial pyramid pooling can effectively improve malware detection accuracy.

Additionally, many researchers found that imbalanced data sets negatively impact the performance of malware detection, and searched for solutions. Cui et al. [26,36] used a swarm intelligence algorithm to deal with the issue of imbalanced data sets. They regarded the CNN accuracy as the objective function, and utilized the swarm intelligence algorithm to dynamically resample the batch size of input malware samples. After determining the best batch size of input, they trained the CNN using that batch size. Compared with the results before batch size resampling, the method achieved a significant improvement in performance. Hemalatha et al. [37] designed a customized loss function to counteract malware data set imbalance. They applied the reweighted class-balanced loss function in the final layer of Densenet to improve classification performance. Experiments showed that their method can solve the issue of imbalanced data sets effectively to obtain high classification accuracy.

Recently, transfer learning technique [38,39] has been applied for malware classification to reduce the misclassification rate. It can apply the knowledge learned from the image classification task to the problem of malware classification. Walid et al. [40] used transfer learning and fine-tuning techniques to design an efficient vision-based malware classification model. They pre-trained the VGG16 model on large-scale data sets, and fine-tuned the model on the basis of the malware data set to increase classification accuracy. Alazab et al. [27] proposed a novel malware detection method named IMCEC. They visualized malicious code files as malware images, and utilized transfer learning technology to migrate various advanced CNN architectures which were pre-trained by the ImageNet data set [41] to these malware images. These different CNN architectures could capture different semantic features of malware images. Thus, the researchers used an ensemble of these CNN architectures, by ensemble learning technology, to improve feature extraction capabilities. The methods achieved a high classification accuracy, but required relatively long prediction times.

The aforementioned solutions focused on improving the efficiency and effectiveness of malware detection by designing or developing a structure of deep neural networks customized to the malware detection task. Many other researchers contemplated improving the malware visualization method to obtain more abundant texture information of the original malware images. Danish et al. [28] believed that visualizing malware PE files into color images could increase the diversity of malware features. They initially mapped malware PE files into grayscale images, and then used color maps to transfer grayscale images into color images. Experiments indicated that their visualization approach achieved great results. Li et al. [35] proposed a novel malware visualization method which transferred malware to RGB images. They implemented binary malware visualization, assembly code visualization, and developer feature visualization, and then combined them together to generate RGB images. The malware RGB images contained more abundant semantic information and spatial characteristics of the malware. However, their visualization method required disassembly of the malware with IDA Pro, and extraction of the opcode information for visualization into G-images and of the string information for visualization into B-images. The essential step involved aligning the visualized R-images, G-images, and B-images according to the relative virtual address (RVA). This series of operations is expensive, and increases the complexity of malware visualization.

3. Methodology

The proposed malware classification framework consists of two main parts: data preprocessing, and feature extraction and classification. The overall structure of our framework is illustrated in Figure 1.


3.1. Data Preprocessing

3.1.1. Malware Visualization

Malware visualization is the process of converting malicious code binary files into grayscale images. First, the malicious binary file is read as a vector of 8-bit unsigned integers, so that every eight binary digits are converted into one decimal integer in the range 0–255. This vector is then organized into a 2D array, in which each byte represents a pixel, and the array is mapped to a grayscale image. The widths of the grayscale images vary with the file sizes. Partial samples after visualization from different malware families are shown in Figure 2.
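The byte-to-pixel mapping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `bytes_to_grayscale`, the caller-supplied `width`, and the zero padding of the last row are all assumptions made for the example.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int) -> List[List[int]]:
    """Arrange raw bytes into rows of `width` pixels, one byte (0-255) per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad the last row so the array is rectangular
    # (the padding rule is an assumption, not taken from the paper).
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Five bytes laid out two pixels per row give a 3 x 2 image.
image = bytes_to_grayscale(bytes([0, 127, 255, 16, 32]), width=2)
```

In practice the bytes would come from reading the executable in binary mode, and the width would be chosen from the file size.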

3.1.2. Bilinear Interpolation Algorithm for Size Normalization

In a classical convolutional neural network, the number of neurons input to the fully connected layer must be fixed, since the size of the weight matrix of the fully connected layer is fixed. This means that the size of the feature map must be consistent after convolution and pooling operations. Thus, malware image sizes which are input to CNN must be the same. However, the input malware image sizes are different from each other because different malicious code binaries files vary in size. Therefore, it is necessary to normalize the visualized malware images to the same size.

In order to keep the original texture features of the normalized malware images as unchanged as possible, this paper utilized a bilinear interpolation algorithm to normalize the image size. The algorithm selects the four pixels nearest to the interpolation point of the malware image, first performs two linear interpolations in the $X$ direction, and then performs one linear interpolation in the $Y$ direction to obtain the pixel value of the interpolation point:

f (x, y_{1}) = \frac{x_{2} - x}{x_{2} - x_{1}} f (x_{1}, y_{1}) + \frac{x - x_{1}}{x_{2} - x_{1}} f (x_{2}, y_{1})

(1)

f (x, y_{2}) = \frac{x_{2} - x}{x_{2} - x_{1}} f (x_{1}, y_{2}) + \frac{x - x_{1}}{x_{2} - x_{1}} f (x_{2}, y_{2})

(2)

f (x, y) = \frac{y_{2} - y}{y_{2} - y_{1}} f (x, y_{1}) + \frac{y - y_{1}}{y_{2} - y_{1}} f (x, y_{2})

(3)

where $f (x, y)$ is the pixel value of the interpolation point in the malware image, and $(x_{i}, y_{j})$ $(i, j = 1, 2)$ are the four pixels near the interpolation point in the malware image. Figure 3 shows the normalized malware image of a sample in the C2LOP.P family, which has an original size of 370 × 256; its image size changes to 32 × 32, 64 × 64, 128 × 128, and 256 × 256 after normalization. Note that the basic texture features in the malware image are preserved after size normalization by the bilinear interpolation algorithm.
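To make the interpolation concrete, the following sketch resizes a small 2D array by interpolating twice along the horizontal direction and once along the vertical direction. It is an illustrative assumption rather than the paper's code: the function name `bilinear_resize` is hypothetical, neighbouring pixels are one unit apart (so the denominators in the formulas equal 1), and output pixels are mapped to source coordinates with a corner-aligned convention, which the paper does not specify.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2D list of pixel values with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for r in range(out_h):
        # Source row coordinate of this output row (corner-aligned; an assumption).
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y1 = int(y)
        y2 = min(y1 + 1, in_h - 1)
        ty = y - y1
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x1 = int(x)
            x2 = min(x1 + 1, in_w - 1)
            tx = x - x1
            # Two interpolations along X, on the rows y1 and y2.
            top = (1 - tx) * img[y1][x1] + tx * img[y1][x2]
            bottom = (1 - tx) * img[y2][x1] + tx * img[y2][x2]
            # One interpolation along Y between the two intermediate values.
            row.append((1 - ty) * top + ty * bottom)
        out.append(row)
    return out


small = bilinear_resize([[0, 10], [20, 30]], 3, 3)
# small == [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

Library resize routines often use a different coordinate convention (for example, pixel-centre alignment), so their outputs can differ slightly at the borders.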

3.1.3. Data Augmentation

Deep learning models depend on large amounts of data to mine the relationships among data. Therefore, the performance of a deep learning model is closely related to the quality of the data set. Data sets which are large and contain balanced samples can improve the classification accuracy of the model and avoid overfitting to a certain extent. However, many malware data sets contain insufficient samples and unbalanced categories. Data augmentation can be applied to resolve this problem by increasing the sample size of the categories with few samples, which effectively avoids overfitting and improves the robustness of the model. Common data augmentation is implemented by transforming original images to generate new images, for example through rescaling and flipping. The settings of data augmentation used in the experiments are provided in Table 2. Figure 4 shows some malware images after data augmentation.
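As a rough illustration of how flipping can enlarge the under-represented categories, the sketch below oversamples every class up to the size of the largest one. The helper names (`hflip`, `vflip`, `balance_by_flipping`), the choice of flips only, and the "match the largest class" target are assumptions made for the example; they are not the augmentation settings the authors list in Table 2.

```python
import random
from collections import defaultdict


def hflip(img):
    """Mirror an image (2D list) left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror an image (2D list) top to bottom."""
    return img[::-1]


def balance_by_flipping(samples, seed=0):
    """samples: list of (image, label) pairs.

    Returns a new list in which every class has as many samples as the
    largest class, with the extra samples generated by random flips.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for img, label in samples:
        by_label[label].append(img)
    target = max(len(imgs) for imgs in by_label.values())
    balanced = list(samples)
    for label, imgs in by_label.items():
        for _ in range(target - len(imgs)):
            transform = rng.choice([hflip, vflip])
            balanced.append((transform(rng.choice(imgs)), label))
    return balanced


# Toy data: three samples of family "A" and one of family "B".
toy_samples = [([[1, 2], [3, 4]], "A")] * 3 + [([[5, 6], [7, 8]], "B")]
toy_balanced = balance_by_flipping(toy_samples)
```

Augmentation of this kind should be applied to the training split only, so that the test set keeps the original class distribution.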

3.2. Feature Extraction and Classification

CNN is an end-to-end framework, which is able to realize automatic feature extraction and classification. Currently, CNN models have been widely used in malware detection and classification tasks. However, these methods utilize a single-scale convolution kernel in each layer for feature extraction of malware images. Features extracted via single-scale convolution kernels lack diversity and robustness. Furthermore, the classification performance of models critically depends on feature extraction. Thus, the above methods have a high misclassification rate.

In order to solve the above problems, we designed a novel feature extraction and classification structure based on CNN. The core design idea of the structure is to increase feature diversity and enhance the feature extraction capability of the model. It enables our model to use a small number of neural network layers to obtain the feature extraction effect of a deeper neural network, which can decrease the model parameters. In addition, shallow neural network layers have fewer neural network parameters and floating point operations (FLOPs), which can improve the speed of model operations. This means that our model not only enhances feature extraction ability, but also performs malware classification rapidly. The proposed structure mainly consists of CBR layers, max-pooling layers, multi-scale feature extraction blocks, a dropout layer, a flatten layer, and a fully connected layer. The CBR layers are the basic units of the model, which include a convolution layer, batch normalization (BN) layer, and rectified linear unit (ReLU) activation function. The CBR layer is an improvement of the traditional convolutional layer, and can accelerate convergence of the model. The process of CBR layer operation is as follows: the input features first enter the convolution layer for convolution operation, then feature maps enter the BN layer for batch normalization, and finally the ReLU function is used to activate feature maps to obtain a nonlinear feature output.
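The order of operations inside a CBR layer can be illustrated with a deliberately small, single-channel, one-dimensional toy in plain Python. This is only a sketch of the convolution, batch normalization, ReLU sequence: the function names are hypothetical, the batch normalization has no learned scale or shift, and a real model would use 2D convolutions from a deep learning framework.

```python
import math


def conv1d(signal, kernel):
    """'Valid' 1D convolution (cross-correlation, as in deep learning libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def batch_norm(batch, eps=1e-5):
    """Normalize one channel over the whole batch (no learned scale or shift)."""
    values = [v for signal in batch for v in signal]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / math.sqrt(var + eps) for v in signal] for signal in batch]


def relu(batch):
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in signal] for signal in batch]


def cbr(batch, kernel):
    """Convolution -> batch normalization -> ReLU, in that order."""
    return relu(batch_norm([conv1d(signal, kernel) for signal in batch]))


# A rising and a falling signal filtered with a difference kernel.
cbr_out = cbr([[1, 2, 3, 4], [4, 3, 2, 1]], kernel=[1, -1])
```

Here the rising signal produces negative responses that ReLU zeroes out, while the falling signal produces positive responses that survive with roughly unit magnitude after normalization.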

Multi-scale feature extraction blocks form the core blocks of the model, which includes multi-scale feature fusion and channel attention mechanism, as shown in Figure 5. The core idea of multi-scale feature fusion is to simultaneously extract multi-scale features of input at multi-scale convolutional kernel sizes in order to increase feature diversity. These features are fused to generate a general feature that takes into account both local and global information. In the process of feature extraction, the feature map is obtained by combining the features extracted from each channel; however, not every channel can extract features effectively. Channel attention mechanism can calculate the weights of each channel based on the feature extraction effect of each channel and assign relatively large weights to the channels with good feature extraction effects, while relatively small weights are assigned to the channels with poor feature extraction effects. That enables channel attention mechanism to focus on the main information of the feature maps, which can enhance the core feature representation of malware, and improve the accuracy of malware detection and classification.

In the multi-scale feature fusion part, the input feature $F \in \mathbb{R}^{C \times H \times W}$ is first processed by four branches, $I$, $II$, $III$, and $IV$, simultaneously. In order to make the extracted features diverse and representative, convolutional kernels with different receptive fields (RFs) are used in each branch for feature extraction. The branches produce the corresponding output features $F_{1} \in \mathbb{R}^{C_{1} \times H' \times W'}$, $F_{2} \in \mathbb{R}^{C_{2} \times H' \times W'}$, $F_{3} \in \mathbb{R}^{C_{3} \times H' \times W'}$, and $F_{4} \in \mathbb{R}^{C_{4} \times H' \times W'}$. Then, the output features $F_{1}$, $F_{2}$, $F_{3}$, and $F_{4}$ are fused to generate a total feature that contains both local and global information. The fused output feature is $F' \in \mathbb{R}^{C' \times H' \times W'}$, and the number of output channels is $C' = C_{1} + C_{2} + C_{3} + C_{4}$. The channel attention mechanism is divided into two parts: squeeze and excitation. The squeeze (compression) operation $f_{s}$ is a global average pooling on $F'$ to obtain $Z \in \mathbb{R}^{1 \times 1 \times C'}$, which is given by the following:

Z = f_{s} (F') = \frac{1}{H' \times W'} \sum_{i = 1}^{H'} \sum_{j = 1}^{W'} F' (i, j)

(4)

Then, the excitation (activation) operation $f_{e}$ is performed on the squeezed $Z$ in order to obtain the channel weights $w$:

w = f_{e} (Z, W) = \sigma (W_{2} \delta (W_{1} Z))

(5)

where $\delta$ is the ReLU activation function, $\sigma$ is the sigmoid function, $W_{1}$ and $W_{2}$ are the weights of the two fully connected layers that make up $W$, $w \in \mathbb{R}^{1 \times 1 \times C'}$, and $r$ is the channel reduction ratio, a hyperparameter of the transformation which is generally taken as $r = 16$. Finally, the channel weight $w$ is combined with the input feature $F'$ in $f_{scale}$, so that each channel of the input feature is assigned its weight; the weighted output $F_{m}$ is obtained with the following equation:

F_{m} = f_{s c a l e} (F', w)

(6)

After the malware images pass through the multi-scale feature extraction blocks, a composite feature map including local and global information is generated.
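The fusion, squeeze, excitation, and scaling steps can be traced on tiny hand-made feature maps. The sketch below assumes the usual squeeze-and-excitation form, in which the excitation is a sigmoid applied to two fully connected layers with a ReLU in between; the function names and the fixed toy weights `w1` and `w2` are illustrative stand-ins for learned parameters, and only two single-channel branches are fused instead of four.

```python
import math


def fuse(*branches):
    """Concatenate branch outputs along the channel axis."""
    return [channel for branch in branches for channel in branch]


def squeeze(feature):
    """Global average pooling: one mean value per channel (each channel is H' x W')."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]


def dense(vec, weights):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excite(z, w1, w2):
    """Channel weights: sigmoid(W2 . relu(W1 . z))."""
    hidden = [max(0.0, h) for h in dense(z, w1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2)]


def scale(feature, weights):
    """Multiply every channel by its own weight."""
    return [[[p * w for p in row] for row in ch] for ch, w in zip(feature, weights)]


# Two toy branches, each with one 2 x 2 channel, fused into C' = 2 channels.
branch_a = [[[1, 1], [1, 1]]]
branch_b = [[[2, 4], [6, 8]]]
fused = fuse(branch_a, branch_b)

z = squeeze(fused)                 # per-channel means
w1 = [[1.0, 1.0]]                  # C' -> C'/r with r = 2 (toy value)
w2 = [[0.0], [1.0]]                # C'/r -> C'
channel_weights = excite(z, w1, w2)
weighted = scale(fused, channel_weights)
```

With these toy weights the first channel receives a weight of 0.5 and the second a weight close to 1, so the second channel dominates the weighted output.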

In addition, we set a dropout layer [42,43] at the end of multi-scale feature extraction blocks to prevent the model from overfitting. Moreover, the flatten layer is used as the transition from convolutional layers to fully connected layers. It is set after the dropout layer in the structure of our model. The fully connected layers with activation function softmax are used for malware automatic classification.

3.3. Hyperparameter Optimization Based on Discrete Bat Algorithm

The hyperparameter settings directly affect the performance of the model. Consequently, the selection and setting of hyperparameters is a difficult and active research topic in deep learning. There are many important hyperparameters in CNN, such as the number of network layers, the number of neurons per layer, and the learning rate. The more layers the neural network has and the more neurons there are per layer, the stronger the feature extraction capability of the network becomes, and the better it can solve complex problems. However, when the number of layers and the number of neurons per layer become too large, the CNN model tends to overfit, which weakens its generalization ability. The learning rate is an important hyperparameter that affects convergence of the model. If the learning rate is too large, the parameters are updated too quickly in back propagation, and the loss function becomes prone to oscillation, making convergence difficult. If the learning rate is too small, the parameter update in back propagation becomes slower. In addition, it makes the loss function converge slowly, causing overfitting.

At present, there is no theorem for the hyperparameter settings, nor a generally applicable consensus setting method. In practice, the hyperparameter settings rely on human experience: they are adjusted continuously by referring to the settings of some typical models or to the training results of the model. The hyperparameter adjustment method that relies on human experience lacks a theoretical basis and interpretability. In addition, the method can only adjust the hyperparameters one by one, and it is difficult to take the interaction of different hyperparameters into account. If multiple hyperparameters are adjusted at the same time, it is difficult to determine exactly which hyperparameter adjustment is affecting the performance of the model. Furthermore, a single adjustment of the hyperparameters sometimes fails to improve the performance of the model, leading the search to become trapped in a local optimum. The swarm intelligence algorithm is an effective means of solving complex optimization problems.

In order to solve the issue of hyperparameter optimization, this paper proposed a hyperparameter optimization algorithm based on the discrete bat algorithm, referred to as HDBA. The bat algorithm is a swarm intelligence algorithm inspired by the behavior of bats, which use echoes to detect prey and avoid obstacles. The standard bat algorithm is adapted to solve the optimization problem for continuous values, while the hyperparameters are discrete values. Therefore, we discretize the population’s position and velocity updates to overcome the limitations of the standard bat algorithm. We utilize the accuracy of the model as the objective function, and the position of the bat is the set of the number of multi-scale feature extraction blocks and the number of convolutional kernels in each convolution. The optimization search process ends when the epoch reaches the max iteration. The process of parameter optimization is shown in Algorithm 1.

Algorithm 1: HDBA operation

Input: $X$: bat population;
Output: $HParam_{best}$: the best hyperparameter setting;

1: Initialize original bat population $X$, velocity $v$;
2: $Accuracy \leftarrow f (X)$
3: $x_{best} \leftarrow$ GetBest $(Accuracy)$
4: while $t < maxIterations$ do
5: for $x_{i} \in X$ do
6: $(x_{i}^{t + 1}, v_{i}^{t + 1}) \leftarrow$ update $(x_{i}^{t}, v_{i}^{t})$
7: $(x_{i}^{t + 1}, v_{i}^{t + 1}) \leftarrow$ $([x_{i}^{t + 1}], [v_{i}^{t + 1}])$
8: $Accuracy_{i}^{t + 1} \leftarrow$ $f (x_{i}^{t + 1})$
9: end
10: $X \leftarrow$ LocalSearch $(X)$
11: $x_{best} \leftarrow$ GetBest $(X)$
12: $t = t + 1$
13: $HParam_{best} \leftarrow x_{best}$
14: end

In the algorithm, GetBest is the global optimization function for the bat population, LocalSearch is a local search of the bat population, and $HParam_{best}$ is the best hyperparameter setting. The formulas for the update are shown below:

$$f_{i} = f_{\min} + (f_{\max} - f_{\min}) \gamma \tag{7}$$

$$v_{i}^{t+1} = v_{i}^{t} + (x_{i}^{t} - x_{best}) f_{i} \tag{8}$$

$$x_{i}^{t+1} = x_{i}^{t} + v_{i}^{t+1} \tag{9}$$

where $\gamma \in [0, 1]$ is a random vector drawn from a uniform distribution, $f_{\min}$ and $f_{\max}$ bound the frequency $f_{i}$, and $x_{best}$ is the current global best solution.
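As a rough illustration of how the continuous bat-algorithm update can be discretized, the following Python sketch searches over integer positions. It is a minimal sketch under stated assumptions, not the authors' implementation: the function `discrete_bat_search`, the toy objective `toy_accuracy`, the search bounds, the use of rounding for the bracket operator, and the one-step local search are all hypothetical choices. In the paper, the objective would be the accuracy of a trained model.

```python
import random


def discrete_bat_search(objective, bounds, n_bats=8, max_iterations=30,
                        f_min=0.0, f_max=1.0, seed=0):
    """Maximise objective(position) over integer vectors within bounds."""
    rng = random.Random(seed)
    dims = len(bounds)

    def clip(value, d):
        low, high = bounds[d]
        return max(low, min(high, value))

    # Random integer positions and zero initial velocities.
    positions = [[rng.randint(lo, hi) for lo, hi in bounds]
                 for _ in range(n_bats)]
    velocities = [[0] * dims for _ in range(n_bats)]
    scores = [objective(p) for p in positions]
    top = max(range(n_bats), key=lambda i: scores[i])
    best, best_score = list(positions[top]), scores[top]

    for _ in range(max_iterations):
        for i in range(n_bats):
            # Frequency drawn uniformly between f_min and f_max.
            freq = f_min + (f_max - f_min) * rng.random()
            for d in range(dims):
                # Velocity is driven by the offset from the current best,
                # then rounded so that the position stays an integer.
                velocities[i][d] = round(
                    velocities[i][d] + (positions[i][d] - best[d]) * freq)
                positions[i][d] = clip(positions[i][d] + velocities[i][d], d)
            scores[i] = objective(positions[i])

        # Assumed local search: nudge one coordinate of the best bat by +/-1.
        d = rng.randrange(dims)
        candidate = list(best)
        candidate[d] = clip(candidate[d] + rng.choice((-1, 1)), d)
        evaluated = list(zip(positions, scores))
        evaluated.append((candidate, objective(candidate)))

        # Keep the best position seen so far.
        for pos, score in evaluated:
            if score > best_score:
                best, best_score = list(pos), score

    return best, best_score


def toy_accuracy(position):
    """Hypothetical stand-in for model accuracy; peaks at (3, 32)."""
    blocks, kernels = position
    return 1.0 - 0.01 * abs(blocks - 3) - 0.001 * abs(kernels - 32)


best, score = discrete_bat_search(toy_accuracy, bounds=[(1, 6), (8, 64)])
print(best, score)
```

Because each evaluation of the real objective means training a network, the population size and iteration budget dominate the cost of such a search.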

4. Data Sets and Evaluation Metrics

4.1. Data Sets and Experimental Setting

All experiments were conducted on a 64-bit Windows machine with an AMD Ryzen 7 5800H with Radeon Graphics (3.2 GHz), 16 GB RAM, and an NVIDIA GeForce RTX 3070 laptop GPU. The methods were developed using Python 3.8 with the TensorFlow 2.8 and scikit-learn 1.0.2 packages.

All evaluation experiments were conducted on two malware data sets: the Malimg data set and the DataCon data set.

The DataCon data set [44] was provided by the Qi An Xin Technology Research Institute as multi-domain, large-scale competition open data for security research. It contains 7896 malware samples and 15,759 benign samples collected from real and daily networks. For evaluation, we used the standard data partitioning strategy: 70% of the samples of each class were used for training, and the remaining 30% were used for testing. The details of the DataCon data set are shown in Table 3.

The Malimg data set [16] consists of real-life malware samples. It contains 25 malware families, with a total of 9435 malware samples. For evaluation, we used the same partitioning strategy: 70% of the samples of each class were used for training, and the remaining 30% were used for testing. The details of the Malimg data set are shown in Table 4.
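The per-class 70/30 partition described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper `stratified_split`, the fixed seed, and the toy label list are hypothetical, and the paper does not state how its split was implemented.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # The same fraction is taken from every class, so class
        # proportions are preserved in both partitions.
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels (sizes are illustrative, not the real family counts).
labels = ["Allaple.A"] * 10 + ["Yuner.A"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting inside each class keeps small families represented in the test set, which a single random split over the whole data set would not guarantee.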

4.2. Evaluation Metrics

In order to evaluate the classification performance of our model, we utilized accuracy, precision, recall, and F1-scores as assessment metrics. These metrics have been extensively applied by the research community to provide evaluations of models [45,46,47]. These metrics are defined as follows:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$Precision = \frac{TP}{TP + FP} \tag{11}$$

$$Recall = \frac{TP}{TP + FN} \tag{12}$$

$$F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{13}$$

where $TP$ represents the actual positive malware types that are correctly predicted as positive samples, and $FP$ means the actual negative malware types that are wrongly predicted as positive samples. Similarly, $TN$ implies the actual negative malware types that are correctly predicted as negative samples, and $FN$ means the actual positive malware types that are wrongly predicted as negative samples. The $TP$, $FP$, $TN$, and $FN$ values can be represented as shown in Figure 6, where $F_{i}\ (i = 0, 1, 2, \dots, n)$ refers to malware families.
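To make the definitions concrete, the sketch below derives the four counts for one family from a multi-class confusion matrix (one-vs-rest) and applies the four formulas. The function name `per_class_metrics`, the row/column orientation (rows are true labels, columns are predictions), and the 3 × 3 toy matrix are assumptions made for illustration.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class k; confusion[true][pred] holds counts."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                       # class k, predicted as other
    fp = sum(confusion[r][k] for r in range(n)) - tp  # other, predicted as class k
    tn = total - tp - fn - fp

    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy confusion matrix for three families (illustrative values only).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = per_class_metrics(confusion, 0)
print(metrics)
```

For family 0 in this toy matrix, $TP = 8$, $FN = 2$, $FP = 1$, and $TN = 19$, so precision is 8/9 and recall is 0.8. How the per-family values are averaged into the single figures reported in the tables is not stated in the text.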

5. Experimental Results and Discussion

We evaluated the effectiveness and efficiency of the proposed approach by the following experiments:

(1): Effects of different malware image sizes on model performance;
(2): Comparison of our model performance with advanced deep learning frameworks;
(3): Ablation experiments;
(4): Hyperparameter optimization experiment;
(5): Comparative analysis with state-of-the-art solutions;
(6): Validation with the DataCon data set.

5.1. Effects of Different Malware Image Sizes on Model Performance

A CNN-based model requires inputs of a fixed size. Therefore, we resized each malware image to a fixed square size using the bilinear interpolation algorithm. The scale and performance of the model are closely related to the size of the malware images fed to it. The smaller the input, the harder it is to retain all of the texture information in the image, which makes it easy to misclassify samples from similar malware families. Conversely, a larger input costs more computation time and can even cause the model to overfit. To find the optimal input size, the malware images were normalized to 32 × 32, 64 × 64, 128 × 128, 256 × 256, and 512 × 512 using bilinear interpolation; the results are shown in Table 5.

The results show that, as the size of the malicious code images increased from 32 × 32 to 512 × 512, the classification accuracy first rose from 84.28% to 99.36% (at 256 × 256) and then fell to 99.04%. This drop suggests that the model overfitted at the largest input size, and that 256 × 256 was the critical input size. In addition, the number of model parameters increased with the image size, from 0.31 M to 3.58 M; more parameters consume more computing resources and lengthen the training time. Considering both the classification performance and the number of parameters, we selected a malware image size of 256 × 256 as the input for the model.
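The size normalization used in this experiment relies on bilinear interpolation, which can be sketched in pure Python as follows. This is a minimal sketch, assuming a grayscale image stored as a list of rows and an "align corners" coordinate mapping; the function `bilinear_resize` is hypothetical, and framework resize routines may map coordinates differently.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(image), len(image[0])
    resized = []
    for r in range(out_h):
        # Map the output row back to a fractional source row.
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        resized.append(row)
    return resized


small = [[0, 100],
         [100, 200]]
resized = bilinear_resize(small, 3, 3)
print(resized)  # [[0.0, 50.0, 100.0], [50.0, 100.0, 150.0], [100.0, 150.0, 200.0]]
```

Each output pixel is a weighted average of its four nearest source pixels, which is why shrinking an image to 32 × 32 blurs away much of the fine texture that distinguishes similar families.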

5.2. Comparison of Our Model Performance with Advanced Deep Learning Frameworks

Since AlexNet [48] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, deep learning has developed rapidly, and numerous excellent CNN structures have been proposed one after another, such as VGGNet [49], ResNet [50], and GoogLeNet [51]. These CNN models have been used for multi-class malware classification with good results. In order to validate the effectiveness of the proposed model, this section first evaluates its training performance and then compares it with AlexNet8, VGG16, ResNet50, and Inception V3. The experiments in this section were run on the Malimg data set after data augmentation, which contained 51,608 samples in the training set.

Figure 7 shows the training curve of our model over 50 epochs. The training time of our model was 10,081.05 s and the memory consumption was 3365.03 MB. The model converged by about epoch 20, indicating a fast convergence rate. The classification accuracy reached 100% on the training set and 99.36% on the test set, so the model performed well on both sets without overfitting. In addition, the training time depends closely on the performance of the GPU: a more powerful GPU can bring the time needed to train the model on hundreds of thousands to millions of malware variants down to an acceptable level.

Table 6 shows the results of the comparison experiments between our model and four other classical neural network models. From Table 6, it can be observed that our model achieved higher accuracy, precision, recall, and F1-score than all of the other models. The performance of AlexNet8 was limited by its shallow architecture. ResNet50 and VGG16 performed better than AlexNet8 owing to their greater network depth. VGG16 utilized small filters to increase the depth of the network, while ResNet50 used residual blocks to mitigate exploding and vanishing gradients in networks with more convolution layers. Inception V3 increased the width of the neural network architecture, and its results indicate that widening the network can also improve model performance. Our model achieved the best performance among all of the above models, owing to the design of the multi-scale feature extraction blocks, which capture richer texture information from feature maps at different receptive field (RF) scales and enhance the expression of typical features.

In order to further observe and analyze the classification performance of the models on each malware family, the per-family results of the five models on the Malimg data set are plotted in Figure 8. The results indicate that our model improved, to varying degrees, on the insufficient classification accuracy of AlexNet, VGG16, ResNet50, and Inception V3 for some confusable malware families, such as C2LOP.P, C2LOP.gen!g, Swizzor.gen!E, and Swizzor.gen!I, which ultimately improved the overall classification accuracy. These experimental results demonstrate that the combination of multi-scale feature fusion and the channel attention mechanism has excellent feature extraction capability: it can effectively extract representative features from malware and distinguish easily confused malware variants.

The mean time to detect (MTTD) reflects the average time taken by the detector to successfully identify a threat. We calculated the number of parameters and the MTTD of each model, and the results are shown in Table 7. Our model was the smallest, with 1.11 million (M) parameters, which is roughly 1/326 that of AlexNet, 1/121 that of VGG16, 1/21 that of ResNet50, and 1/20 that of Inception V3. In addition, our model yielded the fastest detection speed among the models, requiring on average just 9.63 milliseconds to identify a new sample. By comparison, the advanced method IMCEC [27] required an MTTD of 1.18 s on average, which means that our model could identify about 123 unknown samples in the time IMCEC took to identify one. These results further demonstrate the lightweight structure and timeliness of our model.
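The comparison above reduces to simple arithmetic on the reported figures. The short Python sketch below recomputes the parameter ratios implied by Table 7 and the number of samples our model can process within the 1.18 s quoted for the reference method; the variable names are illustrative only.

```python
# Parameter counts in millions, as listed in Table 7.
params_millions = {
    "AlexNet": 362.0,
    "VGG16": 134.4,
    "ResNet50": 23.6,
    "Inception V3": 21.9,
}
ours_params_millions = 1.11

# How many times larger each baseline is than our model.
size_ratios = {name: count / ours_params_millions
               for name, count in params_millions.items()}

# Mean time to detect: ours in milliseconds, reference method in seconds.
ours_mttd_ms = 9.63
reference_mttd_ms = 1.18 * 1000
samples_per_reference_detection = reference_mttd_ms / ours_mttd_ms

for name, ratio in size_ratios.items():
    print(f"{name}: about {ratio:.0f}x the parameters of our model")
print(f"samples per reference detection: {samples_per_reference_detection:.1f}")
```

The last value comes out at roughly 122.5, which rounds to the 123 samples mentioned in the text.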

5.3. Ablation Experiments

Ablation experiments were implemented in this section to evaluate whether the multi-scale feature extraction blocks could improve the classification effect. In this section, all the models were trained on the Malimg data set after data augmentation, and the experimental results are shown in Table 8 and Figure 9.

As shown in Table 8, comparing CNN with CNN + Channel Attention, and Multi-scale Feature Fusion with Multi-scale Feature Fusion + Channel Attention, shows that introducing the channel attention mechanism clearly improves classification performance (accuracy rose from 96.47% to 97.86% and from 98.72% to 99.36%, respectively). The channel attention mechanism improves the extraction of image texture features by enhancing the channels of the feature maps that extract features well and suppressing those that extract them poorly. Comparing CNN with Multi-scale Feature Fusion, and CNN + Channel Attention with Multi-scale Feature Fusion + Channel Attention, demonstrates that multi-scale feature fusion also effectively improves accuracy, because it considers both global and local information and thereby increases feature diversity. The per-family classification details of all the models are shown in Figure 9 as confusion matrixes. In each confusion matrix, the ordinate represents the predicted label, the abscissa represents the true label of the malware families, and the diagonal indicates correct predictions. The fewer points that lie off the diagonal, the better the detection performance. Compared with the other models, Multi-scale Feature Fusion + Channel Attention achieved the best detection performance owing to improvements in classification accuracy for the confusable malware families, particularly C2LOP.P, C2LOP.gen!g, Swizzor.gen!E, and Swizzor.gen!I. These results demonstrate that the proposed model enhances critical feature representation and improves the classification accuracy of confusable malware families.

5.4. Hyperparameter Optimization Experiment

In order to verify the effectiveness of the proposed hyperparameter optimization algorithm, we compared the model without HDBA (with no optimization and with manual optimization) against the model using HDBA on the Malimg data set. To ensure that no other factors interfered, we ran the experiments with fixed model input samples. The results of the comparison experiments are shown in Table 9. Manual optimization obtained a higher accuracy than no optimization (99.04% versus 98.82%), and our model with HDBA achieved the best performance among all the models (99.36%). In summary, manual optimization can improve the classification accuracy of a model to a certain extent, but it struggles to fully optimize the performance of the model. The hyperparameter optimization algorithm proposed in this article, HDBA, can better optimize the values of the hyperparameters and improve the classification performance of the model. Moreover, it helps address the lack of a theoretical basis in manual hyperparameter tuning.

5.5. Comparative Analysis with State-of-the-Art Solutions

In order to validate the efficiency and effectiveness of our model, we compared our method with existing malware classification methods that used machine learning or deep learning techniques on the Malimg data set. Table 10 summarizes the comparative assessment with recent state-of-the-art solutions. As the results show, our method achieved the best performance among the existing malware classification techniques. This indicates that the improvement in malware classification capability comes from the combination of data augmentation, multi-scale feature fusion, the channel attention mechanism, and HDBA. First, our model connects feature information across multiple scales, so the proposed feature extraction method is better able to capture representative characteristics. Second, data augmentation minimizes the impact of imbalanced malware data sets. Third, HDBA improves accuracy by optimizing the hyperparameters of our model.

5.6. Validation with DataCon Data Set

In this section, we validated the proposed method with the DataCon data set of 7896 mining samples and 15,759 non-mining samples. The DataCon data set is a challenging malware data set. Its samples were captured from a live network in 2020 and include numerous adversarial samples produced with malware development toolkits using techniques such as packing, instruction virtualization, and metamorphism. We first evaluated our model on the DataCon data set without HDBA, and then with HDBA. The experimental results are shown in Table 11. Our model achieved an overall accuracy of 96.64% with HDBA, compared with 95.46% without it. The results further illustrate the excellent recognition capability of our model, even for malware samples in live networks. Moreover, our model made adaptive adjustments according to different malware data sets.

6. Conclusions

As the number of malware variants continues to multiply, they pose a significant threat to network security. This paper proposed an improved vision-based intelligent malware detection method. The proposed method is divided into two parts: data preprocessing, and feature extraction and classification. The generated malware images are normalized via the bilinear interpolation algorithm, and data augmentation is utilized to resolve the problem of imbalanced malware data sets. Following a lightweight design, a novel CNN structure was proposed that combines multi-scale feature fusion with a channel attention mechanism to enhance feature extraction capabilities. In addition, we proposed a hyperparameter optimization algorithm, named HDBA, to improve the performance of the model. Experimental results indicated that our model achieved an accuracy of 99.36% on the Malimg data set, which is higher than state-of-the-art malware detection methods. Moreover, the MTTD of our model was only 9.63 ms, with only 1.11 M parameters. Based on validation on the DataCon malware data set, our model can effectively identify malware variants from real and daily networks.

Author Contributions

Funding acquisition, Y.S.; Investigation, J.W.; Methodology, S.W.; Resources, J.W.; Validation, S.L. and W.H.; Visualization, S.W.; Writing—original draft, S.W.; Writing—review & editing, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Science Foundation of China (61806219, 61703426 and 61876189), by the National Science Foundation of Shaanxi Province (2021JM-226), by the Young Talent Fund of University and Association for Science and Technology in Shaanxi, China (20190108, 20220106), and by the Innovation Capability Support Plan of Shaanxi, China (2020KJXX-065).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this paper can be obtained by contacting the authors of this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Network Security Information and Dynamics Weekly Report. Available online: https://www.cert.org.cn/publish/main/44/index.html (accessed on 13 September 2022).
Zhang, J.; Qin, Z.; Yin, H.; Ou, L.; Zhang, K. A feature-hybrid malware variants detection using CNN based opcode embedding and BPNN based API embedding. Comput. Secur. 2019, 84, 376–392.
Zhang, J.; Gao, C.; Gong, L.; Gu, Z.; Man, D.; Yang, W.; Li, W. Malware Detection Based on Multi-level and Dynamic Multi-feature Using Ensemble Learning at Hypervisor. Mob. Networks Appl. 2020, 26, 1668–1685.
Dai, Y.; Li, H.; Qian, Y.; Lu, X. A malware classification method based on memory dump grayscale image. Digit. Investig. 2018, 27, 30–37.
Souri, A.; Hosseini, R. A state-of-the-art survey of malware detection approaches using data mining techniques. Hum.-Cent. Comput. Inf. Sci. 2018, 8, 3.
Aslan, O.; Samet, R. A Comprehensive Review on Malware Detection Approaches. IEEE Access 2020, 8, 6249–6271.
Daniel, G.; Carles, M.; Jordi, P. The Rise of Machine Learning for Detection and Classification of Malware. J. Netw. Comput. Appl. 2020, 153, 102526.
Le, Q.; Boydell, O.; Mac Namee, B.; Scanlon, M. Deep learning at the shallow end: Malware classification for non-domain experts. Digit. Investig. 2018, 26, S118–S126.
Samaneh, M.; Ali, A.G. Application of Deep Learning to Cybersecurity: A Survey. Neurocomputing 2019, 347, 149–176.
Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
Conti, G.; Bratus, S.; Shubina, A.; Sangster, B.; Ragsdale, R.; Supan, M.; Lichtenberg, A.; Perez-Alemany, R. Automated mapping of large binary objects using primitive fragment type classification. Digit. Investig. 2010, 7, S3–S12.
Nataraj, L.; Yegneswaran, V.; Porras, P.; Zhang, J. A comparative assessment of malware classification using binary texture analysis and dynamic analysis. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, Chicago, IL, USA, 21 October 2011; pp. 21–30.
Yu, J.; He, Y.; Yan, Q.; Kang, X. SpecView: Malware Spectrum Visualization Framework with Singular Spectrum Transformation. IEEE Trans. Inf. Forensics Secur. 2021, 16, 5093–5107.
Xiao, M.; Guo, C.; Shen, G.; Cui, Y.; Jiang, C. Image-based malware classification using section distribution information. Comput. Secur. 2021, 110, 102420.
Wang, S.; Wang, J.; Song, Y.; Li, S. Malicious Code Variant Identification Based on Multiscale Feature Fusion CNNs. Comput. Intell. Neurosci. 2021, 2021, 1070586.
Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B. Malware images: Visualization and automatic classification. In Proceedings of the VizSec ‘11: 2011 International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011; pp. 1–7.
Kabanga, E.K.; Kim, C.H. Malware Images Classification Using Convolutional Neural Network. J. Comput. Commun. 2018, 6, 153–158.
Liu, Y.-S.; Lai, Y.-K.; Wang, Z.-H.; Yan, H.-B. A New Learning Approach to Malware Classification Using Discriminative Feature Extraction. IEEE Access 2019, 7, 13015–13023.
Naeem, H.; Guo, B.; Naeem, M.R.; Ullah, F.; Aldabbas, H.; Javed, M.S. Identification of malicious code variants based on image visualization. Comput. Electr. Eng. 2019, 76, 225–237.
Nataraj, L.; Manjunath, B. SPAM: Signal Processing to Analyze Malware. IEEE Signal Process. Mag. 2016, 33, 105–117.
Roseline, S.A.; Geetha, S.; Kadry, S.; Nam, Y. Intelligent Vision-Based Malware Detection and Classification Using Deep Random Forest Paradigm. IEEE Access 2020, 8, 206303–206324.
Yue, S. Imbalanced Malware Images Classification: A CNN based Approach. arXiv 2017, arXiv:1708.08042.
Catal, C.; Gunduz, H.; Ozcan, A. Malware Detection Based on Graph Attention Networks for Intelligent Transportation Systems. Electronics 2021, 10, 2534.
Gibert, D.; Mateu, C.; Planes, J.; Vicens, R. Using convolutional neural networks for classification of malware represented as images. J. Comput. Virol. Hacking Tech. 2019, 15, 15–28.
Venkatraman, S.; Alazab, M.; Vinayakumar, R. A hybrid deep learning image-based analysis for effective malware detection. J. Inf. Secur. Appl. 2019, 47, 377–389.
Zhihua, C.; Lei, D.; Penghong, W.; Xingjuan, C.; Wensheng, Z. Malicious code detection based on CNNs and multi-objective algorithm. J. Parallel Distr. Com. 2019, 129, 50–58.
Danish, V.; Mamoun, A.; Sobia, W.; Babak, S.; Qin, Z. Image-Based malware classification using ensemble of CNN architectures (IMCEC). Comput. Secur. 2020, 92, 101748.
Danish, V.; Mamoun, A.; Sobia, W.; Hamad, N.; Babak, S.; Qin, Z. IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Comput. Netw. 2020, 171, 107138.
Catal, C.; Giray, G.; Tekinerdogan, B. Applications of deep learning for mobile malware detection: A systematic literature review. Neural Comput. Appl. 2022, 34, 1007–1032.
Woo, S.; Park, J.; Lee, J.Y. CBAM: Convolutional Block Attention Module. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; pp. 3–19.
Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, A.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
Wang, C.; Zhao, Z.; Wang, F.; Li, Q. A Novel Malware Detection and Family Classification Scheme for IoT Based on DEAM and DenseNet. Secur. Commun. Networks 2021, 2021, 6658842.
Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017; pp. 2261–2269.
Li, Q.; Mi, J.; Li, W.; Wang, J.; Cheng, M. CNN-Based Malware Variants Detection Method for Internet of Things. IEEE Internet Things J. 2021, 8, 16946–16962.
Cui, Z.; Xue, F.; Cai, X.; Cao, Y.; Wang, G.-G.; Chen, J. Detection of Malicious Code Variants Based on Deep Learning. IEEE Trans. Ind. Inform. 2018, 14, 3187–3196.
Hemalatha, J.; Roseline, S.; Geetha, S.; Kadry, S.; Damaševičius, R. An Efficient DenseNet-Based Deep Learning Model for Malware Detection. Entropy 2021, 23, 344.
Bansal, M.; Kumar, M.; Sachdeva, M.; Mittal, A. Transfer learning for image classification using VGG19: Caltech-101 image data set. J. Ambient Intell. Humaniz. Comput. 2021, 1–12.
Kumar, S.; Janet, B. DTMIC: Deep transfer learning for malware image classification. J. Inf. Secur. Appl. 2022, 64, 103063.
El-Shafai, W.; Almomani, I.; AlKhayer, A. Visualized Malware Multi-Classification Framework Using Fine-Tuned CNN-Based Transfer Learning Models. Appl. Sci. 2021, 11, 6446.
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
Xie, J.; Ma, Z.; Lei, J.; Zhang, G.; Xue, J.H.; Tan, Z.H.; Guo, J. Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 1, 4605–4625.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn Res. 2014, 15, 1929–1958.
DataCon: Multi-Domain Large-Scale Competition Open Data for Security Research. Available online: https://datacon.qianxin.com/opendata (accessed on 13 September 2022).
Anandhi, V.; Vinod, P.; Menon, V.G. Malware visualization and detection using DenseNets. Pers. Ubiquitous Comput. 2021, 1–17.
Naeem, H.; Ullah, F.; Naeem, M.R.; Khalid, S.; Vasan, D.; Jabbar, S.; Saeed, S. Malware detection in industrial internet of things based on hybrid image visualization and deep learning model. Ad Hoc Netw. 2020, 105, 102154.
Xiao, G.; Li, J.; Chen, Y.; Li, K. MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks. J. Parallel Distrib. Comput. 2020, 141, 49–58.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6450–6458.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Venkatraman, S. Robust Intelligent Malware Detection Using Deep Learning. IEEE Access 2019, 7, 46717–46738.
Naeem, H.; Guo, B.; Ullah, F.; Naeem, M.R. A Cross-Platform Malware Variant Classification based on Image Representation. KSII Trans. Internet Inf. Syst. 2019, 13, 3756–3777.
Vinita, V.; Sunil, K.M.; Singh, V.B. Multiclass Malware Classification via First- and Second-Order Texture Statistics. Comput. Secur. 2020, 97, 101895.
Moussas, V.; Andreatos, A. Malware Detection Based on Code Visualization and Two-Level Classification. Information 2021, 12, 118.
Sudhakar; Kumar, S. MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in Internet of Things. Futur. Gener. Comput. Syst. 2021, 125, 334–351.

Figure 1. Overall structure of the proposed malware classification framework.

Figure 2. Malware samples after visualization from different families.

Figure 3. Malware samples after size normalization.

Figure 4. Partial malware images after data augmentation.

Figure 5. Structure of a multi-scale feature extraction block.

Figure 6. Classification confusion matrix.

Figure 7. Training curve of the proposed model.

Figure 8. Accuracy, precision, recall, and F1-scores for each model in every malware family.

Figure 9. Confusion matrixes for each model.

Table 1. Summary of recent related research.

Study	Approach	Contribution	Description	Disadvantage
[18]	Machine learning	Proposed a multi-layer learning framework based on a bag-of-visual-words (BoVW) model to obtain feature descriptors of malware images.	Multi-layer LBP+SIFT+BoVW	Long prediction time
[19]	Machine learning	Proposed a malware identification model for characterizing malicious variants, both locally and globally, to achieve a useful classification.	LGMP	Long prediction time (4.27 s)
[20]	Machine learning	Explored orthogonal yet complementary methods to analyze malware, motivated by signal and image processing.	SPAM	Low detection accuracy
[21]	Machine learning	Proposed a diverse deep forest model for effective malware detection and classification.	CRF+RF+Deep Forest	Poor generalization ability and high memory cost
[22]	Deep learning	Proposed a weighted softmax loss for convolutional neural networks on unbalanced malware images classification.	Weighted softmax loss+Vgg-verydeep-19	Poor feature extraction ability
[23]	Deep learning	Proposed a graph attention network (GAN)-based framework to improve the performance of malware detection models.	Node2Vec+GAN	Low detection accuracy
[24]	Deep learning	Proposed a novel file-agnostic deep learning system for classification of malware based on its visualization as grayscale images.	CNN	Poor feature extraction ability
[25]	Deep learning	Proposed a new hybrid model for image-based analysis using similarity mining and deep learning architectures to identify and classify obfuscated malware accurately.	CNN+LSTM	High computational overhead
[26]	Deep learning	Proposed a method to advance the detection of malicious code using convolutional neural networks and intelligence algorithm.	NSGA-II+CNN	Long training time and high computational overhead
[27]	Deep learning	Proposed a novel ensemble convolutional neural networks-based architecture for effective detection of both packed and unpacked malware.	IMCEC	Long prediction time (1.18 s)
[28]	Deep learning	Proposed a novel classifier to detect variants of malware families and improve malware detection using CNN-based deep learning architecture.	IMCFN	High computational overhead

Table 2. Parameter settings of data augmentation.

Method	Setting	Method	Setting
rescale	1/255	shear range	0.0
width shift	0.0	zoom range	0.0
height shift	0.0	horizontal flip	false
rotation range	0.0	fill mode	none

Table 3. Details of each malware family in the DataCon data set.

Family	Samples
Mining	7896
Non-mining	15,759

Table 4. Details of each malware family in the Malimg data set.

Malware Type	Family Name	No.	Samples
Worm	Allaple.A	3	2824
	Allaple.L	4	1491
	VB.AT	23	383
	Yuner.A	25	775
Worm: AutoIT	Autorun.K	6	81
Trojan	Alueron.gen!J	5	173
	C2LOP.gen!g	7	175
	C2LOP.P	8	121
	Malex.gen!J	17	111
	Skintrim.N	20	55
Dialer	Adialer.C	1	97
	Dialplatform.B	9	152
	Instantaccess	12	356
PWS	Lolyda.AA1	13	153
	Lolyda.AA2	14	159
	Lolyda.AA3	15	98
	Lolyda.AT	16	134
Trojan Downloader	Dontovo.A	10	137
	Obfuscator.AD	18	117
	Swizzor.gen!E	21	103
	Swizzor.gen!I	22	107
	Wintrim.BX	24	72
Rogue	Fakerean	11	306
Backdoor	Agent.FYI	2	91
Backdoor	Rbot!gen	19	133

Table 5. Performance of the model with different malware image sizes.

Input Size	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Parameters (M)
32 × 32	84.28	83.44	84.27	82.82	0.31
64 × 64	94.22	93.08	94.22	93.26	0.35
128 × 128	98.61	98.61	98.60	98.60	0.50
256 × 256	99.36	99.39	99.36	99.36	1.11
512 × 512	99.04	99.18	99.04	99.06	3.58

Table 6. Performance of each model.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
AlexNet8	94.12	92.77	94.11	93.19
VGG16	97.11	97.00	97.11	97.03
ResNet50	97.54	97.57	97.54	97.53
Inception V3	98.71	98.72	98.71	98.71
Our model	99.36	99.39	99.36	99.36

Table 7. Parameters and MTTDs of the tested models.

Model	AlexNet	VGG16	ResNet50	Inception V3	Our Model
Parameters (M)	362.0	134.4	23.6	21.9	1.11
MTTD	28.28	20.84	19.61	29.18	9.63

Table 8. Results of ablation experiments.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
CNN	96.47	96.53	96.47	96.46
CNN + Channel Attention	97.86	97.85	97.86	97.77
Multi-scale Feature Fusion	98.72	98.86	98.72	98.73
Multi-scale Feature Fusion + Channel Attention	99.36	99.39	99.36	99.36

Table 9. Results of the hyperparameter optimization experiment.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
No optimization	98.82	98.86	98.82	98.84
Manual optimization	99.04	99.18	99.04	99.05
HDBA	99.36	99.39	99.36	99.36

Table 10. Results of a comparative assessment with recent state-of-the-art solutions.

Methods	Year	Description	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Ref [16]	2011	GIST + KNN	97.18	-	-	-
Ref [20]	2016	SPAM	97.40	-	-	-
Ref [36]	2018	DRBA + CNN	94.50	96.60	88.40	-
Ref [19]	2019	LGMP + KNN	98.40	-	98.20	97.10
Ref [26]	2019	NSGA-II + CNN	97.60	97.60	88.40	-
Ref [52]	2019	CNN + LSTM	96.30	96.30	96.20	96.20
Ref [25]	2019	CNN + BiGRU	96.30	91.80	91.50	91.60
Ref [53]	2019	CSGM + KNN	98.40	-	98.20	97.10
Ref [54]	2020	MxN + GLCM	98.58	98.04	98.06	98.05
Ref [21]	2020	Sliding Window Scanning + RF	98.65	98.86	98.63	98.74
Ref [46]	2020	DCNN	98.79	98.79	98.47	98.46
Ref [28]	2020	IMCFN	98.82	98.85	98.81	98.75
Ref [45]	2021	DenseNet201	98.97	-	-	98.88
Ref [33]	2021	DEAM + Densenet	98.50	96.90	96.60	96.70
Ref [55]	2021	Two-level ANN	99.13	-	-	-
Ref [56]	2021	MCFT-CNN	99.19	97.72	97.76	97.68
Ref [39]	2022	DTMIC	98.93	99.00	99.00	99.00
Ours	2022	Multi-scale Feature Fusion + Channel Attention + HDBA	99.36	99.39	99.36	99.36

Table 11. The performance of our model on the DataCon data set.

Dataset	Our Model				No HDBA
Dataset	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
DataCon dataset	96.64	96.63	96.64	96.62	95.20	95.23	95.20	95.21


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

