A Multifaceted Deep Generative Adversarial Networks Model for Mobile Malware Detection

Abstract: Malware’s structural transformation to evade detection frameworks encourages hackers to steal the public’s confidential content. Researchers are developing protective measures against the intrusion of malware on mobile devices. Deep learning-based Android malware detection frameworks have improved public safety; however, their dependency on diverse training samples has constrained their use. Handcrafted malware detection mechanisms have achieved remarkable performance, but their computational overhead is a major hurdle to their adoption. In this work, a Multifaceted Deep Generative Adversarial Networks Model (MDGAN) has been developed to detect malware in mobile devices. The hybrid GoogleNet and LSTM features of the grayscale image and the API sequence are processed in a pixel-by-pixel pattern through a conditional GAN for a robust representation of APK files. The generator produces synthetic malicious features for differentiation in the discriminator network. Experimental validation on the combined AndroZoo and Drebin database has shown 96.2% classification accuracy and a 94.7% F-score, which remain superior to recently reported frameworks.


Introduction
Mobile devices have penetrated society enormously in the last few years as a result of enriched interactive features and cost-effectiveness. The advancement of sophisticated sensing devices has given an exceptional boost to their use in routine activities. Software developers have produced numerous applications for the academic, health, security, and entertainment sectors. A large share of this software is publicly available online, and many applications can be downloaded and installed on user devices free of charge. The most common operating systems for mobile devices are Android and iOS. Applications for these operating systems are available on the Google Play Store and on third-party software platforms. The Android OS is the primary target for intruders seeking to steal money and private data from the public. Malware is malicious software that attacks the operating system of the mobile device and compromises the privacy of users by stealing their credentials. Malware mostly attacks the Android operating system when the user unknowingly installs unlicensed apps. The intrusion of malicious software variants through spyware, spoofing, hacking, and phishing has become an alarming security threat to public credentials and private data. Phishing attacks deceive the recipient by persuading them to click a malicious URL that leads to the installation of malware. Openly available source code can contain malicious code, including trojan horses, riskware, spyware, adware, and ransomware [1,2]. Malware variants are multiplying owing to the use of intelligent malware-producing agents such as Zeus, SpyEye, and Dos [3,4].
Public privacy and security have recently been studied to develop privacy-aware applications [5], permission-assisting applications [6], and malware detection frameworks [7,8]. Anti-virus apps such as Lookout, Norton, and Comodo Mobile Security depend on signatures for the detection of malware. Malware signatures are random snippets and binary patterns extracted from data samples, and anti-virus developers store them as cryptographic hashes. The stored signatures are then used to categorize the input application [7]. However, malware can evade signature-based detection by varying a subset of its code sections while preserving its semantics.
Various researchers have developed machine learning methods to detect malware in the Android operating system of mobile devices. The developed frameworks can broadly be classified into static and dynamic analysis methods. Static models decompile the application code in the installation package. The static features, including application programming interface (API) calls [9], permission sequences [10], and metadata [11], are extracted in the decompilation process. The model is then trained on the extracted static features for robust classification of malicious activity. Dynamic analysis models employ run-time data of the applications, including system calls [12] and traffic traces [13], and use this dynamic feature set in the training and classification stages. The dimension of the static and dynamic feature sets is large, and the models suffer from dimensionality issues. The details of the features and samples of the OmniDroid dataset [14] are presented in Table 1. Irrelevant features increase the complexity of the malware detection framework; therefore, preprocessing methodologies have been developed to reduce the feature dimension by removing unwanted feature values. However, the unsupervised clustering-based methods developed for detecting zero-day malware cannot adopt such feature reduction techniques to reduce the complexity and dimensionality of the framework [15].
Supervised models based on handcrafted methods with a feature reduction mechanism have resulted in less complex malware identification systems. Supervised methods combining handcrafted and deep learning-based approaches have been developed to decrease complexity while preserving the performance of the model. Various deep learning methods convert the malware into an image in the preprocessing stage, followed by the feature extraction and classification stages. Deep learning methods have not only improved the classification performance but also reduced the effort needed to identify robust features. However, a deep learning model requires a large set of examples for efficient training. In practice, this is often not possible because of the limited set of malware samples available; therefore, deep learning frameworks are bound to a constrained accuracy. To overcome the hurdle of limited data samples, various researchers have employed oversampling by creating synthetic samples with SMOTE [16]. The limitation of such methods is that they mostly re-use existing samples instead of creating new ones.
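The interpolation step behind SMOTE-style oversampling can be sketched as follows. This is a minimal, dependency-free illustration and not the implementation of [16]; the function name, the neighbour count k, and the toy data are assumptions made for the example. It shows that each synthetic point lies on the segment between two existing minority samples, which is why such methods stay close to the data already available.

```python
import random


def smote_like_samples(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class (hypothetical 2D feature vectors).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(minority)
```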
Generative Adversarial Networks (GANs) [17], which consist of generator and discriminator neural networks, produce adversarial samples that extend the dataset but can lead to false classification results. The generator produces a fake set of samples, while the discriminator differentiates between the fake and real samples. Supplying real samples directly to the GAN leads to false classification results; therefore, preprocessed samples are required to increase the invariance and robustness of the overall framework. To this end, a Multifaceted Deep Generative Adversarial Networks Model (MDGAN) has been developed in this work. Various GAN variants exist in the literature, including the deep convolutional generative adversarial network (DCGAN) [18], which was developed for unsupervised learning under multiple architectural constraints, and SinGAN [19,20], which learns the image patch distribution at various scales from a single training image. These models require RGB input images and mostly focus on the input data type; however, the proposed MDGAN employs a multi-face input consisting of 2D grayscale image features concatenated with the LSTM binary sequence feature set for the detection of malware types. The remainder of the paper is organized as follows. Section 2 reviews the related works. The proposed methodology section details the proposed MDGAN framework. The results section presents the performance of the proposed model in comparison with recently reported works.

Literature Review
The Android operating system in mobile devices is vulnerable to critical issues, including malware attacks. Public privacy and security issues lead to severe financial losses for vendors [21]. Research on malware detection can broadly be divided into two classes: non-machine learning and machine learning-based methods. Non-machine learning frameworks rely on signature data and consist of static and dynamic analysis approaches. The static method extracts features from the static code files, whereas dynamic analysis employs features of the executing code. Non-machine learning-based malware identification is time-consuming and depends solely on the developer's expertise [22]. Moreover, such methods also require specialized software environments and computational resources [23]. In comparison, machine learning methods are more precise and less complex, depending on the feature extraction model and the classification model. Feature extraction, broadly classified into handcrafted and deep learning methods, considers the patterns of an input value with respect to its neighbors, whereas the classification model uses the trained features with associated category labels to classify the test data. Various classifiers such as Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors, and Decision Tree are used to categorize data into malware and non-malware files. To enhance the classification performance, the malware is first converted into an image before the feature extraction and classification stages. Machine learning methods are based on both handcrafted feature models and deep learning-based feature extraction models. The image classification task requires a robust and distinctive feature set that remains invariant in the presence of variations in geometry and photometry.
Malware identification requires a robust set of features and a stable categorization framework. Malware converted into an RGB [24], grayscale [25], or binary image, consisting of texture, shape, contour, and color distribution, can be represented with both handcrafted and DNN methods. The objects within the image are described with the help of color, texture, and shape information [26]. However, because of intra-class variations in color, shape, and illumination, representing the image through simple statistical information becomes quite challenging. Therefore, stable feature values are identified that do not change when the orientation, scale, and illumination of the objects in the image change. The handcrafted approach to feature extraction relies on texture descriptors such as Local Binary Pattern (LBP) [27], Histogram of Oriented Gradients (HoG) [28], and Gray Level Co-occurrence Matrix (GLCM) [29]. LBP, HoG, and GLCM are chosen mainly for their robustness to noise and their invariance to changes in the scale, orientation, and illumination of the objects in the input image. The concatenation of all three features, representing the same image as a whole, is used for the recognition of the objects in the input image. Representing the input image with multiple feature sets brings further distinctiveness and stability to the classification process. The limitation of such methods is that handcrafted features require expert knowledge to extract a robust feature set. Furthermore, a feature set designed for one dataset does not work perfectly on new data files in which new malware samples are introduced. The DNN and handcrafted models developed so far require a large set of training samples and expensive hardware to categorize data precisely.
This research aims to develop a highly accurate, low-complexity algorithm for the identification of malware files under a constraint on the number of training samples.
The features extracted from the 2D grayscale visualization of the binary data can help in differentiating various regions of the data file. These features include texture and intensity-based information that cannot be extracted from the binary sequence data. Although a disparity exists between malware images and real natural images, transfer learning through GoogleNet yields a robust set of generic feature values that enhance the classification performance. Therefore, we propose to combine the features obtained from the image and the data sequence to bring invariance and robustness to the malware classification procedure. Various types of neural networks have been developed in the literature for the classification of the ImageNet dataset. The DNN models vary in depth, layer configuration, and size. Well-known DNN models include AlexNet, ResNet, DagNet, VGG-16, InceptionV3, GoogleNet, and YOLO [30]. These models can classify images very precisely but require a large set of training data, which is not available in the case of malware. Moreover, DNN models also suffer from uncertainty when they receive out-of-domain input. In [31], a framework was developed to detect out-of-distribution samples to resolve the issue of uncertainty. In [32], a GAN is used for malware data augmentation, and the imbalanced sample set is balanced by generating training samples from the smaller original set. In [33], a vision-based multi-classification approach is developed for IoT malware detection. In [34], a forest penalizing attribute-based malware detection method is developed to classify APK files into malware and non-malware data. The data sample constraints are resolved by introducing the Generative Adversarial Network (GAN), which can generate synthetic samples, with a discriminator capable of differentiating the synthetic and real samples.
GANs were developed to improve the learning mechanism of deep neural networks through adversarial learning. The ability to generate synthetic samples in parallel makes GANs superior to simpler generative algorithms such as PixelCNN [35] and FVBNs [36]. In [37], the Malware Generative Adversarial Network (MalGAN) framework is introduced to generate adversarial examples that attack malware identification systems. MalGAN, consisting of a generator part and a discriminator part, remains flexible and defensive in malware file recognition [38].

Proposed Methodology
The proposed framework given in Figure 1 shows the overall architecture of the Multifaceted Deep Generative Adversarial Networks Model (MDGAN), which depends on multiple representations at the input face. The framework consists of three main phases to detect malware in APK files. In the first phase, the input APK package, consisting of various scripts, is preprocessed and transformed into a grayscale image and an API sequence file. In the second phase, the image is passed to GoogleNet to extract distinctive features, and the API sequence is processed through the LSTM network to obtain a stable set of features. The features are concatenated to obtain a single multi-face vector representation of the APK script files. In the third phase, the GAN is used to extend the training data through its generator network and then to discriminate and classify the input test multi-face feature into malware and non-malware classes.

Data Preprocessing
The input APK file is preprocessed, and the malware binary is converted into a grayscale image. A few sample grayscale images of the malware families are presented in Figure 2. Moreover, to enhance the classification performance, the input APK file is converted to an API sequence for 1D DNN feature extraction. The input Android APK file consists of AndroidManifest.xml for the app description, classes.dex for executable functions, the res directory to store data files, the lib directory for code storage, META-INF for app certificates, resources.arsc for compiled resources, and the assets directory for maintenance and upgrade. In the data preprocessing stage, the executable binary is loaded into a 1D array of 8-bit unsigned segments, which are then transformed into 2D grayscale image pixel values, as shown in Figure 3. The width and height of the input 2D image are kept as 224 × 224, which is compatible with the input layer of the GoogleNet DNN model. The API sequence is obtained in two stages. Initially, the word vocabulary vector is created from the API file; then, the API word vector is transformed into an API sequence consisting of integer values. The API execution sequence is analyzed to identify the unique words, and a numeric value is then assigned to each word of the vocabulary, as shown in Figure 3.
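The two preprocessing steps described above can be sketched in a few lines. This is an illustrative, dependency-free sketch rather than the authors' code: the function names are hypothetical, the truncate-or-zero-pad rule for reaching 224 × 224 is an assumption (the text does not state how the byte stream is resized), and the API names in the usage lines are placeholders.

```python
def bytes_to_grayscale(data: bytes, side: int = 224):
    """Map a byte string to a side x side grid of 8-bit pixel values.

    Assumption: inputs longer than side*side bytes are truncated and
    shorter inputs are zero-padded; the resizing rule is not given in the text.
    """
    n = side * side
    pixels = list(data[:n]) + [0] * max(0, n - len(data))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def encode_api_sequence(api_calls):
    """Build a vocabulary of unique API names and map the call trace to integers.

    Index 0 is left unused so that it can serve as a padding value (an assumption).
    """
    vocab = {}
    sequence = []
    for name in api_calls:
        if name not in vocab:
            vocab[name] = len(vocab) + 1
        sequence.append(vocab[name])
    return vocab, sequence


# Usage with placeholder data: 2560 dummy bytes and a hypothetical call trace.
image = bytes_to_grayscale(bytes(range(256)) * 10)
vocab, seq = encode_api_sequence(["open", "read", "send", "read", "open"])
```

Each byte of the binary becomes one pixel intensity in the range 0 to 255, and each distinct API name receives the next free integer in order of first appearance.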

Raw Feature Extraction
The raw feature values are obtained from the multi-face image and sequential numeric data representations of the input APK files. The image obtained through data preprocessing is reshaped to 224 × 224 to be compatible with the input layer of GoogleNet. GoogleNet is a 22-layer deep neural network consisting of convolutional layers, pooling layers, inception layers, and fully connected layers, employed here for the extraction of feature values. The pre-trained GoogleNet is applied to the APK image through transfer learning to represent the APK file. The raw feature consists not only of the GoogleNet fully connected layer outputs but also of the LSTM output feature values extracted from the API sequence. The GoogleNet feature vector has 1000 variables, while the LSTM feature vector has 64 variables. Together, the GoogleNet and LSTM features form a single feature vector of 1064 variables.
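The concatenation of the two branches can be illustrated with a short sketch. The function name is hypothetical, and the constant placeholder values merely stand in for real GoogleNet and LSTM outputs; only the dimensions (1000 and 64) come from the text.

```python
def concatenate_features(image_features, sequence_features):
    """Join the image-branch and sequence-branch features into one raw vector."""
    if len(image_features) != 1000 or len(sequence_features) != 64:
        raise ValueError("expected 1000 image features and 64 sequence features")
    return list(image_features) + list(sequence_features)


# Placeholder values stand in for real GoogleNet and LSTM outputs.
raw_feature = concatenate_features([0.0] * 1000, [1.0] * 64)
```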

Generative Adversarial Network
The implicit GAN model reported in [17] failed to estimate the probability density of the randomly sampled input data. Owing to the complexity of sampling in multi-dimensional data, the GAN failed to generate synthetic samples for a subset of the data. The conditional GAN model resolved the issue by imposing the conditional constraint χ on the input abstraction layer of both the generator network and the discriminator.
The mapping f : {τ, χ} → r transforms the GAN from an unsupervised to a partially supervised network. The objective function in Equation (1) defines the conditional GAN (CGAN), with the real sample denoted by r and the conditional input by χ. Here, f denotes the CGAN, p(r) is the probability distribution of the real samples, and p(τ) is the probability distribution of the random noise input. The symbol ζ with subscript r ∼ p(r) denotes the expectation over the real samples r, while ζ with subscript τ ∼ p(τ) denotes the expectation over the synthetic samples generated from τ. The variables G and D represent the generator and the discriminator, respectively.
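Equation (1) is not reproduced in this text. Assuming it follows the standard conditional GAN objective, and treating χ as the conditioning input to both networks, it would take roughly the following form; the exact placement of χ inside D and G is an assumption.

```latex
\begin{equation}
\min_{G}\max_{D} f(D,G) =
  \zeta_{r \sim p(r)}\big[\log D(r \mid \chi)\big]
  + \zeta_{\tau \sim p(\tau)}\big[\log\big(1 - D(G(\tau \mid \chi))\big)\big]
  \tag{1}
\end{equation}
```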
In the conditional GAN, the generator produces synthetic samples that are as similar as possible to the original content, whereas the discriminator has to differentiate between the original and synthetic samples. The issue of mode collapse has been resolved by employing pixel-to-pixel sample selection in the conditional GAN. The objective function of the pixel-to-pixel conditional GAN g is given in Equation (2).
where the symbol g denotes the pix2pix conditional GAN model. The loss function of the proposed multi-face GANs consisting of pixel-to-pixel conditional GAN depending on the hybrid raw feature is represented with Equations (3) and (4).
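Equations (2) to (4) referenced above are not reproduced in this text. A hedged reconstruction, assuming the standard pix2pix formulation with an added L1 term, is sketched below. The condition χ is taken to be the raw feature vector v in Equations (3) and (4), and the names L_1 and G* are assumptions introduced for the sketch.

```latex
\begin{align}
g(G,D) &= \zeta_{\chi,r}\big[\log D(\chi, r)\big]
        + \zeta_{\chi,\tau}\big[\log\big(1 - D(\chi, G(\chi,\tau))\big)\big] \tag{2} \\
\mathcal{L}_{1}(G) &= \zeta_{v,r,\tau}\big[\lVert r - G(v,\tau) \rVert_{1}\big] \tag{3} \\
G^{*} &= \arg\min_{G}\max_{D}\; g(G,D) + \lambda\,\mathcal{L}_{1}(G) \tag{4}
\end{align}
```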
Here, the symbol v denotes the input raw feature vector. The variable λ works as an adjustment parameter in Equation (4); when λ is zero, the loss function of the pix2pix CGAN reduces to that of a conditional GAN model. The proposed multi-face conditional pixel-to-pixel version of the GAN model is presented in Figure 4. The pixel-to-pixel approach considers the feature values individually in the raw feature set. The raw feature is divided into N × 1 patches, and the discriminator judges each patch instead of the whole feature vector for authenticity. N is set to 128, which provided a higher recognition rate in the proposed model. The classification method of LeNet-5 [39] is employed, and the extracted features are classified with the fully connected layer to reduce the computational complexity. The network parameters are trained separately to optimize the classification performance of the network.
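The patch-wise judgment can be sketched as follows. Since 1064 is not a multiple of 128, the sketch zero-pads the last patch; this padding rule, the function names, and the averaging of per-patch scores into one decision (as in PatchGAN-style discriminators) are assumptions, not details taken from the text.

```python
def split_into_patches(feature, patch_size=128):
    """Split a 1D feature vector into consecutive patch_size x 1 patches.

    Assumption: the last patch is zero-padded when the length is not a
    multiple of patch_size (1064 is not a multiple of 128).
    """
    padded = list(feature)
    remainder = len(padded) % patch_size
    if remainder:
        padded += [0.0] * (patch_size - remainder)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


def patch_decision(patch_scores, threshold=0.5):
    """Average per-patch realness scores into one decision (an assumed rule)."""
    mean_score = sum(patch_scores) / len(patch_scores)
    return mean_score, mean_score >= threshold


# A placeholder 1064-dimensional raw feature yields nine 128 x 1 patches.
patches = split_into_patches([0.5] * 1064)
score, is_real = patch_decision([0.9, 0.8, 0.1])
```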

Datasets
The dataset employed in the experimental validation of the proposed model consists of 5546 malicious and 5831 benign Android APK files. The benign files were obtained from the AndroZoo database, where a CSV file annotates the description of each app file in the dataset. The malware files were collected from the Drebin database, which consists of 179 malware families in total. Malware families such as FakeInstaller, DroidKungFu, Plankton, Opfake, BaseBridge, GinMaster, Iconosys, Kmin, Adrd, Geinimi, FakeDoc, and DroidDream are considered in the database. The details of the family samples are shown in Figure 5. The complete database is divided in a ratio of 7:3 into training and test sets, respectively.
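A 7:3 split of the 11,377 files can be sketched as below. The text does not say whether the split is random or stratified by family, so the seeded random shuffle, the function name, and the placeholder records are assumptions.

```python
import random


def split_dataset(samples, train_parts=7, test_parts=3, seed=42):
    """Shuffle the samples and split them by the given ratio (7:3 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records: (file_id, label) with 1 = malicious, 0 = benign.
records = [(i, 1) for i in range(5546)] + [(i, 0) for i in range(5546, 11377)]
train_set, test_set = split_dataset(records)
```

With integer division, this yields 7963 training files and 3414 test files.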

Evaluation
The performance of the proposed framework is evaluated on the basis of the confusion matrix given in Figure 6. The horizontal axis displays the actual category index, while the vertical axis represents the predicted labels. The overall evaluation parameters are listed in Table 2. In the table, T_{i,j} denotes the distribution of actual ith values in the jth prediction class of the nth family.

[Table 2: evaluation parameters; columns: Parameter, Expression; surviving row label: Mean Precision]

The overall accuracy is given in Equation (8), where the parameters T_P, T_N, F_P, and F_N denote the true positives, true negatives, false positives, and false negatives, respectively.

\[ \text{Overall-Accuracy} = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \tag{8} \]
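The evaluation parameters can be computed from a confusion matrix as sketched below. The sketch assumes macro averaging over families, with T[i][j] counting samples of actual class i predicted as class j; the function name and the toy matrix are illustrative only. In the two-class case, the multi-class accuracy (trace divided by total) coincides with Equation (8).

```python
def macro_metrics(T):
    """Compute macro precision, recall, F1 and overall accuracy.

    T[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(T)
    total = sum(sum(row) for row in T)
    precisions, recalls, f1_scores = [], [], []
    for c in range(n):
        tp = T[c][c]
        predicted_as_c = sum(T[i][c] for i in range(n))  # column sum
        actually_c = sum(T[c])                           # row sum
        p = tp / predicted_as_c if predicted_as_c else 0.0
        r = tp / actually_c if actually_c else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)
    return {
        "mean_precision": sum(precisions) / n,
        "mean_recall": sum(recalls) / n,
        "mean_f1": sum(f1_scores) / n,
        "overall_accuracy": sum(T[c][c] for c in range(n)) / total,
    }


# Toy two-class matrix (hypothetical counts, not results from the paper).
toy_matrix = [[8, 2], [1, 9]]
metrics = macro_metrics(toy_matrix)
```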

Classification Results
The generator network generates adversarial synthetic samples similar to malicious samples when the conditional synthetic χ and the real r match. This enlarges the training data for the learning procedure used by the proposed MDGAN model. The confusion matrix shown in Figure 6 demonstrates the effectiveness of the adversarial training in the proposed MDGAN method combined with the multi-face data input mechanism, which has learned and categorized the malware families. Figure 5 displays the total number of samples present in each malware family. The malware family classification results in the confusion matrix indicate the superiority of the proposed framework over other recently reported mechanisms. DroidDream and Iconosys were learned less effectively and obtained precision values of 85.7% and 88.5%, with 81 and 152 samples in the family, respectively, as given in Table 3. Adrd and Geinimi both have the highest inter-class variation, owing to which they achieved the highest precision of 100%, while FakeInstaller achieved the second-highest precision of 98.9% owing to its largest sample count of 925 samples. The mean precision obtained by the proposed MDGAN is 95.1%, which is higher than that of DNN-RNN, CNN-raw opcodes, DroidDetective, API calls-Yerima, and CNN-BiLSTM-NB, with precision values of 90%, 87.2%, 89.5%, 94.3%, and 90.0%, respectively. The mean recall of MDGAN is 94.6%, which is slightly lower than the 96% attained by DroidDetective, as given in Table 4. The mean F1-score of MDGAN is 94.7%, which is superior to DNN-RNN, CNN-raw opcodes, DroidDetective, API calls-Yerima, and CNN-BiLSTM-NB, with F1-score values of 87.1%, 86.2%, 92.1%, 92.3%, and 86.3%, respectively.
The mean accuracy achieved by MDGAN is 96.2%, which is superior to DNN-RNN, CNN-raw opcodes, DroidDetective, API calls-Yerima, and CNN-BiLSTM-NB, with mean accuracies of 90%, 87.4%, 86.0%, 91.8%, and 88.1%, respectively.

Conclusions
The Multifaceted Deep Generative Adversarial Networks Model (MDGAN) was developed to identify malicious APK files installed on mobile devices. Android devices run the openly accessible applications available on the Google Play Store, some of which contain malware that performs malicious activities and compromises the privacy and safety of users. The proposed framework is multi-face, with hybrid DNN API-image and API sequence features, and it is interfaced with a conditional GAN operating on pixel-to-pixel sample selection. The proposed MDGAN achieved superior performance compared with existing works, with 95.1%, 94.6%, 94.7%, and 96.2% mean precision, recall, F1-score, and average accuracy, respectively.

Conflicts of Interest:
The authors declare no conflict of interest.