Towards Explainable Quantum Machine Learning for Mobile Malware Detection and Classification †

paper published


Introduction
In 2022, Android became the most popular operating system in the world, with over 2.5 billion active users spanning over 190 countries (https://www.businessofapps.com/data/android-statistics/ (accessed on 16 November 2022)). In the past decade, Google Play (https://play.google.com/ (accessed on 16 November 2022)), the official platform for downloading mobile applications for Android-powered devices, has grown enormously, reaching USD 38.6 billion in revenue in 2020 alone. In the same year, there were more than 2.9 million apps in the store, which had been downloaded 108 billion times [1]. Although it currently struggles to overtake Apple in Japan and the U.S., Android is the most popular platform in most places in the world. In countries such as Brazil, India, Indonesia, Iran, and Turkey, it has a market share of over 85% [1]. Thanks to this spread, malicious developers are finding new ways to embed malicious payloads into legitimate applications to exfiltrate private and sensitive data from our mobile devices (and to obtain revenue from the gathered information) [2]. For this reason, the adoption of malicious samples containing Trojans, adware, and ransomware malicious payloads is not a surprise for techniques have been demonstrated to obtain better results in image analysis, security researchers are proposing methods to analyze applications in terms of images [12].
A criticism raised against deep learning techniques concerns their so-called lack of explainability [13,14], i.e., the difficulty of providing an explanation behind a certain classification performed by a deep learning model. We consider the explanation in terms of capturing the high-level visual patterns that describe the areas of the input image that most influenced the classifier prediction.
In very recent times, we have been witnessing the introduction of quantum Machine Learning, devoted to exploiting quantum computing concepts in the Machine Learning field [15], i.e., to performing classification tasks by considering quantum theory [16].
In this paper, we propose several Android malware detectors based on deep learning architectures and quantum computing ones. The main idea is to introduce the concept of quantum computing in malware detection, comparing quantum and classical convolutional models in terms of accuracy for the malware detection task in the Android environment. To the best of the authors' knowledge, this paper represents the first attempt to introduce quantum computing in image-based malware detection.
This paper represents an extension of the research entitled "Introducing Quantum Computing in Mobile Malware Detection" [17], accepted for publication at the 17th International Workshop on Frontiers in Availability, Reliability, and Security (FARES 2022), to be held in conjunction with the 17th International Conference on Availability, Reliability, and Security (ARES 2022). With respect to the work in [17], the novel contributions introduced in this paper are the following:
• A fully quantum Machine Learning network is presented and considered in the experimental analysis, while in the previous paper, a quantum hybrid network was considered. This represents the main contribution of the paper: in fact, it is the first attempt to apply a fully quantum Machine Learning model to a malware detection task;
• A comparison between two different quantum architectures, i.e., the full and the hybrid quantum one;
• A comparison between state-of-the-art Convolutional Neural Networks and quantum architectures;
• We extend the number of experiments presented in [17] by tuning the models with the aim of empirically obtaining better detection performances;
• Three state-of-the-art deep learning models are added (i.e., VGG19, MobileNet, and EfficientNet) to perform a more complete comparison;
• To provide explainability behind the model decision, we resort to an algorithm aimed at highlighting the areas of the application under analysis (represented as an image) that contributed most to a certain prediction (i.e., malware or Trusted). This is the second main contribution of the paper: to the best of the authors' knowledge, this is the first attempt to apply explainability to a quantum Machine Learning model;
• We freely release for research purposes the source code we developed for the fully quantum architecture, to encourage researchers to investigate this area.
The remainder of the paper proceeds as follows: In the next section, preliminary notions about malware detection, quantum computing, quantum Machine Learning, and Gradient-weighted Class Activation Mapping are reported. In Section 3, we present the models we considered; the results are reported in Section 4 and discussed in Section 5. Finally, in the last section, the conclusions and future research plans are drawn.

Background
This section reports preliminary information about the technique exploited for malware detection in the Android environment (i.e., image-based malware detection) and background notions about quantum Machine Learning. For interested readers, we refer to the literature for further information [18,19].

Image-Based Malware Detection
In recent years, a popular method to classify malware consists of converting malicious software into images and then applying deep learning models based on images to perform classification tasks [20,21]. This process starts from the executable file, which is transformed into an array of values by grouping the bits in blocks and casting the bytes to unsigned integers. Then, the array of values is scaled into a 2D matrix and converted to a grayscale (or RGB) image, by casting each value to a pixel [1].
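As an illustration of the conversion just described, the following is a minimal sketch rather than the script used in this work. The function name `bytes_to_grayscale`, the fixed row width, and the zero padding of the last row are assumptions introduced only for the example.

```python
import numpy as np

def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte to one pixel (0-255) and fold the stream into rows.

    The row width and the zero padding of the last row are illustrative
    assumptions; the text does not prescribe them.
    """
    values = np.frombuffer(data, dtype=np.uint8)      # bytes -> unsigned integers
    height = int(np.ceil(len(values) / width))        # rows needed for all bytes
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(values)] = values                    # pad the tail with zeros
    return padded.reshape(height, width)              # 2D grayscale matrix

# Toy input standing in for the content of an executable file.
image = bytes_to_grayscale(bytes(range(256)) * 3, width=64)
print(image.shape)
```

The resulting matrix can be saved as an image with any imaging library; an RGB variant would group three consecutive bytes per pixel instead of one.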
Most malware is a variant of previous malware samples, with some differences in the source code to mislead signature-based antimalware: this is the reason why malware variants of the same original malicious sample are grouped into malware families. Assuming that the malware of the same family shares part of the code, the corresponding images will also have patterns in common. Similar to the classical classification of objects within images, a deep learning model trained on a large number of input samples will be able to recognize the pattern that characterizes one malware family rather than another. Figure 1 reports two samples of Android malware converted to images, belonging to two different families (i.e., Mecor and Airpush, respectively). The images may look like random noise, but the information coming from the input executables is preserved.
Figure 1. Two examples of Android malware converted into grayscale images: on the left, a sample belonging to the Mecor family; on the right, a sample belonging to the Airpush family.

Quantum Computing
Quantum computing was born to give answers to problems that cannot be solved in practice with classical computing, and it takes into account the laws of quantum mechanics, i.e., the part of physics that studies the smallest particles and how they can assume more than one state at the same time [22]. In a nutshell, quantum mechanics is the basis of quantum computing: it refers to the scientific laws that regulate the behavior of molecules, atoms, and subatomic particles, and quantum computing uses the related physical phenomena known as superposition and entanglement for the calculation [23].
Computers normally process information in bits, i.e., sequences of zeros and ones (off and on), while quantum computers use Quantum Bits (i.e., qubits), which implement the concept of superposition. Simply speaking, superposition is when a qubit can assume a value of zero, one, or a combination of both at the same time. The superposition state represents a combination of all possible configurations. Groups of qubits in superposition can create complex and multidimensional computational spaces. It is in these spaces that complex problems are represented in new ways.
Mathematically, a qubit can be seen as a unit vector in the two-dimensional complex Hilbert space $\mathbb{C}^2$. To represent a complex vector space, Dirac notation is convenient. It is also possible to obtain a visual representation of the qubit using the Bloch Sphere: all possible states can be placed on the surface of a sphere of unit radius, where the two poles represent the two fundamental states. Starting from Figure 2, it is possible to establish a one-to-one correspondence between the representation of a generic qubit state:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1,$$
and its representation on the surface of the sphere in $\mathbb{R}^3$:
$$|\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle,$$
where θ and ϕ are real numbers that identify the spherical coordinates of the point. This formula has its correspondence in the real physical world: any physical system with at least two discrete and sufficiently separated energy levels can indeed be used to represent a qubit.
Figure 2. The Bloch Sphere, the geometrical representation of the pure state space of a two-level quantum mechanical system, i.e., a qubit [24].
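The correspondence between a qubit state and a point on the sphere can be checked numerically. The sketch below assumes the standard parametrization |ψ⟩ = cos(θ/2)|0⟩ + e^{iϕ} sin(θ/2)|1⟩; the function names `bloch_state` and `bloch_vector` are hypothetical and chosen only for this example.

```python
import numpy as np

def bloch_state(theta: float, phi: float) -> np.ndarray:
    """Amplitudes (alpha, beta) of the qubit with spherical angles theta, phi."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def bloch_vector(state: np.ndarray) -> np.ndarray:
    """Cartesian coordinates (x, y, z) of the state on the unit sphere."""
    alpha, beta = state
    overlap = np.conj(alpha) * beta
    x = 2 * overlap.real                    # sin(theta) * cos(phi)
    y = 2 * overlap.imag                    # sin(theta) * sin(phi)
    z = abs(alpha) ** 2 - abs(beta) ** 2    # cos(theta)
    return np.array([x, y, z])

# theta = 0 is the north pole |0>, theta = pi the south pole |1>;
# theta = pi/2, phi = 0 lies on the equator along the x axis.
print(bloch_vector(bloch_state(np.pi / 2, 0.0)))
```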
Another important difference between classical and quantum computing is that, in the traditional field, the system remains constant from measurement to measurement and the outcome never changes. Instead, in the quantum world, a measurement is an irreversible operation, and its result is uncertain. A similarity between classical and quantum computing can be found in the usage of a register. A register is a collection of n qubits, whose basis states correspond to $2^n$ different configurations, as happens in classic binary encoding. To represent registers and the combination of qubits, the tensor product is used, denoted by the symbol ⊗.
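To make the role of the tensor product concrete, the following minimal sketch builds the state vector of a small register from single-qubit basis states. The helper `register` is a hypothetical name introduced for illustration; NumPy's Kronecker product `np.kron` plays the role of ⊗.

```python
from functools import reduce

import numpy as np

ZERO = np.array([1.0, 0.0])   # |0>
ONE = np.array([0.0, 1.0])    # |1>

def register(bits: str) -> np.ndarray:
    """State vector of the basis state |bits>, e.g. '101' -> |1> (x) |0> (x) |1>."""
    return reduce(np.kron, [ONE if b == "1" else ZERO for b in bits])

state = register("101")
# A 3-qubit register lives in a 2**3 = 8 dimensional space; the single
# non-zero amplitude sits at the index given by the binary string.
print(state.shape, int(np.argmax(state)))
```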
Due to its unusual processing capacity, quantum computing is aimed at sectors that are traditionally computationally intensive, such as Machine Learning. Machine Learning models themselves can have generalization problems, and given the task of making more and more precise predictions, they become more complex and require more data, and their computations become more expensive. In this case, quantum computing represents an interesting development, as it promises improved performance and better generalization.

Quantum and Classical Machine Learning
The increasing importance of Machine Learning in recent years has led to many relevant studies that investigate the promise of quantum computers for Machine Learning [25–27]. Quantum Machine Learning (QML) is the field that concerns the integration of quantum algorithms into classical Machine Learning models. Classical Machine Learning algorithms are used to analyze large amounts of data, and quantum computing helps by using qubits and quantum operators to increase the computation and data storage speed of these algorithms. The QML field is still at the forefront of computer science research, and some improvements (the so-called "Quantum Advantage") have not yet been proven theoretically. With the concept of the "Quantum Advantage", researchers refer to the advantage of quantum models over classical approaches, obtained by leveraging quantum effects. While some attainable advantage of quantum models over generic classical computations has been proven, there is no certainty that the "advantage" will bring complete "supremacy" in the future; the question is still open to debate. A strong quantum speedup remains debatable as long as a lower bound for the classical algorithms has not been found. Indeed, even where the Quantum Advantage was demonstrated for some problems (such as Shor's factoring algorithm [28]), the demonstration represents a quantum speedup over the best-known classical counterpart, which may, theoretically, be overturned by improvements in classical calculation techniques and lower bound verifications.
The main difference between a quantum and a classical computer is the basic unit of calculation. In the classical case, a process is based on the bit, which can assume only two states, generally represented as 0 and 1, corresponding to the state of charge of a transistor. On the other hand, in the quantum scenario, the equivalent of the bit is the quantum bit (or qubit), which follows properties deriving from the postulates of quantum mechanics. Superposition and entanglement are two of the key concepts of quantum theory and contribute to the great computational capacity of quantum computers. The combination of these two principles allows one to create computer systems characterized by a great calculation capacity and speed of execution. While in classical computer science, a system consisting of two bits can store only one of the four possible binary combinations (00, 01, 10, 11) at a time, a two-qubit quantum system can hold a superposition of all four combinations.
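The two-qubit case can be illustrated with a short state-vector simulation. This is only a sketch: it applies a Hadamard gate to each qubit of |00⟩, one common way to prepare an equal superposition, and the variable names are chosen for the example.

```python
import numpy as np

# Hadamard gate: maps |0> to an equal superposition of |0> and |1>.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

zero_zero = np.array([1.0, 0.0, 0.0, 0.0])     # two-qubit basis state |00>
superposed = np.kron(H, H) @ zero_zero         # Hadamard on both qubits

# Squared amplitudes give the probability of reading 00, 01, 10, 11.
probabilities = np.abs(superposed) ** 2
print(probabilities)
```

All four combinations carry a non-zero amplitude at once, yet a measurement still returns only one of them, which is why the extra capacity of the quantum register cannot be read out directly.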
In [29], the authors distinguished four ways to combine quantum computing and Machine Learning. These approaches are reported in Figure 3 and differ on whether the data are generated by a Quantum (Q) or Classical (C) system and whether the information processing device is Quantum (Q) or Classical (C). In summary, case CC refers to classical data treated conventionally, case CQ employs quantum computing to process classical datasets, and case QQ uses quantum data processed by a quantum computer. In this paper, we focus our attention on quantum data processed by classical algorithms (i.e., QC). The quantum Machine Learning algorithms can be a quantum version of conventional ML or a completely new algorithm that addresses a classical problem in a quantum scenario. However, hybrid classical-quantum methods can also be used, where parts of the process, usually the computationally expensive ones, are assigned to the quantum computer. The aim is to combine quantum and classical algorithms to obtain higher performance and decrease the learning cost. One more classification can be made on the type of input data, which can be encoded with a classical or a quantum representation; thus, the data type also generates more hybrid approaches between the classical and quantum worlds.

Quantum Neural Network
A neural network is inspired by the biological model of information processing in the brain; it can be summarized as a graph consisting of a set of elements $x_m$ linked together by weighted connections with parameters $w_{ml}$, which represent the equivalent of the synapses of a biological neuronal model.
An activation function defines the value of a neuron based on the current values of all other neurons weighted by the $w_{ml}$ values, and the dynamics of the network unfold as the neurons are continuously updated through the activation function. This kind of model can be seen as a real computational tool, and its programming can be performed by setting the $w_{ml}$ weights and using an activation function that encodes a certain relationship between the input and output. For pattern classification, we usually consider a feedforward neural network, in which neurons are organized into layers and each layer feeds its information to the next one. A set of initial values is used to feed the input layer, and after subsequent updates on each layer, it is possible to read the final value on the output layer. Feed-forward neural networks often use the sigmoid as the activation function:
$$\sigma(a) = \frac{1}{1 + e^{-a}}.$$
The network is initialized with a set of inputs, and the initial output is compared with the expected values to adjust the weights so as to minimize the classification error. If an appropriate set of weights is given, these kinds of networks can classify new inputs extremely well. Despite various approaches and ideas in an attempt to adapt neural networks to quantum computation, there is no known concrete proposal describing a quantum classification method with neural networks that is sufficiently performant and functional. Finding this adaptation remains one of the most interesting challenges.
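The layer-by-layer update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the standard logistic sigmoid is used, biases are omitted, and the layer sizes and random weights are arbitrary values chosen for the example.

```python
import numpy as np

def sigmoid(a):
    """Standard logistic sigmoid, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights):
    """Feed the input through each layer: weighted sum, then activation."""
    activation = x
    for w in weights:
        activation = sigmoid(w @ activation)
    return activation

rng = np.random.default_rng(0)
# Hypothetical network: 4 inputs -> 3 hidden neurons -> 2 outputs.
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
output = forward(np.array([0.5, -1.0, 0.25, 2.0]), weights)
print(output)
```

Training would then compare `output` with the expected values and adjust `weights` to reduce the error, which is the step left out of this sketch.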

Grad-CAM
The classical image classification problem is one of the most popular tasks for deep learning models. It consists of classifying images that contain items or generic shapes (such as typewritten letters) with the highest accuracy possible. The deep learning model completes the task by leveraging the information of a dataset of input samples. In the training phase, the deep learning model extracts and memorizes features and patterns peculiar to a specific output class, thus learning how to distinguish between the different input samples.
One of the most widely used deep learning models for image classification is the Convolutional Neural Network (CNN), which exploits mathematical convolutional operators on the input image to extract features. The input images pass through several layers of convolution, which combine the pixels with the neighboring ones, and of subsampling, which reduce the size of the two-dimensional matrix while preserving the most relevant information. Finally, the last part of the CNN is usually composed of dense layers, which are formed by a variable number of perceptrons; this last part of the model performs the classification, and it is trainable with the standard backpropagation algorithm. We refer to the literature for further information on CNNs [30,31].
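The two building blocks mentioned above, convolution and subsampling, can be sketched directly with NumPy. The example is illustrative only: it uses a single channel, no padding, stride 1, and the sliding-window product without kernel flipping that deep learning libraries conventionally call convolution; the function names are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 input
features = conv2d(image, np.ones((3, 3)))          # 4 x 4 feature map
pooled = max_pool(features)                        # 2 x 2 after subsampling
print(features.shape, pooled.shape)
```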
Many complex CNN variants have been proposed in the literature; they mainly differ in the size of the architecture and the number of convolutional layers. For instance, AlexNet introduced the use of ReLU as the activation function instead of the tanh function, together with optimization for multiple GPUs. It also addresses overfitting by using data augmentation techniques and the dropout layer. In this paper, we experimented with this architecture and other architectures, also considering a CNN designed by the authors. We refer to the original papers for further information on their architectures [32,33].
Gradient-weighted Class Activation Mapping (Grad-CAM) is a technique to extract the gradients of the deep learning models' convolutional layers and use them to provide graphical information on the inference step. Briefly, the gradients capture high-level visual patterns and can describe which areas of the input image have influenced most of the model output decision. Furthermore, the convolutional layer preserves spatial information; thus, Grad-CAM uses this data to provide a heat map of the input image. This heat map highlights the input image area that was used by the deep learning model to classify a specific input; it provides a visual "explanation" to a certain decision. The Grad-CAM adopted in this work is an implementation of the one introduced by the paper [34].
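The core computation of Grad-CAM can be sketched independently of any deep learning framework. In the sketch below, the feature maps of the last convolutional layer and the gradients of the class score with respect to them are assumed to be already available as arrays; the function name and the final rescaling to [0, 1] are illustrative choices, not the implementation adopted in this work.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from (H, W, K) feature maps and matching gradients."""
    # One importance weight per channel: the spatial average of its gradients.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the channels, keeping only positive evidence (ReLU).
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam     # rescale to [0, 1] for display

# Toy example with a 2 x 2 spatial grid and two channels.
maps = np.zeros((2, 2, 2))
maps[0, 0, 0] = 1.0          # channel 0 fires in the top-left corner
maps[1, 1, 1] = 1.0          # channel 1 fires in the bottom-right corner
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)
heatmap = grad_cam(maps, grads)
print(heatmap)
```

In the toy example, only the channel with a positive gradient contributes, so the heat map highlights the top-left corner and suppresses the bottom-right one.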

Method
This section presents the proposed method, devoted to directly comparing different deep learning models to discriminate between malicious and legitimate Android applications. Furthermore, those networks are also able to distinguish between malware belonging to different malicious families. To perform our experiments, in addition to the commonly used Convolutional Neural Networks, we also employed two quantum Machine Learning models, i.e., a hybrid model and a fully quantum one. Figure 4 shows the workflow of the proposed method, composed of four main steps.
Figure 4. The workflow of the proposed method: starting from a dataset of Android applications, converted to grayscale images, a set of convolutional and quantum models is considered to build classifiers aimed at discriminating between malware and Trusted applications, by considering Grad-CAM to provide explainability in terms of graphical information about the area of the images that influenced the prediction.

Dataset
The idea behind this step is to obtain a dataset of real-world legitimate and malicious Android applications and convert each application into a grayscale image (as introduced in Section 2.1). We developed a script to automatically unzip an Android application and consider only the .dex file for the image conversion. We confirmed the maliciousness or the trustworthiness of each Android application by submitting the dataset to the VirusTotal (https://www.virustotal.com/ (accessed on 16 November 2022)) web service, which checks an application under analysis against more than 60 antimalware engines. It is crucial to gather an extended dataset free of bias and duplicated applications to have consistent results.
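The extraction step can be sketched with the Python standard library, since an Android application package is a ZIP archive. This is not the script developed by the authors: the function name `read_dex` is hypothetical, and reading the member named `classes.dex` is an assumption about which .dex file is meant. The demonstration uses a small in-memory archive standing in for a real application.

```python
import io
import zipfile

def read_dex(apk_file, member="classes.dex") -> bytes:
    """Return the raw bytes of the .dex member of an APK (a ZIP archive)."""
    with zipfile.ZipFile(apk_file) as archive:
        return archive.read(member)

# In-memory archive standing in for a real APK (illustrative content only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("classes.dex", b"dex\n035\x00payload")
    archive.writestr("AndroidManifest.xml", b"<manifest/>")
buffer.seek(0)

dex_bytes = read_dex(buffer)
print(len(dex_bytes))
```

The returned bytes are the input of the image conversion; `read_dex` accepts a file path as well as a file-like object.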

Model Research and Implementation
This step is aimed at defining the convolutional and quantum models to be compared. In particular, we considered the following deep learning models: AlexNet, a Convolutional Neural Network developed by the authors in [17] (i.e., Standard-CNN), VGG16, a Hybrid Quantum Convolutional Neural Network (i.e., a CNN with a layer that uses transformations in circuits to simulate a quantum convolution), and a Quantum Neural Network (i.e., a fully quantum model considering two-qubit gates, with the readout qubit always acted upon). In the following, we introduce in detail the considered (convolutional and quantum) deep learning models:
• Standard-CNN: Since 2012, CNNs have conquered a plethora of tasks and are currently growing at a rapid pace. There are differences in their architecture, but all the CNN models are based on the principle of the convolutional filter. This filter, called the kernel, is applied to the pixel matrix that composes the image on the three RGB levels. We report the structure of the CNN proposed by the authors in [17] in Listing 1. The idea is to consider a lighter model compared to state-of-the-art models (for instance, VGG16, also exploited in the comparison). For the Standard-CNN, the following layers (usually considered by typical CNN models) are exploited: convolutional, Maxpooling, Dropout, and Dense layers;
• Visual Geometry Group 16: This network, also known as VGG16 [33], was designed and developed in 2014 to solve difficult image classification tasks by exploiting ImageNet, an extended dataset composed of 1000 different output labels belonging to different domains. The VGG16 network demonstrated that network depth is a crucial factor to increase classification accuracy in deep learning. The VGG16 model is composed of several convolution layers, each one with a 3 × 3 filter, and Maxpooling layers considering a 2 × 2 filter. The 16 in VGG16 refers to the 16 layers with trainable weights in its architecture. The last part of the model is composed of 2 fully connected (Dense) layers with a softmax activation to perform the classification task. One of the most typical approaches for training VGG16 is to keep the convolutional part of the model with the weights obtained from training on the ImageNet dataset, while the Dense layers are trained for the specific classification task required;
• Visual Geometry Group 19: Also known as VGG19, it was introduced by Oxford University following VGG16 [33]. This network differs from the previous one because it exploits 19 layers: sixteen convolutional layers and three fully connected ones;
• Hybrid Quantum Convolutional Neural Network (Hybrid-QCNN) [37]: In a nutshell, it adds to the classic convolutional network a first layer that performs a quantum convolution, i.e., it uses transformations in circuits to simulate the computations performed by a quantum computer. Listing 2 shows the implementation of the Hybrid-QCNN, where the first layer represents a convolutional layer built to work on quantum circuits;
• Quantum Neural Network: Also known as QNN, it was introduced in 2018 by Farhi et al. [38], and it represents a deep learning network exploiting only quantum operations (for this reason, we refer to the QNN also as a fully quantum model).
Considering that the classification is based on the expectation of the readout qubit, the authors in [38] proposed the usage of two-qubit gates, with the readout qubit always acted upon. This is similar in some ways to running a small unitary RNN across the pixels. The proposed QNN for malware detection exploits this approach: each layer uses n instances of the same gate, with each of the data qubits acting on the readout qubit, and is built starting from a simple class that adds a layer of these gates to a circuit. Figure 5 shows an example of a circuit layer. The quantum circuit consists of a sequence of parameter-dependent unitary transformations, which act on an input quantum state. The input quantum state is an n-bit computational basis state corresponding to a sample string. The idea is to design a circuit made from two-qubit unitaries that can correctly represent the label of any Boolean function of n bits. Listing 3 shows the Python code related to the Quantum Neural Network we implemented. In particular, in Listing 3, two different layers are exploited: the first one is a Parametrized Quantum Circuit (PQC) layer, typically related to the fully quantum model, and the last one is a Dense layer used to perform the classification (as happens in a classic Convolutional Neural Network). The PQC layer is used for the training of parameterized quantum models: given a parameterized circuit, this layer initializes the parameters. We define a simple quantum circuit on a qubit. This circuit parameterizes an arbitrary rotation on the Bloch Sphere in terms of three angles, i.e., a, b, and c. The source code of the Quantum Neural Network developed by the authors is available, for research purposes, at the following link: https://github.com/vigimella/Quantum-Neural-Network (accessed on 16 November 2022).
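The one-qubit circuit parameterized by the three angles a, b, and c can be simulated with plain NumPy, without a quantum library. The sketch below is not the code of Listing 3: the gate order (a rotation about Z by a, then about Y by b, then about X by c) and the function names are assumptions made for the example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rotation(pauli, angle):
    """Rotation exp(-i * angle / 2 * pauli) about the given Bloch-sphere axis."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * pauli

def circuit(a, b, c):
    """Unitary of the circuit; the rightmost factor is applied first."""
    return rotation(X, c) @ rotation(Y, b) @ rotation(Z, a)

def expectation_z(a, b, c):
    """Expectation value of Z after running the circuit on |0>."""
    state = circuit(a, b, c) @ np.array([1, 0], dtype=complex)
    return float(np.real(np.conj(state) @ Z @ state))

print(expectation_z(0.0, np.pi, 0.0))   # a half turn about Y flips |0> to |1>
```

A trainable layer treats a, b, and c as the parameters to be learned, with the measured expectation value serving as the model output.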
Considering that quantum Machine Learning requires huge computational resources compared to Convolutional Neural Networks, we reduced the image input size for the CNN models to adopt an image size closer to the Hybrid-QCNN one, which was 25 × 25 × 1. For the QNN, the image size we considered was 4 × 4 × 1, due to the even larger amount of computational resources required by the fully quantum model. Once all the quantum and convolutional models have been evaluated, the obtained results are discussed to provide suggestions and insights.

Experimental Analysis
To implement the proposed comparison between quantum and convolutional deep learning models, we used Python as the programming language, and we considered two different open-source libraries, TensorFlow and TensorFlow Quantum, to develop the deep learning models. More details on TensorFlow Quantum can be found in [37]. Briefly, TensorFlow Quantum is a framework that offers high-level abstractions for the design and training of both discriminative and generative quantum networks, compatible with existing TensorFlow APIs, along with quantum circuit simulators. In addition, the authors demonstrated that this library can be applied to many quantum learning tasks, including meta-learning, layerwise learning, Hamiltonian learning, sampling thermal states, variational quantum eigensolvers, classification of quantum phase transitions, generative adversarial networks, and reinforcement learning.

Gradient-Weighted Class Activation Mapping
Gradient-weighted Class Activation Mapping (Grad-CAM) is a technique to provide graphical information on the parts of the input image that have most influenced the classification output of a CNN. The output of Grad-CAM is a heat map, which can be overlaid on the input image to highlight the relevant part. The heat map is generated using the gradients of the final convolutional layers, which are the ones that capture higher-level visual patterns and preserve spatial information on the input image.
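The overlay step can be sketched as follows. The heat map produced by the last convolutional layer is usually coarser than the input image, so it is first enlarged and then blended with the image. The nearest-neighbor enlargement, the blending weight `alpha`, and the requirement that the image size be a multiple of the heat map size are simplifying assumptions of this example.

```python
import numpy as np

def overlay(image, heatmap, alpha=0.4):
    """Blend a [0, 1] heat map onto a grayscale image with values in [0, 255].

    Assumes the image height and width are integer multiples of the
    heat map height and width (nearest-neighbor upsampling).
    """
    fy = image.shape[0] // heatmap.shape[0]
    fx = image.shape[1] // heatmap.shape[1]
    upsampled = np.repeat(np.repeat(heatmap, fy, axis=0), fx, axis=1)
    # Weighted average of the normalized image and the enlarged heat map.
    return (1 - alpha) * image / 255.0 + alpha * upsampled

image = np.full((4, 4), 255.0)                   # toy all-white input image
heatmap = np.array([[1.0, 0.0], [0.0, 0.0]])     # attention on the top-left
blended = overlay(image, heatmap, alpha=0.5)
print(blended)
```

In practice, the heat map is typically rendered with a color map before blending, so that the highlighted regions stand out against the grayscale input.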
The Grad-CAM we adopted [34] is a generalization of the approach proposed in [39]. Grad-CAM does not require any modification to the model architecture and provides a clear way to understand if the model has learned correctly, that is, if the model is using the discriminative pattern in the input image to classify that sample. Intuitively, in an image-based malware classification task, the deep learning model should highlight the payload (i.e., the malicious code) of the malware, which is the pattern shared with the other malware of the family. Otherwise, if the model is classifying that sample because of a part of the input image other than the payload, the malware could be easily modified by cutting out that highlighted part, preserving the malicious behavior (expressed by the payload), and so generating a new malware variant, which will pass the model check as legitimate.
As a matter of fact, the application of Grad-CAM can be useful to understand in which part of the image under analysis the bytes are located that, from the model point of view, are symptomatic of the malicious payload. It can also be of interest to the security analyst for studying malware families: since samples belonging to the same family share the same payload, Grad-CAM for these samples should highlight the same area of the image with similar color intensities. It can also be useful for identifying malware variants belonging to the same family: attackers develop new variants to evade the signature-based detection provided by antimalware by applying code obfuscation techniques [40,41]. Therefore, the highlighting of different areas among various samples of the same family can be symptomatic of a new variant of an existing malware family.

Results
This section shows the results obtained from the experimental analysis. We executed the experiments using the following hardware and software: an Intel Xeon Gold 6140M CPU at 2.30 GHz, 64 GB of RAM, and Ubuntu 22.04.01 LTS as the operating system. Differently from the preliminary research work [17], in this paper, we considered more experiments with different settings, where we changed the sample dimension and the learning rate for models such as AlexNet, Standard-CNN, and VGG16. In addition, we also performed a comparison with other state-of-the-art models, e.g., VGG19, EfficientNet, and MobileNet. The image dimension used during the training phase increased from 100 × 100 × 1 to 110 × 110 × 1, while the learning rate went from 0.010 to 0.001, to understand whether different values of the image dimension and the learning rate can help to obtain better performances in malware detection. The best settings given as input to each model, such as image size, number of epochs, and batch dimension, are reported in Table 1. Table 2 shows the results of the experimental analysis. The dataset was composed of 8446 Android applications. The malicious applications belong to six different Android malware families: Airpush (1170), Dowgin (763), FakeInst (2129), Fusob (1275), Jisut (550), and Mecor (1499). In addition, we also dedicated a class to the Trusted (1060) family, which contains legitimate Android applications. Each application was analyzed using different free and commercial antimalware to avoid possible mistakes in dataset labeling. For this task, we exploited the VirusTotal web service (https://www.virustotal.com/ (accessed on 16 November 2022)), which aggregates more than 60 antimalware engines and performs an online scan. After assigning each application to the correct class, we divided them into a training set (6759) and a test set (1687), with an 80-20 split.
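The reported dataset sizes can be cross-checked with simple arithmetic. The sketch below assumes, hypothetically, that the 80-20 split is applied to each class separately and that the training share is rounded up; the text does not state how the split was performed, but under this assumption the totals coincide with the reported 6759 training and 1687 test samples.

```python
# Class sizes as reported in the text.
class_sizes = {
    "Airpush": 1170, "Dowgin": 763, "FakeInst": 2129, "Fusob": 1275,
    "Jisut": 550, "Mecor": 1499, "Trusted": 1060,
}

def train_share(count: int, percent: int = 80) -> int:
    """Training samples for one class: percent of count, rounded up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (count * percent + 99) // 100

total = sum(class_sizes.values())
train_total = sum(train_share(n) for n in class_sizes.values())
test_total = total - train_total
print(total, train_total, test_total)
```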
In addition, we also included in the training folder another sub-folder called "VAL", where 1351 samples were stored. During the training session, the validation technique helped us to understand if the model was learning well or not. Validation is also commonly used to prevent over-fitting. The last phase of this process was to convert all applications into images in PNG format, using a Python script developed by the authors. As emerges from the settings in Table 1, we used the smallest images for the Hybrid-QCNN and QNN models to perform the experiments. To make a better comparison of the different models, we also executed Standard-CNN and EfficientNet using an image with a 25 × 25 × 1 dimension. Although one of our intents was to also compare the fully quantum model under the same settings, this was not possible because of its very small image dimension: the convolutional models and the hybrid model cannot run experiments using an image dimension such as 4 × 4 × 1. The models were evaluated using several metrics such as the loss, accuracy, precision, recall, F-measure, and Area Under the Curve (AUC). Table 3 shows the experimental results on the test set for each output class (the malware families and the Trusted category). In the case of multi-class classification, where there is no binary choice between positive and negative classes, the analysis targets one class at a time: the true positives are the samples of the target class that are correctly classified (the cell where the predicted and true labels cross in the confusion matrix representation); the false negatives are the target class samples misclassified as another class; the false positives are the samples of other classes misclassified as the target class.
In Table 2, we report the best results obtained from several experiments. From it, we can see that the best model is the Standard-CNN, with a precision equal to 0.972 and a recall value of 0.970. Good performances were also obtained using the MobileNet, VGG16, and VGG19 models. The Hybrid-QCNN model exhibited a precision equal to 0.905 and a recall of 0.903, while the fully quantum model, labeled as QNN, obtained lower results, i.e., a precision value of 0.623 and a recall value of 0.103. Due to the limited computing resources, to perform a better comparison between the fully convolutional and hybrid models, we also executed experiments using Standard-CNN and EfficientNet with a size of 25 × 25 × 1. From Table 2, we can see that the Standard-CNN model performed better than the Hybrid model, albeit by a small margin. We can also say that the results obtained from the Hybrid model were far superior to the EfficientNet results. In addition, it is worth noting the interesting results obtained with the AlexNet model. Although it usually does not achieve high scores for the metrics, in this case, we obtained a precision value equal to 0.920 and a recall value equal to 0.983. Figures 6 and 7 report the confusion matrices of the Standard-CNN and Hybrid-QCNN models, respectively. The images help to perform a more accurate direct comparison of the best deep learning model (in this work) and the quantum model.
Table 3. Experimental results obtained with the convolutional and the quantum model obtaining the best performances for the family classification task.

Output Classes Models Accuracy Precision Recall F-Measure AUC
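The Python script used to convert the applications into PNG images is not shown in the text. As a rough illustration only, one common way to turn a binary file into a grayscale image is to read each byte as one pixel intensity (0-255) and lay the bytes out row by row. The function name `bytes_to_grayscale`, the zero padding, and the fixed width below are assumptions and not the authors' implementation; writing the rows out as a PNG file would additionally require an imaging library and is omitted.

```python
def bytes_to_grayscale(data, width):
    """Arrange raw bytes into rows of `width` grayscale pixels (0-255).

    Hypothetical helper: the last row is padded with zeros (black) so that
    every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width
    padded = bytes(data) + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Toy example: 10 bytes laid out on a 4-pixel-wide grid.
rows = bytes_to_grayscale(bytes(range(10)), width=4)
for row in rows:
    print(row)
```

For a real sample, `data` would be the bytes read from the application file, and the resulting grid would still have to be resized to the input dimension expected by each model.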
Figure 6 reports the confusion matrix obtained from the Standard-CNN model, which produced a small number of misclassifications. The highest number of incorrectly classified items was for the Airpush family, with 11 samples classified as Trusted. The confusion matrix obtained from the Hybrid-QCNN model is shown in Figure 7.
The Hybrid-QCNN model had a significantly larger number of misclassified samples than the Standard-CNN model. As with the Standard-CNN model, the majority of misclassified samples belonged to the Airpush and Dowgin malware families and were labeled as Trusted by the Hybrid-QCNN model. A possible reason for this result is that, after reducing the size of the images, the Airpush and Dowgin samples look similar to the Trusted applications. As we can see from the confusion matrix shown in Figure 7, the numbers of Airpush and Dowgin applications labeled as Trusted are 24 and 25, respectively.
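The one-vs-rest reading of a multi-class confusion matrix that underlies the per-class metrics can be sketched in a few lines of Python. The sketch assumes that rows are true labels and columns are predicted labels (the orientation of Figures 6 and 7 may differ), and the 3 × 3 matrix is made-up toy data, not taken from the experiments.

```python
def per_class_metrics(cm, k):
    """One-vs-rest counts and scores for class index k.

    Assumption: cm[i][j] is the number of samples with true class i
    that were predicted as class j.
    """
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
    fn = sum(cm[k]) - tp                       # class k predicted as something else
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_measure": f_measure}


# Toy confusion matrix for three classes (illustrative values only).
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]
for k in range(len(toy_cm)):
    print(k, per_class_metrics(toy_cm, k))
```

With this convention, a family whose samples are often labeled as Trusted shows up as a high false-negative count for that family and a high false-positive count for the Trusted class.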

Discussion
Unfortunately, due to hardware limitations, we were not able to experiment with the quantum models (i.e., the Hybrid-QCNN and the QNN) using images with dimensions bigger than 25 × 1. Nevertheless, when we trained the quantum and convolutional models with the same image dimensions, we noted that the quantum models can obtain better performances. This result is highlighted in Table 2, in particular by comparing the EfficientNet (25 × 3) and Hybrid-QCNN networks, whose accuracy was equal to 0.251 and 0.905, respectively. The Standard-CNN model, with images of size 25 × 3, obtained an accuracy of 0.915, which is slightly higher than that of the Hybrid-QCNN; however, the Standard-CNN model was trained with a batch size of 64, while the Hybrid-QCNN was trained with a batch size of 32. For these reasons, we consider quantum Machine Learning promising for the malware detection task.
After the execution of each experiment, we applied the Grad-CAM algorithm to the best convolutional and quantum models, i.e., the Standard-CNN and Hybrid-QCNN ones. With this algorithm, it is simple to understand which areas of an image most influence the final classification. Below, we report some examples, each including the PNG image of the file, the activation map, and the image generated by overlaying the initial PNG image with its activation map. The activation map uses three colors: yellow, green, and blue. Yellow regions mark the areas of greatest interest, green marks areas of intermediate importance, and blue highlights the sections of the image that are not relevant to the model.
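For reference, the core Grad-CAM computation can be sketched as follows: each feature map of the chosen convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped with a ReLU. The sketch uses plain Python lists and hand-written toy activations and gradients; in practice both would come from the trained network through a deep learning framework. The function name `grad_cam` and the final normalization to [0, 1] are choices made for illustration only.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM heatmap.

    activations[k][i][j]: feature map k of the chosen convolutional layer.
    gradients[k][i][j]:   gradient of the class score w.r.t. activations[k][i][j].
    Returns a 2D list with values in [0, 1].
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients over the spatial grid.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)]
           for i in range(height)]

    # Normalize so the most influential location has value 1 (illustrative choice).
    peak = max(max(row) for row in cam)
    if peak == 0:
        return cam
    return [[v / peak for v in row] for row in cam]


# Toy input: two 2x2 feature maps with constant gradients.
toy_activations = [[[1.0, 2.0], [3.0, 4.0]],
                   [[1.0, 0.0], [0.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]],
                 [[-2.0, -2.0], [-2.0, -2.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
for row in heatmap:
    print(row)
```

A heatmap of this kind is typically upsampled to the input size and rendered with a color map before being overlaid on the original image.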
Figure 8 shows a sample belonging to the Airpush family that was detected with a confidence of 100%. The most significant area is at the top of the image, where the pixels are overlaid with the yellow shadow, while the other parts of the image do not appear to be very important for classifying the malware. As described in Section 4, the models can also make mistakes. Figure 9 shows the wrong classification of a malware sample belonging to the Airpush family, recognized as a member of the Dowgin family with a confidence slightly below 100% (99.9%).

Figure 8. Sample classified as belonging to the Airpush family (4a985c341e1ee08f647395d00640c1f2) with 100%, where we can note that the yellow areas are on the upper side of the image, obtained with the model built using the Standard-CNN network.
In Figure 10, we report an example of the classification of a Dowgin malicious sample with a confidence of 100%. This picture helps to understand why eighteen images of this family were classified as Airpush: as can be seen in Figures 8 and 10, in both cases the pixels at the top of the image are highlighted with the yellow shadow. In Figure 11, instead, the model classified as Trusted a dangerous application belonging to the Dowgin family. In conclusion, we can affirm that, in line with its lower scores, the Hybrid-QCNN model makes more mistakes during classification than the Standard-CNN model. In addition, thanks to the Grad-CAM algorithm, even inexperienced people can understand where the model gets confused.
Assuming that malware belonging to the same family shares the malicious payload (i.e., the dangerous action), the idea behind the use of Grad-CAM is to highlight similar areas in the images of that family. Conversely, samples belonging to different families should exhibit different highlighted areas. The rationale is to find regions of the images that are symptomatic of certain malicious behaviors: in this way, the security analyst can focus the manual effort on a reduced section of the application under analysis. We are aware that images obtained from application source code can be more informative from this point of view; as a matter of fact, in this case, the highlighted areas would be related to code snippets and, for this reason, more understandable for the security analyst.
Figure 11. Dowgin sample (6afbe9d4c41fd46a3c516a9203dc4393) misclassified as Trusted with 100%, where really small yellow areas can be identified in the middle part of the image, obtained with the model built using the Hybrid-QCNN network.

Related Work
In this section, we review the current state of the art in the use of deep learning to detect mobile malware and in the adoption of quantum Machine Learning for general tasks.
Quantum computing is relevant in several fields, such as medicine. For example, after the global pandemic of 2020, deep learning supported the evaluation of thoracic CT exams. To improve performance on this task, Amin et al. [42] used QML and concluded that quantum algorithms improved the detection of the infection in its early stages.
The authors in [25] showed that a Support Vector Machine (SVM) can be implemented on a quantum computer with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a non-sparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix. In contrast, we used a Hybrid-QCNN model, which reached a high level of accuracy, i.e., 0.905.
The researchers in [26] demonstrated that, by using quantum computing, the time required to train a deep restricted Boltzmann machine was reduced and the learning results were better than those obtained with classical computing. As a result, the optimization of the underlying objective function improved significantly. In addition, quantum techniques have been shown to efficiently handle some problems that are intractable on ordinary, classical computers. Furthermore, the proposed quantum method can efficiently train full Boltzmann machines and multilayer, fully connected models, for which there are no efficient classical counterparts.
In [43], Seymour and his colleague created a minimalist malware classifier using cross-validation. The accuracy obtained was comparable with the standard Machine Learning algorithms.
The authors in [27] showed that quantum Machine Learning algorithms can provide an exponential speed-up over their classical counterparts, performing their tasks in time logarithmic in both the number of vectors and their dimension. This study was conducted on supervised and unsupervised quantum Machine Learning algorithms for cluster assignment.
In [44], the researchers proposed a mix of software used to discover the most common hashes and n-grams between benign and malicious software and a quantum search method. The latter will be utilized to find the desired hash value by searching through every permutation of the entangled key and value pairs. This eliminates the need to recompute hashes for a set of n-grams.
The authors in [45] proposed a federated learning approach aimed to detect malware in IoT devices, by modeling the network traffic of several real IoT devices affected by malware. Both supervised and unsupervised federated models (multi-layer perceptron and autoencoder) were exploited by the authors.
The researchers in [46] compared the performance obtained from 26 state-of-the-art pre-trained Convolutional Neural Network models in Android malware detection, by also including the performance obtained by large-scale learning with the SVM and RF classifiers. The model obtaining the best performance was the EfficientNet-B4 one.
The authors in [47] proposed a hybrid multi-level deep learning model, formed by unsupervised and supervised DL models, to encode X86 malware binary files and classify them. It uses similarity matrices to detect relevant similarity patterns between input samples.
Several studies proposed methods for malware family detection. More specifically, the authors in [48] presented a solution based on the BIRCH clustering algorithm. After extracting static and dynamic analysis features from the malware samples, they constructed the vectors required by the clustering algorithm and grouped the samples into families based on this information. Another approach, based on call graph clustering, was proposed in [49].
In [50], the authors extracted n-gram features from the binary code of the samples. These features were selected using the Sequential Floating Forward Selection (SFFS) method with three classifiers, and the accuracy of the method reached 96.64%. However, their methodology was different from the one presented in this paper, and they did not provide results specifically referring to ransomware.
In [51], the authors used the PrefixSpan algorithm to create groups of malware families based on sequence patterns. However, these patterns are related only to network traffic packets and not to the overall behavior of the malware. In [6], malware samples were also assigned to a specific family, in this case by exploiting model checking. In [52], ARIGUMA, a malware analysis system for family classification, was proposed. The authors considered the number of functions called within a function and the number of functions that call a specific function. Furthermore, the local variables, the arguments, and the instructions were taken into account. Even though this method can also detect obfuscated APIs, the classification accuracy was only 61.6%. In [53], the authors presented a malware family classification system based on the instruction sequence.
The researchers in [54] adopted supervised classification algorithms for ransomware family identification. Moreover, they considered the binary trees generated by these algorithms to infer the phylogenetic relationships.
As emerged from the current state-of-the-art analysis, this paper represents the first attempt to introduce quantum Machine Learning in the malware detection research field. Moreover, this is the first paper devoted to introducing an explanation behind the decision made by a quantum Machine Learning classifier.

Conclusions and Future Works
In the last few years, we have been witnessing an increase in mobile malware, with particular regard to malicious payloads targeting Android, the most widespread platform. Considering the ineffectiveness of the currently adopted signature-based approach in detecting zero-day malicious behaviors, the research community, from both the academic and industrial sides, is focused on proposing new techniques to mitigate malware, mainly exploiting deep learning. In this context, quantum computing has recently been introduced to train models by integrating quantum algorithms within Machine Learning algorithms. For these reasons, we proposed applying quantum Machine Learning models to Android malware detection. We compared five state-of-the-art deep learning models, i.e., AlexNet, MobileNet, EfficientNet, VGG16, and VGG19, a deep learning network developed by the authors, which we called Standard-CNN, and two quantum models, the first being a hybrid quantum model and the second a fully quantum model. All these (convolutional and quantum) models were trained to perform malware detection in the Android environment. The input for the models was a set of grayscale images obtained from the conversion of Android applications. In the experimental analysis, 7386 applications belonging to six different malicious families and 1060 legitimate applications were considered. The experiments showed that the architecture with the best performance was the Standard-CNN model, with an accuracy equal to 0.970. Among the quantum models, the best in terms of accuracy was the Hybrid-QCNN model, with an accuracy of 0.905. In future work, more complex quantum models (in terms of PQC layers) will be considered.
Furthermore, since the proposed method does not depend on a specific operating system (as a matter of fact, the images are generated directly from the application bytes), we will evaluate whether the proposed approach also works for ransomware detection. Additionally, to provide more explainability, different kinds of images will be considered. For instance, an interesting improvement of the proposed method is the adoption of images obtained from the application code; in this case, Grad-CAM will be able to highlight areas that are immediately traceable to code snippets and, presumably, to the malicious payload.