Article

MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection

by Saleh J. Makkawy 1,2,*, Michael J. De Lucia 1 and Kenneth E. Barner 1
1 Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19711, USA
2 Department of Computer Science, Umm Al Qura University, Mecca 21955, Saudi Arabia
* Author to whom correspondence should be addressed.
J. Cybersecur. Priv. 2025, 5(4), 109; https://doi.org/10.3390/jcp5040109
Submission received: 1 October 2025 / Revised: 20 November 2025 / Accepted: 29 November 2025 / Published: 4 December 2025
(This article belongs to the Section Security Engineering & Applications)

Abstract

As technology advances, developers continually create innovative solutions to enhance smartphone security. However, the rapid spread of Android malware poses significant threats to devices and sensitive data. The open-source nature of the Android Operating System (OS) and the availability of its Software Development Kit (SDK) are the main contributors to this alarming growth. Conventional malware detection methods, such as signature-based, static, and dynamic analysis, struggle to detect obfuscation techniques in malware, including encryption, packing, and compression. Although developers have created several visualization techniques for malware detection using deep learning (DL), these often fail to accurately identify the critical malicious features of malware. This research introduces MalVis, a unified visualization framework that integrates entropy and N-gram analysis to emphasize meaningful structural and anomalous operational patterns within malware bytecode. By addressing significant limitations of existing visualization methods, such as insufficient feature representation, limited interpretability, small dataset sizes, and restricted data access, MalVis delivers enhanced detection capabilities, particularly for obfuscated and previously unseen (zero-day) malware. The framework leverages the MalVis dataset introduced in this work: a publicly available, large-scale dataset comprising more than 1.3 million visual representations across nine malware classes and one benign class. A comprehensive comparative evaluation was performed against existing state-of-the-art visualization techniques using leading convolutional neural network (CNN) architectures: MobileNet-V2, DenseNet201, ResNet50, VGG16, and Inception-V3. To further boost classification performance and mitigate overfitting, the outputs of these models were combined using eight distinct ensemble strategies.
To address the issue of imbalanced class distribution in the multiclass dataset, we employed an undersampling technique to ensure balanced learning across all types of malware. MalVis achieved superior results, with 95% accuracy, 90% F1-score, 92% precision, 89% recall, 87% Matthews Correlation Coefficient (MCC), and 98% Receiver Operating Characteristic Area Under Curve (ROC-AUC). These findings highlight the effectiveness of MalVis in providing interpretable and accurate representation features for malware detection and classification, making it valuable for research and real-world security applications.

1. Introduction

Smartphones are proliferating, with projections indicating more than 7 billion devices by 2025, nearly 70% of them running the Android operating system [1,2]. Due to their compact designs, these mobile devices have become indispensable, facilitating tasks such as email management, banking transactions, and storing sensitive health information. However, the widespread adoption of smartphones has also drawn the attention of hackers [3], a threat exacerbated by the open-source nature of Android’s OS and its SDK. This openness has paved the way for various forms of malware, including viruses [4,5], worms [6], adware [7,8], spyware [9], ransomware [10], rootkits [11], trojans [12], keyloggers [13], botnets [14], and mobileware [15]. Consequently, developing a robust defense system capable of identifying and mitigating this wide range of threats is crucial. While the traditional detection methods discussed in Section 2, such as signature-based [16,17], dynamic analysis [18], and static analysis [19], remain dominant, they often struggle with modern evasion techniques such as code obfuscation, encryption, polymorphism, and packing. As a result, there has been growing interest in using advanced Deep Learning (DL) techniques to analyze malware and detect these suspicious behaviors. Several research works have explored the transformation of binary code or bytecode into image representations to leverage the capabilities of Deep Neural Networks (DNNs) for malware detection. However, these approaches often fail to capture the semantic context, structural anomalies, and obfuscation-pattern features that are critical for accurate and robust classification.
Designing effective detection systems, particularly those utilizing visual representations, requires a thorough understanding of the structural composition of Android applications. The following section provides an overview of the Android Package Kit (APK) file structure, focusing on its core components relevant to malware analysis.

1.1. Overview of the Android APK File Structure

The APK is a compressed file that the Android OS uses to distribute and install applications, consisting of core files and folders such as the application bytecode, assets, resources, and a manifest file, as presented in Figure 1.
Research on visualization-based malware detection often centers on two primary components of APKs: AndroidManifest.xml and Classes.dex. Several researchers have analyzed the AndroidManifest.xml file to extract essential information about the application, such as its services, activities, package names, and permissions [20,21]. Others focus on analyzing the Classes.dex file, which contains the executable bytecode intended for the Android Dalvik Virtual Machine (DVM), making it a vital source for behavioral analysis [22,23]. This research focuses on encoding and analyzing the Classes.dex file, given its importance in capturing malicious behavior. Our proposed approach utilizes CNN models to detect hidden Android malware threats by transforming bytecode into visual representations, with a focus on highlighting anomalous operational or structural patterns indicative of obfuscation or malicious intent.

1.2. Contributions

Our novel Android malware visualization framework uniquely integrates critical semantic and structural features extracted from executable bytecode, transforming them into RGB representations. Unlike previous techniques, MalVis enhances interpretability and classification accuracy while maintaining resilience against obfuscation. Our contributions include the following:
  • MalVis Dataset: Introducing MalVis, the largest Android malware visualization dataset, with over 1.3 million images across ten classes (nine malware types and one benign class), accessible to the research community at www.mal-vis.org (accessed on 2 October 2025). Scripts for generating the various visualization methods are publicly available on GitHub at https://github.com/makkawysaleh/MalVis (accessed on 2 October 2025).
  • Enhanced Visualization Framework: Developing an advanced MalVis framework that enhances malware visualization by incorporating an entropy encoder with an N-gram technique. This approach utilizes the three RGB channels to effectively capture a broader range of malware characteristics, including encryption, compression, packing, and structural irregularities. This improves the precision of malware pattern detection in the visualizations.
  • Enhanced Multiclass Labeling: Implementing an improved multiclass labeling approach using results from Euphony [24] and VirusTotal [25] allows precise classification and analysis of malware behavior, enhancing targeted threat identification and classification.
  • Robust Detection Model: Evaluating the performance of the MalVis framework against several state-of-the-art visualization methods using advanced deep CNN architectures, such as MobileNet-V2, DenseNet201, ResNet50, VGG16, and Inception-V3, combined with several ensemble techniques to further improve detection accuracy and generalization. The results show that the MalVis framework achieves superior performance compared to the alternatives.
  • Improved Framework Explainability and Transparency: We employ two distinct heatmap-based attention mechanisms, GradCAM and GradCAM++, to verify that the MalVis framework effectively utilizes the malicious features introduced by applying our entropy and N-gram encoders to the malware representations. Additionally, we identify prevalent malicious patterns specific to each malware class in these representations.
This research builds on our previous work on improving Android malware detection using bytecode-to-image encoding frameworks [26] to detect anomalous structural and malicious features in Android malware. Although traditional detection techniques such as signature-based, static, and dynamic analysis remain prevalent, visualization-based approaches have gained popularity due to their speed and ability to highlight malicious patterns. However, existing methods often rely on simplistic byte-to-color mappings based on byte location in the file, which overlook semantic features and abnormal structural traits of malware. Additionally, they struggle against obfuscation, encryption, and packing. MalVis offers a richer and more interpretable representation that enhances classification robustness and addresses the limitations of existing methods. This paper is organized as follows. Section 2 reviews traditional detection methods and their limitations, discusses the motivation behind visual-based detection, the limitations of existing malware image-based datasets, and prior grayscale and RGB visualization techniques. Section 3 details our proposed MalVis framework, including the data collection and label generation process, the bytecode-to-image transformation, an in-depth analysis of entropy and N-gram features through two distinct visualization approaches, and the impact of obfuscation methods on MalVis representations. Section 4 defines the performance evaluation metrics used to assess the model’s effectiveness. In Section 5, we present experimental results across binary and multiclass classification tasks, demonstrate performance improvements through ensemble modeling, illustrate the effect of undersampling in addressing class imbalance, enhance model explainability using GradCAM and GradCAM++, and analyze key visualization findings. Finally, Section 7 concludes the paper and outlines future research directions.

2. Related Works

This section reviews state-of-the-art traditional detection methods and their limitations, highlighting the motivation for using visual representations in malware analysis. Then, it explores Android malware visualization datasets that serve as benchmarks for image-based malware detection. Finally, it explores recent visualization-based detection techniques, focusing on grayscale and RGB encoding methods for identifying malicious patterns.

2.1. Signature-Based Analysis

The signature-based detection method is widely used to recognize and detect malware. A software signature is a unique identifier that cannot easily be replicated and is typically generated using cryptographic hash algorithms such as MD5, SHA-1, SHA-256, and SHA-512, sometimes in combination with signature schemes such as RSA [16,27]. The detection engine generates a signature for the software and compares it with a database of known malicious signatures stored locally or remotely in the vendor’s cloud. Typically, these databases are proprietary assets of vendors, with restricted access granted to licensed users. A significant limitation of this method is the need for the detection engine to continuously refresh its signature database, which can lead to potential gaps in identifying new zero-day malware [28]. As technology evolves, malware creators continue to find new techniques to evade detection, such as code alteration, function modification, file repacking, data encoding, or null byte injection, all to generate new signatures capable of evading security defenses [17,29,30,31,32].
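As an illustrative sketch of this lookup (our own minimal example, not any specific vendor's engine), the snippet below streams a file through SHA-256 and checks the resulting digest against an in-memory set of known-bad signatures; the function names and the set-based database are assumptions for illustration:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known_malware(path: str, signature_db: set[str]) -> bool:
    """Flag the file if its digest appears in the known-bad database."""
    return sha256_of_file(path) in signature_db
```

Because any byte-level change (repacking, null-byte injection) produces an entirely new digest, a lookup of this kind misses trivially modified variants, which is exactly the evasion problem described above.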

2.2. Dynamic Analysis

Dynamic analysis is a pivotal method for malware detection that involves observing and understanding software behavior during execution within a controlled, contained environment, such as a sandbox or Virtual Machine (VM). This technique is effective in detecting abnormal actions, such as invoking suspicious system calls [33], generating anomalous network traffic [34], altering memory [35], and producing Logcat errors that invoke suspicious OS services [36]. However, this method may require accessing or monitoring users’ sensitive information, which can be impractical when managing highly confidential data [37]. Despite its promising outcomes, acquiring an extensive dataset of labeled training data for optimal performance is often both time-consuming and costly [38]. Security researchers are therefore increasingly shifting to the visualization of malware based on static analysis, which allows instant scanning of malware images and helps overcome the challenges posed by new malware [21,22,39]. Unlike dynamic analysis, which can require days or weeks to monitor suspicious behavior in an application, these image representations are harmless to generate, do not require manual feature engineering, and resist typical obfuscation techniques employed by adversaries [40].

2.3. Static Analysis

Static analysis is a technique used to evaluate applications without executing them or observing their execution behavior, which can often be demanding and time-consuming. By not requiring execution, static analysis provides an assessment mode complementary to behavioral analysis techniques. One of the primary advantages of this malware detection method is its cost-effectiveness, as it minimizes the need for additional hardware or extensive computational resources beyond the analysis tool itself [19]. Despite its advantages, this approach has notable limitations. Specifically, its effectiveness largely depends on identifying already known malware patterns, which challenges its ability to generalize and detect evolving zero-day malware. Research efforts focus on improving the detection of suspicious activities using advanced methodologies, such as machine learning (ML) and convolutional neural networks (CNNs) [26,41], to mitigate this limitation. These innovations aim to enhance the adaptability and robustness of static analysis in response to evolving threats.

2.4. Motivation

Deep Neural Networks, especially CNNs, have shown exceptional performance across domains such as vision, biomedicine [42], and cybersecurity [42,43,44,45], primarily due to the availability of large, structured datasets. In malware detection, transforming code into image representations allows CNNs to identify visual patterns of malicious behavior, offering a scalable, non-executable, and efficient alternative to traditional analysis methods. These representations are significantly smaller than the raw executable files, reducing storage needs and execution risks, with Figure 2 presenting the reduction ratio as a percentage. Our approach further benefits from transfer learning by leveraging pretrained CNNs trained on massive image datasets such as ImageNet, enabling effective pattern recognition in malware with minimal domain-specific training. Despite these advantages, progress is hindered by limited access to large-scale, interpretable, public malware visualization datasets. Addressing this gap is essential for advancing robust, explainable, and reproducible malware detection research.

2.5. Existing Malware Image Datasets

We highlight two widely known Android malware datasets, AndroZoo [46] and Drebin [47], both commonly used in Android malware detection research. However, because our MalVis approach focuses on transforming bytecode into images for visualization, we primarily compare MalVis with other existing image-based datasets in this section. Scott Freitas et al. [48] introduced the MalNet database, a substantial contribution to the field, comprising over 1.2 million malware images spanning 47 types and 696 families. While their direct byte-to-location color mapping method is innovative within the Android application structure, our dataset, MalVis, offers enhancements in the form of over 1.3 million images, with a particular focus on addressing malware obfuscation techniques. This focus improves the effectiveness of Android malware detection, as discussed in more detail by Makkawy et al. [26]. Virus-MNIST, proposed by David A. et al. [4], is a large publicly available malware image dataset comprising 51,880 grayscale images of malware, classified into nine virus classes and one benign class, all formatted as 32 × 32 images. The dataset frames malware classification in the style of the famous MNIST dataset used for handwriting recognition: malware images are generated by converting the first 1024 bytes of Portable Executable (PE) files into 32 × 32 grayscale images. Although Virus-MNIST is a significant step towards standardizing malware image datasets, representing malware using only its first 1024 bytes may fail to capture its complete characteristics [49]. L. Nataraj et al. [50] present the MalImg dataset, which offers a straightforward and efficient malware visualization and classification method. Using image processing techniques, this approach classifies malware samples based on their similarity to specific malware types, utilizing standard image features.
MalImg achieves a notable classification accuracy of 98% on a dataset comprising 9458 samples across 25 distinct malware types. However, with this limited dataset size, there is a possibility that the model is overfitting to the specific characteristics of these samples. Table 1 summarizes public and private image-based malware datasets, including MalVis, MalNet, and Virus-MNIST, providing details on the number of classes and dataset size. Despite existing visualization contributions, these methods face some limitations. As observed by Kunwar et al. [51], MalNet encodes malware bytecode based on the byte location in the executable file, lacks resilience to obfuscation, and fails to identify suspicious behaviors. Virus-MNIST [4] uses only the first 1024 bytes of PE files, limiting its representational scope. In contrast, MalVis combines entropy and N-gram analysis to generate color-encoded RGB representations that emphasize abnormal structures, encryption, packing, and compression behaviors. Our experiments show that this richer visual encoding enhances model interpretability and improves classification performance. A detailed discussion of the MalVis dataset and its visualization approach is presented in Section 3.

2.6. Visualization Strategies for Malware Detection

Image-based malware detection has become a powerful paradigm for analyzing Android applications because it bypasses manual feature engineering through automated visual pattern recognition. Current approaches primarily focus on two types of representations:

2.6.1. Grayscale Image Encoding

The foundational work by Nataraj et al. [50] established grayscale conversion by mapping binary bytes to pixel values in the range (0–255), revealing structural patterns in malware families. Modern implementations include DexRay by Nadia Daoudi et al. [56], which converts DEX bytecode into 1D grayscale vectors (1 × 128 × 128) for CNN classification, achieving a 96% F1-score while resisting obfuscation. Despite its high accuracy, this approach produces small grayscale images that may suffer greater data loss in the representations. In another instance, Wang et al. [57] developed a novel scheme that combines static and dynamic analysis with CNN for efficient malware detection and classification. Their method integrates a Convolutional Block Attention Module (CBAM) with CNN to detect malware similarities using grayscale images from the MalImg and Microsoft datasets. However, their experiment was conducted on a relatively small dataset of approximately 20,000 samples covering 25 types of malware, which could be susceptible to model overfitting.
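A minimal sketch of this byte-to-pixel mapping (our own illustration using NumPy; the fixed row width and zero-padding of the final row are assumptions, not a specific paper's implementation):

```python
import numpy as np

def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map raw bytes (0x00-0xFF) directly to pixel intensities (0-255),
    zero-pad the tail, and reshape into a 2-D (rows x width) array."""
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(arr) // width)                     # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)  # zero-pad last row
    padded[: len(arr)] = arr
    return padded.reshape(rows, width)
```

The resulting array can then be turned into a fixed-size image, e.g. with Pillow via `Image.fromarray(arr, "L").resize((256, 256), Image.NEAREST)`, which corresponds to the nearest-neighbor resizing step used by several of the surveyed methods.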

2.6.2. RGB Image Encoding

Advanced malware variants often exhibit more sophisticated patterns and behaviors that are difficult to capture using standard grayscale or single-channel representations. Additional color channels are required to encode these complex characteristics, enabling richer feature representation and enhancing the model’s ability to detect subtle malicious traits. Asim et al. [58] introduced a technique that transforms APK files into lightweight RGB images utilizing a predefined dictionary and an intelligent mapping mechanism. Their method converts the AndroidManifest.xml permissions into ASCII values, which are then aggregated and encoded into a single color value. While this approach facilitates the image-based representation of APK features in RGB channels, it suffers from significant information loss due to the summation of ASCII values, which flattens each permission into a single numerical value. This reduction hinders the model’s ability to capture detailed permission information, thus limiting its effectiveness in capturing nuanced malicious behaviors. Progress in this field is hindered by the scarcity of publicly available visualized malware datasets [59] and the need for robust methodologies to effectively capture malware patterns and behaviors [48]. The MalVis dataset aims to tackle these challenges by providing comprehensive representations of malware, which convert abnormal operational and structural patterns in bytecode into visual forms. Additionally, it incorporates multiclass labels for precise malware classification and analysis, thereby enhancing targeted threat identification.
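The information loss caused by summing ASCII values is easy to demonstrate: any two strings that are anagrams of each other collapse to the same color value. A small sketch (our own illustration; the second permission name is hypothetical, chosen only to force the collision):

```python
def permission_color(permission: str) -> int:
    """Collapse a permission string into one value by summing its
    ASCII codes (the aggregation step described above)."""
    return sum(ord(c) for c in permission)

# Anagrams become indistinguishable after aggregation:
a = permission_color("android.permission.SEND_SMS")
b = permission_color("android.permission.SMS_SEND")  # hypothetical name
```

Here `a == b`, so the encoder cannot tell these two (distinct) permission strings apart, which is the flattening effect criticized in the text.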

3. Methodology

This section presents the methodology of the MalVis framework, as illustrated in Figure 3. The framework operates through several interconnected stages that transform Android applications into visual representations for malware detection. First, data collection and label generation involve gathering malware and benign samples from multiple repositories, resulting in a collection of 1,300,822 APK files. Second, feature extraction utilizes reverse engineering tools to decompile APK files and extract the executable bytecode classes.dex for visualization purposes. Third, bytecode is transformed into RGB channel images using two novel MalVis encoding schemes (Classbyte and N-gram). We further analyze the impact of entropy and N-gram encoding on visualization quality, examining how these techniques capture malware obfuscation methods and structural anomalies. Finally, we describe the CNN architectures, training configurations, and experimental environment setup used for the classification task. The following subsections provide detailed explanations of each pipeline stage.

3.1. Data Collection and Label Generation

The MalVis generation process utilizes a subset of the AndroZoo dataset [46], as shown in Figure 3➀, a key resource in Android research that encompasses 24,743,375 applications collected from platforms such as the Google Play Store. Our model training uses both a binary classification dataset and the MalVis multiclass classification dataset. The binary dataset comprises 49,150 malware samples and 135,324 benign samples, as presented in our earlier research [26]. It was primarily utilized to evaluate all proposed visualization approaches on a dataset of manageable size and to determine the optimal visualization method for Android bytecode. The multiclass malware dataset utilizes Euphony [24] to categorize malware into 289 distinct categories. For training purposes, we focus on the nine largest categories to enhance labeling accuracy and minimize false positives by excluding samples with multiple labels. Furthermore, we cross-verify these samples with VirusTotal [25] to ensure their reliability. The refined dataset, illustrated in Figure 3➁, includes 1,300,822 samples, comprising nine types of malware and an additional 135,324 benign samples sourced from AndroZoo. Figure 4 displays the distribution and application visualizations of the nine malware types alongside the benign class.

3.2. MalVis Bytecode-to-Image Visualization

The MalVis bytecode-to-image visualization process begins by extracting Dalvik Executable (DEX) files from Android APKs using AndroGuard [60], a well-known reverse engineering tool, as presented in Figure 3➂. This step yields the classes.dex files, as illustrated in Figure 3➃. These .dex files consist of byte values in the range of 0x00 to 0xFF. These values are first converted into a one-dimensional array of unsigned integers, where each value is represented by a number between 0 and 255; these integers correspond directly to pixel color intensities. The 1D array is reshaped into a two-dimensional grayscale image with a fixed width and height of 256 pixels to visualize the bytecode. This transformation employs Nearest-Neighbor Interpolation (NNI) and the Pillow library in Python, ensuring consistent image dimensions while preserving the original byte sequence structure. Shannon entropy is then applied to the executable dex file using a 32-byte sliding window to determine the red and blue channels, as illustrated by Figure 3➄. These channels are defined by distinct formulas motivated by [61], as described in our earlier paper [26]. Each formula utilizes Shannon entropy differently to highlight regions of considerable randomness, which may indicate encryption or obfuscation. The Shannon entropy is defined as
H(X) = −∑_{i=1}^{N} P(x_i) log₂(P(x_i)),
where H(X) represents the Shannon entropy of the random variable X, which measures the uncertainty or randomness in the 32-byte sequence, P(x_i) is the probability of observing the specific i-th outcome or byte value x_i, N denotes the total number of unique outcomes for the random variable X (for a single byte, N = 256), x_i refers to a specific byte value in the range {0, 1, 2, …, 255}, and log₂ is the base-two logarithm commonly used in entropy calculations. Given the varied types of malware introduced by the MalVis dataset, we have explored techniques to improve the recognition of these variations. Our analysis focuses on extending our earlier approach [26] from two color channels (red and blue) to three by encoding an additional feature into the green channel of the RGB images, using two primary encoding methods:
  • Classbyte Encoding: We adopt the Classbyte encoder introduced by Duc-Ly et al. [21], as shown by Figure 3➅, which maps semantic features of bytecode to varying intensities of the green channel. We selected this method due to its effectiveness and comparable performance to our previously employed entropy-based encoding for binary classification tasks.
  • N-gram Encoding: We incorporate N-gram representations, as illustrated by Figure 3➅, derived from byte sequences to capture the malware bytecode’s underlying structural patterns and contextual dependencies. This technique, commonly used in malware detection research [62,63], enriches the green channel with statistical features that reflect code regularities and anomalies, thereby enhancing the capability to distinguish between different types of malware.
The following subsections discuss the implications of these methods for advancing malware visualization.
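To make the entropy-to-color mapping concrete, the sketch below computes windowed Shannon entropy and derives red and blue intensities in the spirit of the scheme described above. This is our own reading of the formulas (the exact curve shape and the normalization of entropy to [0, 1] by dividing by 8 bits are assumptions), not the authors' reference implementation:

```python
import math
from collections import Counter

def shannon_entropy(window: bytes) -> float:
    """Shannon entropy of a byte window, normalized to [0, 1]
    (divided by log2(256) = 8, the maximum for byte-valued data)."""
    counts = Counter(window)
    n = len(window)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / 8.0

def entropy_to_red_blue(data: bytes, x: int, window: int = 32) -> tuple:
    """Red highlights high-entropy (possibly encrypted/packed) regions;
    blue grows with the square of entropy. Returns (r, b) in 0-255."""
    e = shannon_entropy(data[x : x + window])
    if e > 0.5:
        v = e - 0.5
        r = max(0.0, 4 * v - 4 * v ** 2) ** 4  # bell-shaped curve, assumed
    else:
        r = 0.0                                 # low entropy: no red
    b = e ** 2
    return int(255 * r), int(255 * b)
```

A run of identical bytes yields zero entropy (black), while a window of many distinct byte values pushes the blue channel up and, past the 0.5 threshold, begins to light the red channel.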

3.2.1. Approach Classbyte

This approach uses the Classbyte representation, which performs similarly to the entropy encoder in binary classification. It translates the features identified by the four Classbyte colors into four distinct shades of green in the green channel, as illustrated in Figure 5➀. The method highlights sections of bytecode containing both clear-text printable and non-printable ASCII characters and null byte areas, as illustrated in Figure 5➁. These distinctions assist in analyzing the bytecode to determine whether it has been encrypted or injected with null bytes to evade malware detection. The previously generated red and blue channels are combined with the newly constructed green channel, resulting in MalVis (Classbyte encoded) RGB images, as shown in Figure 5➂. Unfortunately, this approach did not yield the desired improvement in the accuracy of multiclass classification. Further results of the analysis and evaluation of this approach are presented in Section 5.

3.2.2. Approach N-Gram

This approach utilizes the N-gram method, which has been extensively studied for malware anomaly detection. The approach is particularly relevant for Android applications, which are often written in Java and Kotlin, thus inheriting the programmatic structure. Abnormalities are detected when the byte sequences differ from the typical bytecode structure using the green channel depicted in Figure 6➁. One of the key goals of this approach is to bridge the gap between raw visualization and interpretability. Unlike prior methods that map byte values to color values without semantic linkage, our framework encodes interpretable attributes: entropy highlights encrypted or compressed code regions. At the same time, N-gram transitions emphasize structural irregularities in bytecode. This mapping allows security analysts and researchers to visually associate distinct color patterns with specific malware behaviors, such as repacking functions or obfuscation.
Figure 7, Figure 8 and Figure 9 further clarify this concept. For example, the script in Figure 7 depicts a simple Java ‘for’ loop before it is compiled into bytecode. The bytecode often reveals specific patterns that represent the underlying syntax and structure of the program. The keyword ‘for’ indicates the presence of a loop, followed by an initialization statement, a condition, and an increment enclosed in parentheses ( ), while curly braces { } denote the loop’s body.
Running “javac For_Loop.java” compiles the code into a bytecode file named “For_Loop.class”, which the DVM uses to execute the program, as demonstrated in Figure 8.
We implemented the Bi-gram method on the raw bytes using a two-byte window to recognize anomalies in the bytecode’s operational structure. Using Bi-gram on the bytecode represents the transition between instructions as listed in Figure 9.
The Bi-gram method identifies obfuscated code by detecting irregular two-byte patterns within the byte sequence. While larger N-gram windows can capture more complex structural dependencies, the associated feature space increases exponentially with window size. For instance, a 2-byte (Bi-gram) configuration yields 256² = 65,536 possible patterns, a 3-byte (Tri-gram) configuration 256³ ≈ 16.7 million, and a 4-byte configuration 256⁴ ≈ 4.3 billion. Although higher-order N-grams may reveal more intricate obfuscation techniques, they introduce substantial sparsity and computational overhead. Training several CNN models on this visualization method with the Bi-gram encoding features required two weeks of computation. Ultimately, the two-byte configuration was chosen as an optimal balance between representational depth and computational efficiency. Future research could investigate the effects of utilizing larger N-gram windows.
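The combinatorics above can be checked in a few lines (an illustrative sketch using Python's collections.Counter, not the authors' pipeline): on real bytecode the observed Bi-gram histogram occupies only a small fraction of the 65,536-slot space, which is the sparsity noted above.

```python
from collections import Counter

def ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte pattern in a byte sequence."""
    return Counter(data[i : i + n] for i in range(len(data) - n + 1))

# Feature-space size grows exponentially with the window length:
space = {n: 256 ** n for n in (2, 3, 4)}  # 65,536 / ~16.7 million / ~4.3 billion
```

For example, a 1000-byte run of a single value contains only one distinct Bi-gram out of 65,536 possible slots, whereas structurally irregular (obfuscated) code scatters mass across many more slots.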
The Bi-gram formula is
Bi-gram   value = b 1 × 2 8 + b 2 ,
which takes two consecutive bytes, b 1 and b 2 , to compute the Bi-gram value. The multiplication of the first byte b 1 by 2 8 = 256 shifts it to higher-order in the combined value, which is then added to the b 2 value, as shown in line 16 of the Algorithm 1. The resulting Bi-gram value represents the degree level of a green pixel and is normalized to the range [ 0 , 1 ] by dividing by the maximum possible value of ( 256 × 256 ) 1 = 65,535 as described by
$$g = \frac{\text{Bi-gram value}}{(256 \times 256) - 1} = \frac{b_1 \times 256 + b_2}{65{,}535}.$$
Finally, if the byte is the last in the file, the green component is set to 0, as detailed in lines 18 and 19 of Algorithm 1. Hence, MalVis presents a conceptually innovative visualization design that maps meaningful malware properties to distinct visual domains. Consequently, MalVis is more effective and better aligned with the objectives of explainable malware classification. This approach has demonstrated improved accuracy in the multiclass MalVis setting, and both representation techniques are evaluated in Section 5.
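As a worked illustration (a minimal sketch, not the authors' released code), the mapping from a byte pair to the green-channel value can be written as:

```python
def bigram_green(b1: int, b2: int) -> float:
    """Normalized green-channel value for two consecutive bytes."""
    bigram_value = (b1 << 8) + b2        # equivalent to b1 * 2**8 + b2
    return bigram_value / 65535          # normalize to [0, 1]

# Example: the byte pair (0x10, 0x2A) maps to (16 * 256 + 42) / 65535
g = bigram_green(0x10, 0x2A)
```

The extreme pair (0xFF, 0xFF) maps to exactly 1.0, and (0x00, 0x00) to 0.0, so the full byte-pair space spans the channel's dynamic range.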

3.3. The Impact of Entropy and N-Gram on MalVis Representations Experiments

In this section, we further investigate the sensitivity and interpretability of MalVis visualizations. We conducted a controlled analysis by applying targeted transformations to a benign Android application, specifically the WhatsApp Classes.dex file. The goal was to investigate and quantify the impact of encryption and of unstructured bytecode modifications on the RGB image representations generated by our framework.
Algorithm 1 MalVis Visualization Algorithm: generate RGB from bytecode using entropy and N-gram
  1: Input: Data array data of bytecode, symbol map symbol_map, index x
  2: Output: RGB values in the range [0, 255]
  3: e ← Entropy(data, 32, x, len(symbol_map)) ▷ Calculate entropy using a window size of 32 bytes
  4: function curve(v)
  5:     f ← (4v − 4v²)⁴
  6:     f ← max(f, 0)
  7:     return f
  8: end function
  9: if e > 0.5 then
10:     r ← curve(e − 0.5) ▷ Red component is determined by the scaled entropy value
11: else
12:     r ← 0 ▷ If entropy is less than or equal to 0.5, set red component to 0
13: end if
14: b ← e² ▷ Blue component is proportional to the square of entropy
15: if x < len(data) − 1 then
16:     n_gram_value ← (data[x] ≪ 8) + data[x + 1] ▷ Compute 2-byte n-gram value
17:     g ← n_gram_value / 65,535 ▷ Normalize n-gram value to [0, 1] for green component
18: else
19:     g ← 0 ▷ If at the last byte, the green component is set to 0
20: end if
21: return [int(255 · r), int(255 · g), int(255 · b)] ▷ Return RGB values scaled to the range [0, 255]
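Algorithm 1 can be rendered in a few lines of Python. The sketch below is illustrative rather than the authors' implementation; in particular, the entropy helper assumes Shannon entropy over a 32-byte window normalized by log₂ of the symbol-map size (256 symbols), which matches the [0, 1] range used in the algorithm:

```python
import math

def entropy(data: bytes, window: int, x: int, n_symbols: int = 256) -> float:
    """Shannon entropy of the `window` bytes starting at x, normalized to [0, 1]."""
    chunk = data[x:x + window]
    if not chunk:
        return 0.0
    counts = {}
    for b in chunk:
        counts[b] = counts.get(b, 0) + 1
    h = -sum((c / len(chunk)) * math.log2(c / len(chunk)) for c in counts.values())
    return h / math.log2(n_symbols)

def malvis_pixel(data: bytes, x: int) -> list:
    """Map the byte at position x to an [R, G, B] pixel per Algorithm 1."""
    e = entropy(data, 32, x)

    def curve(v):                       # sharpening curve for the red channel
        return max((4 * v - 4 * v ** 2) ** 4, 0)

    r = curve(e - 0.5) if e > 0.5 else 0    # red: high-entropy regions only
    b = e ** 2                              # blue: proportional to entropy squared
    if x < len(data) - 1:
        n_gram_value = (data[x] << 8) + data[x + 1]   # 2-byte N-gram
        g = n_gram_value / 65535                      # normalize to [0, 1]
    else:
        g = 0                                         # last byte: green set to 0
    return [int(255 * r), int(255 * g), int(255 * b)]
```

For example, a window of 32 distinct bytes (entropy 5/8 = 0.625) yields a dim red and moderate blue component, while an all-zero window maps to a black pixel.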

3.3.1. Obfuscation Detection Captured by Entropy in Red and Blue Channels

In this experiment, we applied AES-256 encryption in Electronic Codebook (ECB) mode to the initial 30% of the Classes.dex bytecode. This encryption caused a noticeable shift in entropy, particularly affecting the red and blue channels of image representations. Entropy, which quantifies randomness over 32-byte windows, increased significantly in high-entropy areas, leading to brighter pixel intensities. This effect simulates obfuscation techniques that malware creators use to evade detection. As a result, the red and blue channels in Figure 10 display brighter pixels in the top-left region, highlighting the encrypted sections in the representation.
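This entropy shift can be reproduced schematically with the standard library alone. In the sketch below, seeded pseudo-random bytes stand in for AES-256-ECB ciphertext (both are near-uniform), and the repetitive byte pattern is a hypothetical stand-in for structured bytecode; the windowed entropy of the modified region rises accordingly:

```python
import math
import random

def window_entropy(data: bytes, start: int, size: int = 32) -> float:
    """Shannon entropy of a byte window, normalized to [0, 1] via log2(256) = 8."""
    chunk = data[start:start + size]
    counts = {}
    for b in chunk:
        counts[b] = counts.get(b, 0) + 1
    h = -sum((c / len(chunk)) * math.log2(c / len(chunk)) for c in counts.values())
    return h / 8

# Repetitive, low-entropy stand-in for structured bytecode (hypothetical content).
plain = bytes([0x2A, 0x00, 0x01, 0x2A] * 256)
cut = int(len(plain) * 0.3)                   # first 30% of the file
random.seed(0)
cipher_like = bytes(random.randrange(256) for _ in range(cut))  # proxy for ECB output
modified = cipher_like + plain[cut:]

# Entropy in the "encrypted" region is far higher, which maps to brighter
# red/blue pixels in the top-left of the MalVis image.
```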

3.3.2. Unstructured Bytecode Insertion Captured by N-Gram in Green Channel

In this experiment, we examined the structural sensitivity of the green channel by injecting random, unstructured operations into the initial 30% of the Classes.dex file. This action disrupted the byte sequence, causing noticeable distortions in the N-gram values, significantly impacting the green channel. MalVis, which utilizes bi-gram formulas to detect abnormal operational patterns, recorded these disturbances as increased bi-gram values, resulting in brighter pixel values within green-channel textures, as depicted in Figure 11. These deviations were apparent when visualized next to an unchanged sample, highlighting the effectiveness of the green channel in detecting structural anomalies.
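The green-channel effect can likewise be sketched with the standard library. Here, seeded random bytes stand in for the injected unstructured operations, and a low-byte repetitive pattern is a hypothetical stand-in for well-structured bytecode; the mean normalized bi-gram value (i.e., green intensity) rises in the disrupted region:

```python
import random

def mean_green(data: bytes) -> float:
    """Mean normalized 2-byte N-gram value over a file (green-channel intensity)."""
    vals = [((data[x] << 8) + data[x + 1]) / 65535 for x in range(len(data) - 1)]
    return sum(vals) / len(vals)

# Low-byte, repetitive stand-in for structured bytecode (hypothetical content).
original = bytes([0x12, 0x00, 0x05, 0x01] * 512)
cut = int(len(original) * 0.3)                # first 30% of the file
random.seed(1)
injected = bytes(random.randrange(256) for _ in range(cut)) + original[cut:]

# The injected region's irregular byte pairs raise the mean bi-gram value,
# i.e., the green channel brightens where structure was disrupted.
```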

3.4. Model Architecture and Experiment Settings

To assess the effectiveness of our proposed approach, we utilized a selection of well-recognized CNN models, as shown by Figure 3➆, including MobileNetV2, ResNet-50, DenseNet-201, VGG-16, and Inception-V3. These models were applied to our generated visualizations and to the baseline comparison methods. The employed CNN models have proven highly effective in malware detection because they capture intricate patterns and features within image data [48,64]. To ensure consistency across the CNN models, all images were resized to 224 × 224 pixels using nearest-neighbor interpolation to align with the input dimensions required by the models. The dataset was partitioned into 80% for training, 10% for validation, and 10% for testing. A batch size of 64 was chosen based on empirical experimentation, as it provided an optimal trade-off between training speed and memory consumption on our GPU setup. Training was conducted over 50 epochs and carefully monitored to mitigate overfitting. This setup enabled consistent and accurate assessment of the models’ performance across the various visualization techniques.
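The data-partitioning step above can be sketched as follows (a stdlib-only illustration; the file names and seed are hypothetical, and in practice each image would first be resized to 224 × 224 with nearest-neighbor interpolation, e.g., PIL's Image.NEAREST, before batching):

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle and partition samples into 80% train / 10% validation / 10% test."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)   # deterministic shuffle for reproducibility
    n = len(paths)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# Hypothetical file list: 1000 visualization images -> 800 / 100 / 100 samples.
train, val, test = split_dataset(f"sample_{i}.png" for i in range(1000))
```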

3.5. Environment Setup

MalVis visualizations were generated and all models were trained on Ubuntu Server 22.04 LTS with an x86_64 architecture. The hardware setup consisted of a 16-core AMD Ryzen Threadripper PRO 5955WX processor, 128 GB of DDR4 RAM at 3200 MHz, and an NVIDIA RTX A6000 graphics card. The system was configured within a controlled environment to ensure accurate results and minimize external influences.

4. Performance Measures

To ensure fairness when comparing the visualization methods and evaluating our proposed approaches alongside the baseline methods presented in Table 2, we employed accuracy, F1-score, precision, recall, ROC-AUC, and MCC as validation metrics in the binary classification context. The same metrics were employed for consistent evaluation in multiclass classification, as demonstrated in Table 3. The accuracy (4) indicates the percentage of instances correctly identified among the entire set of samples. The F1-score (5) provides the harmonic mean of the model’s precision and recall, accounting for false positives and false negatives. Precision (6) refers to the proportion of true positives among all positive predictions made. Recall (7) denotes the fraction of actual positives correctly identified by the model. ROC-AUC measures the area under the receiver operating characteristic curve, highlighting the balance between sensitivity and specificity. The MCC (8) assesses classification performance, factoring in true and false positives and true and false negatives. Accordingly,
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad (4)$$
$$\text{F1-score} = \frac{2 \times P \times R}{P + R}, \quad (5)$$
$$\text{Precision} = \frac{TP}{TP + FP}, \quad (6)$$
$$\text{Recall} = \frac{TP}{TP + FN}, \quad (7)$$
and
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}, \quad (8)$$
where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.
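These definitions translate directly into code. The sketch below computes Eqs. (4)–(8) from raw confusion-matrix counts; the counts in the example call are made up for illustration:

```python
import math

def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the evaluation metrics of Eqs. (4)-(8) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
        ),
    }

# Hypothetical counts for illustration only.
m = metrics(tp=90, tn=85, fp=15, fn=10)
```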

5. Results

This section provides a comparative analysis of the performance of the newly introduced visualizations encoded using Classbyte and N-gram, compared to baseline methods, namely Entropy-based, MalNet [48], and Classbyte, as detailed in the following subsections.

5.1. Evaluation of MalVis (Classbyte Encoded) and MalVis Performance Compared to Other Methods on the Binary Classification Dataset

All methods used the same settings and were trained on identical subsets of training data to ensure a fair comparison. As shown in Table 2, the MalVis (Classbyte-encoded) approach, which combines Classbyte and Entropy representations, did not enhance classification performance as expected. Instead, it disrupted the entropy encoder’s ability to capture meaningful patterns, as illustrated in Figure 12. Encoding the four color features from the Classbyte method into a single green channel effectively overwrote the structures previously identified by the entropy encoder.
In contrast, the proposed MalVis method, which integrates entropy and N-gram encoders, consistently achieved performance superior or comparable to the other methods across most CNN architectures, with only minor exceptions. For instance, while DenseNet201 did not exhibit significant improvements across all metrics, it did demonstrate higher precision, indicating a strong capability to identify true positives while minimizing false positives. The shortcoming in the remaining metrics can be attributed to the highly imbalanced dataset described in Section 3.1, where adware and trojan samples dominate the other classes. This imbalance motivated our efforts to mitigate class disparity in the multiclass dataset evaluation, as discussed in Section 5.3. One limitation of this study is the use of a single train–test split without k-fold cross-validation. Future research could apply k-fold cross-validation, which evaluates the model on multiple training and testing subsets and would yield more reliable and generalizable findings.
Furthermore, we emphasize the importance of visualization techniques that deliver consistently higher detection performance across diverse CNN models. Section 5.4 focuses on enhancing the MalVis framework through ensemble-based strategies to further improve its robustness and classification accuracy.
These experiments demonstrate that existing methods, including Classbyte and MalNet, provide limited semantic and structural variation, resulting in suboptimal performance for malware classification tasks. In contrast, MalVis outperforms these approaches by integrating both entropy and N-gram patterns, producing meaningful visual representations that more effectively expose obfuscation, encryption, and other malicious behaviors. Notably, our earlier method, which relied solely on entropy [26], did not achieve comparable performance, underscoring the value of combining multiple feature types. This highlights the need for enhanced visualization techniques that improve both interpretability and classification accuracy. Accordingly, MalVis was selected for the subsequent advanced multiclass classification experiments to better distinguish between diverse malware types.

5.2. Evaluation of MalVis Performance on Imbalanced Multiclass Dataset

The evaluation of the MalVis representation on the imbalanced multiclass malware classification task, presented in Table 3, demonstrated that the ResNet50 model achieved the highest performance, with an overall accuracy of 94.03%, an F1-score of 83.54%, and a precision of 83.34%, surpassing state-of-the-art multiclass malware classification approaches [48]. The analysis of the confusion matrices in Figure 13(A–E) reveals significant challenges in differentiating between the majority and minority classes within the imbalanced multiclass dataset. The darker column for the adware class suggests a bias due to its higher frequency in the training set, as shown in Figure 4. A deeper inspection of Figure 13 reveals frequent misclassification among the Adware, Trojan, and Spyware classes. These malware types often share similar bytecode structures and employ comparable obfuscation techniques, resulting in visually overlapping patterns in the entropy and N-gram channels. For example, packed adware and spyware samples may exhibit high entropy values with irregular N-gram sequences, which confound the classifier. These findings highlight the need for refined feature selection, and possibly more semantic augmentation, in future visualization efforts. They also illustrate the effect of class imbalance, which leads to biased decision boundaries that favor the majority class at the expense of consistent performance across all classes. Various strategies can address this imbalance, such as oversampling minority classes, undersampling majority classes, applying class weighting in the loss function, and using ensemble methods [65,66]. The following sections cover the application of undersampling to the majority classes and provide a detailed evaluation of eight different ensemble methods.

5.3. Evaluation of MalVis Performance Using Undersampling

We applied undersampling to the majority classes to address the imbalanced class distribution, resulting in a more balanced dataset. Although other approaches, such as class weight adjustment and the Synthetic Minority Over-sampling Technique (SMOTE), have been proposed to tackle class imbalance [67], these methods are beyond the scope of this paper. We employed undersampling due to limited computational resources and the time required to train on an oversampled dataset. Table 4 presents the evaluation results for undersampling with MalVis. The confusion matrices in Figure 14(B–F) highlight improved differentiation between the majority and minority classes. Although overall performance dropped by 15–20% relative to the imbalanced-dataset results (Table 3), the following section discusses ensemble methods to recover model performance.
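The undersampling step can be sketched as follows (a stdlib-only illustration; the class labels shown in the comments are illustrative, and the seeded sampler keeps the reduction reproducible):

```python
import random
from collections import defaultdict

def undersample(samples, labels, seed=42):
    """Randomly undersample every class down to the size of the smallest class."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    n_min = min(len(v) for v in by_class.values())   # target per-class count
    rng = random.Random(seed)
    out = []
    for y, items in by_class.items():
        out.extend((s, y) for s in rng.sample(items, n_min))
    return out

# Illustrative labels: 5 adware, 2 trojan, 3 spyware -> 2 of each after balancing.
balanced = undersample(list(range(10)), ["adware"] * 5 + ["trojan"] * 2 + ["spyware"] * 3)
```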

5.4. Evaluation of MalVis Performance Using Ensemble Models

To address the performance impact caused by the undersampling approach, we explored the application of various ensemble methods. The aim was to leverage the combined strengths of all CNN models, thereby enhancing both the models’ performance and robustness. The ensemble methods implemented and evaluated include:
  • Average Voting: Combines predictions by averaging the class probabilities of all CNN models.
  • Majority Voting: Determines the final class by selecting the class predicted most frequently by the individual models.
  • Weighted Voting: Assigns different weights to the CNN models based on their prediction accuracy; we rank the models by performance and assign weights corresponding to their ranking positions.
  • Min Confidence Voting: Considers a model’s prediction only when it meets a minimum required confidence level; in our implementation, a confidence threshold of 60% was used.
  • Soft Voting: Uses the predicted class probabilities to decide the final output.
  • Median Voting: Selects the median of the predicted class probabilities.
  • Rank-Based Voting: Ranks the predictions from each model and aggregates the ranks to select a class.
  • Stacking Ensemble: Trains a new model to integrate the predictions of the base models and improve performance.
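As an illustration of the list above, the sketch below implements one plausible reading of Min Confidence Voting (the paper's exact aggregation rule may differ): models whose top-class probability falls below the 60% threshold are excluded, and the remaining probability vectors are averaged; if no model is confident, all models are used as a fallback:

```python
def min_confidence_vote(prob_lists, threshold=0.60):
    """Min Confidence Voting: average only the outputs of models whose top
    class probability meets the threshold; fall back to all models otherwise."""
    confident = [p for p in prob_lists if max(p) >= threshold]
    pool = confident or prob_lists            # fallback when nothing is confident
    n_classes = len(pool[0])
    avg = [sum(p[c] for p in pool) / len(pool) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)   # argmax over averaged probs
```

For example, with three models predicting `[0.7, 0.2, 0.1]`, `[0.4, 0.35, 0.25]`, and `[0.1, 0.8, 0.1]`, the middle model is dropped (top probability 0.4 < 0.6) and the vote goes to class 1.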
In Table 5, the Min Confidence Voting ensemble achieved the highest performance across all evaluation metrics except ROC-AUC. These results indicate superior performance compared to those obtained on the imbalanced dataset shown in Table 3. The confusion matrix in box A of Figure 14 illustrates that the Min Confidence Voting ensemble produced a more pronounced diagonal, indicating an improved ability to accurately detect the more challenging classes compared to the CNN models shown in boxes B to F after undersampling. Moreover, the Stacking ensemble achieved the highest ROC-AUC, attributable to its ability to integrate predictions from multiple models, thereby leveraging their strengths to better distinguish the different classes.
These findings underline the effectiveness of ensemble methods, particularly Min Confidence Voting and Stacking, in handling multiclass classification challenges on the MalVis dataset.

5.5. Current Limitations and Future Work

While our results demonstrate the effectiveness of the MalVis framework, several methodological limitations should be noted:
  • Statistical Validation: Experiments were conducted using a single train–test split without k-fold cross-validation or statistical significance testing. The primary focus of this research work was to develop a bytecode visualization framework that provides competitive performance and explainable visual patterns for malware analysis.
  • Computational Constraints: Performing full cross-validation and repeated training on the 1.3 M sample multiclass dataset was computationally expensive, limiting the scope of statistical validation.
Future work will include extensive cross-validation and statistical significance testing to further validate the robustness and generalizability of the proposed approach.

6. Explainability Through Grad-CAM and Grad-CAM++ Visualization

In Section 5, the CNN models using MalVis with undersampling demonstrated outstanding performance in classifying different types of malware. However, their decision-making process remains unclear. To enhance model interpretability and validate our model’s decision, we employed Gradient-weighted Class Activation Mapping (Grad-CAM) and its enhanced variant, Grad-CAM++, to visualize the regions in MalVis images that most significantly influence classification decisions. This analysis offers crucial insights into whether our models prioritize semantically meaningful features, particularly the proposed Entropy and N-gram components within the MalVis visualizations.

6.1. Grad-CAM

Grad-CAM [68] generates visual explanations by utilizing gradients of the target class flowing into the final convolutional layer. The technique produces localization maps highlighting important regions for predicting specific malware classes. The Grad-CAM heatmap for class c is calculated through three key steps:
Step 1: Compute the gradients of the class score $y^c$ with respect to the feature maps $A^k$, where $i$ and $j$ denote the spatial coordinates (i.e., row and column positions) within the feature map:
$$\frac{\partial y^c}{\partial A_{i,j}^{k}} \quad (9)$$
Step 2: Calculate the importance weight $\alpha_k^c$ of feature map $k$ for class $c$ by applying global average pooling to the gradients, where $Z$ is the total number of spatial positions and $A_{i,j}^k$ is the value at position $(i, j)$ within feature map $k$:
$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{i,j}^{k}} \quad (10)$$
Step 3: Generate the final Grad-CAM heatmap through a weighted combination of the feature maps, highlighting the image regions most influential in the prediction for class $c$; brighter regions signify higher influence. Here, $k$ indexes the convolutional filters/channels, and the ReLU (Rectified Linear Unit) function retains only positive contributions by eliminating negative values:
$$L_{\text{Grad-CAM}}^{c} = \text{ReLU}\!\left( \sum_k \alpha_k^c A^k \right) \quad (11)$$
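The three steps reduce to a few lines once the feature maps $A^k$ and the gradients $\partial y^c / \partial A^k$ have been extracted from a trained CNN. The sketch below operates on nested lists for a tiny example and is illustrative only, not the authors' implementation:

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap: global-average-pool the gradients into weights
    alpha_k, then apply ReLU to the weighted sum of feature maps.
    Both inputs are [K][H][W] nested lists for one target class."""
    K, H, W = len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0])
    Z = H * W                                     # number of spatial positions
    alphas = [sum(gradients[k][i][j] for i in range(H) for j in range(W)) / Z
              for k in range(K)]                  # Step 2: importance weights
    return [[max(sum(alphas[k] * feature_maps[k][i][j] for k in range(K)), 0.0)
             for j in range(W)]
            for i in range(H)]                    # Step 3: ReLU(weighted sum)

# Tiny 2-channel, 2x2 example with made-up activations and gradients.
A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
G = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(A, G)
```

Here the pooled weights are +1 and −1, so the heatmap keeps only the positively weighted activations after ReLU.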

6.2. Grad-CAM++

Grad-CAM++ [69] addresses limitations of standard Grad-CAM by providing more precise localization through enhanced weight computation. It refines the importance weight calculation to better capture pixel-wise significance:
$$\alpha_k^c = \sum_i \sum_j w_{i,j}^{k,c} \cdot \text{ReLU}\!\left( \frac{\partial y^c}{\partial A_{i,j}^{k}} \right) \quad (12)$$
$$w_{i,j}^{k,c} = \frac{\dfrac{\partial^2 y^c}{(\partial A_{i,j}^{k})^2}}{2\,\dfrac{\partial^2 y^c}{(\partial A_{i,j}^{k})^2} + \sum_{a,b} A_{a,b}^{k}\,\dfrac{\partial^3 y^c}{(\partial A_{a,b}^{k})^3}} \quad (13)$$
In Formula (13), the pixel-wise weight $w_{i,j}^{k,c}$ at spatial position $(i, j)$ in feature map $k$ for class $c$ is computed as a normalized ratio. The numerator is the second-order derivative $\partial^2 y^c / (\partial A_{i,j}^{k})^2$, which measures how sensitively the class prediction score $y^c$ changes with respect to the pixel activation $A_{i,j}^{k}$, capturing the local curvature of the prediction function at that pixel location. The denominator normalizes the numerator to ensure bounded weights, so that pixels exhibiting both strong gradients and consistent influence across the entire feature map receive appropriately higher weights.
We utilize cosine similarity to determine the ten most similar representations for each class, yielding 100 samples in total. We then apply the two attention-map techniques, Grad-CAM and Grad-CAM++. The outcomes, presented in Figure 15, demonstrate the application of these methods to a single MalVis image belonging to each class. The following section provides a comprehensive analysis of the key observations derived from this experiment.

6.3. Key Findings from Visualization Analysis

The application of both Grad-CAM and Grad-CAM++ visualization techniques, as illustrated in Figure 15, provided critical insights into the model’s decision-making process. Both methods revealed that the CNN consistently focused on semantically meaningful regions within MalVis images, with Grad-CAM++ demonstrating more refined attention patterns. Notably, the model concentrated on areas identified by our entropy and N-gram encoders, confirming that the CNN relied on these introduced feature patterns rather than random noise or artifacts to accurately classify samples.
Distinct attention patterns were observed across different malware types, validating our approach and demonstrating the model’s discriminative capabilities. For example, the magnified section of the Spyware sample in Figure 15 shows intense focus, revealing structural regions highlighted in dark red and orange. These correspond to highly obfuscated code sections captured by the entropy encoder (red and blue channels) or abnormal code structures identified by the N-gram approach (green channel). These malicious patterns reflect obfuscation techniques used by malware creators to evade detection. In contrast, the magnified section of the Downloader sample exhibits distributed attention across the whole section, reflecting lower entropy levels with an indication of less randomness in the bytecode. The Benign sample, notably, shows light blue with less attention overall, focusing on lower-entropy regions containing regular coding patterns.
This differentiation confirms that the model effectively leverages entropy and N-gram information embedded within the MalVis representations to make classification decisions, focusing on relevant malware characteristics rather than spurious patterns. The interpretability offered by these attention mechanisms provides confidence in the model’s trustworthiness and establishes the reliability of its decision-making process.

7. Conclusions

  This research establishes the critical importance of visualizing Android malware to safeguard user data and smartphone security. We introduced MalVis, the largest publicly available image-based dataset for Android malware, containing over 1.3 million samples. To complement this resource, we developed a novel visualization framework that transforms bytecode into RGB images by integrating entropy and N-gram encoding techniques. This method effectively captures the malware’s encryption, compression, structural, and operational anomaly patterns.
Through extensive evaluation, MalVis consistently outperformed existing visualization-based detection approaches, achieving 95% accuracy, 90% F1-score, 92% precision, 89% recall, 87% Matthews Correlation Coefficient, and a 98% ROC-AUC. Our interpretability analysis, using GradCAM and GradCAM++, revealed that the model focuses on semantically meaningful regions within MalVis images, particularly areas linked to entropy variations and N-gram patterns. Each malware type exhibits distinct attention patterns, validating the model’s discriminative capabilities. These visualization results confirm that classification decisions are based on relevant malware characteristics, establishing both the trustworthiness and explainability of our approach. Beyond its strong performance, MalVis delivers an innovative framework that links visual representations to the semantic characteristics of malware, enhancing interpretability and classification robustness. This dataset and framework provide a valuable foundation for advancing research in malware classification and explainable threat detection.

Author Contributions

Conceptualization, S.J.M.; Data curation, S.J.M.; Formal analysis, S.J.M.; Funding acquisition, K.E.B.; Investigation, S.J.M.; Methodology, S.J.M.; Project administration, M.J.D.L. and K.E.B.; Resources, S.J.M.; Software, S.J.M.; Supervision, M.J.D.L. and K.E.B.; Validation, S.J.M.; Visualization, S.J.M.; Writing—original draft preparation, S.J.M.; Writing—review and editing, S.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The Article Processing Charge (APC) was funded by Kenneth Barner.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this research are openly available on the MalVis site at www.mal-vis.org, accessed on 2 October 2025. The scripts used to generate the visualization methods are openly available on GitHub at https://github.com/makkawysaleh/MalVis, accessed on 2 October 2025.

Acknowledgments

The authors gratefully acknowledge the Université du Luxembourg for providing and maintaining the valuable collection of Android applications that supports the research community.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Sherif, A. Market Share of Mobile Operating Systems Worldwide from 2009 to 2024, by Quarter. Available online: https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/ (accessed on 23 September 2024).
  2. Statista. Smartphone Operating System Share by Age Group in the U.S. as of December 2023. 2024. Available online: https://www.statista.com/statistics/1313944/main-smartphone-usage-share-by-age/ (accessed on 31 January 2024).
  3. Business, V. Mobile Security Index (MSI) Report 2023: Security Threats and Attacks. 2023. Available online: https://www.verizon.com/business/resources/reports/mobile-security-index/ (accessed on 10 January 2024).
  4. Noever, D.; Noever, S.E.M. Virus-MNIST: A benchmark malware dataset. arXiv 2021, arXiv:2103.00602. [Google Scholar] [CrossRef]
  5. Wang, P.; González, M.C.; Hidalgo, C.A.; Barabási, A.L. Understanding the spreading patterns of mobile phone viruses. Science 2009, 324, 1071–1076. [Google Scholar] [CrossRef] [PubMed]
  6. Kienzle, D.M.; Elder, M.C. Recent worms: A survey and trends. In Proceedings of the 2003 ACM workshop on Rapid Malcode, Washington, DC, USA, 27 October 2003; pp. 1–10. [Google Scholar]
  7. Yilmaz, S.; Zavrak, S. Adware: A review. Int. J. Comput. Sci. Inf. Technol. 2015, 6, 5599–5604. [Google Scholar]
  8. Suresh, S.; Di Troia, F.; Potika, K.; Stamp, M. An analysis of Android adware. J. Comput. Virol. Hacking Tech. 2019, 15, 147–160. [Google Scholar] [CrossRef]
  9. Boldt, M.; Carlsson, B.; Jacobsson, A. Exploring spyware effects. In Proceedings of the Nordsec 2004, Espoo, Finland, 4–5 November 2004. [Google Scholar]
  10. Ali, A. Ransomware: A research and a personal case study of dealing with this nasty malware. Issues Informing Sci. Inf. Technol. 2017, 14, 87–99. [Google Scholar] [CrossRef]
  11. Beegle, L.E. Rootkits and their effects on information security. Inf. Syst. Secur. 2007, 16, 164–176. [Google Scholar] [CrossRef]
  12. Zhenfang, Z. Study on computer trojan horse virus and its prevention. Int. J. Eng. Appl. Sci. 2015, 2, 257840. [Google Scholar]
  13. Bhardwaj, A.; Goundar, S. Keyloggers: Silent cyber security weapons. Netw. Secur. 2020, 2020, 14–19. [Google Scholar] [CrossRef]
  14. Feily, M.; Shahrestani, A.; Ramadass, S. A survey of botnet and botnet detection. In Proceedings of the 2009 Third International Conference on Emerging Security Information, Systems and Technologies, Athens, Greece, 18–23 June 2009; IEEE: Washington, DC, USA, 2009; pp. 268–273. [Google Scholar]
  15. Pachhala, N.; Jothilakshmi, S.; Battula, B.P. A comprehensive survey on identification of malware types and malware classification using machine learning techniques. In Proceedings of the 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 7–9 October 2021; IEEE: Washington, DC, USA, 2021; pp. 1207–1214. [Google Scholar]
  16. Halevi, S.; Krawczyk, H. Strengthening digital signatures via randomized hashing. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 20–24 August 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 41–59. [Google Scholar]
  17. Canfora, G.; Di Sorbo, A.; Mercaldo, F.; Visaggio, C.A. Obfuscation techniques against signature-based detection: A case study. In Proceedings of the 2015 Mobile Systems Technologies Workshop (MST), Milan, Italy, 22 May 2015; IEEE: Washington, DC, USA, 2015; pp. 21–26. [Google Scholar]
  18. Thangaveloo, R.; Jing, W.; Chiew, K.L.; Abdullah, J. DATDroid: Dynamic Analysis Technique in Android Malware Detection. Int. J. Adv. Sci. Eng. Inf. Technol. 2020, 10, 536. [Google Scholar] [CrossRef]
  19. Pan, Y.; Ge, X.; Fang, C.; Fan, Y. A systematic literature review of android malware detection using static analysis. IEEE Access 2020, 8, 116363–116379. [Google Scholar] [CrossRef]
  20. Sato, R.; Chiba, D.; Goto, S. Detecting android malware by analyzing manifest files. Proc. Asia-Pac. Adv. Netw. 2013, 36, 17. [Google Scholar] [CrossRef]
  21. Vu, D.L.; Nguyen, T.K.; Nguyen, T.V.; Nguyen, T.N.; Massacci, F.; Phung, P.H. HIT4Mal: Hybrid image transformation for malware classification. Trans. Emerg. Telecommun. Technol. 2020, 31, e3789. [Google Scholar] [CrossRef]
  22. Freitas, S.; Dong, Y.; Neil, J.; Chau, D.H. A large-scale database for graph representation learning. arXiv 2020, arXiv:2011.07682. [Google Scholar]
  23. Aurangzeb, S.; Aleem, M.; Khan, M.T.; Loukas, G.; Sakellari, G. AndroDex: Android Dex images of obfuscated malware. Sci. Data 2024, 11, 212. [Google Scholar] [CrossRef]
  24. Hurier, M.; Suarez-Tangil, G.; Dash, S.K.; Bissyandé, T.F.; Traon, Y.L.; Klein, J.; Cavallaro, L. Euphony: Harmonious unification of cacophonous anti-virus vendor labels for Android malware. In Proceedings of the 14th International Conference on Mining Software Repositories, Buenos Aires, Argentina, 20–21 May 2017; IEEE Press: Washington, DC, USA, 2017; pp. 425–435. [Google Scholar]
  25. VirusTotal—Free Online Virus, Malware, and URL Scanner. Available online: https://www.virustotal.com (accessed on 5 August 2024).
  26. Makkawy, S.J.; Alblwi, A.H.; De Lucia, M.J.; Barner, K.E. Improving Android Malware Detection with Entropy Bytecode-to-Image Encoding Framework. In Proceedings of the 2024 33rd International Conference on Computer Communications and Networks (ICCCN), Kailua-Kona, HI, USA, 29–31 July 2024; IEEE: Washington, DC, USA, 2024; pp. 1–9. [Google Scholar]
  27. Dhammi, A.; Singh, M. Behavior analysis of malware using machine learning. In Proceedings of the 2015 Eighth International Conference on Contemporary Computing (IC3), Noida, India, 20–22 August 2015; pp. 481–486. [Google Scholar] [CrossRef]
  28. Poettering, B.; Rastikian, S. Sequential digital signatures for cryptographic software-update authentication. In Proceedings of the European Symposium on Research in Computer Security, Copenhagen, Denmark, 26–30 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 255–274. [Google Scholar]
  29. Tahir, R. A study on malware and malware detection techniques. Int. J. Educ. Manag. Eng. 2018, 8, 20. [Google Scholar] [CrossRef]
  30. Gao, C.; Cai, M.; Yin, S.; Huang, G.; Li, H.; Yuan, W.; Luo, X. Obfuscation-Resilient Android Malware Analysis Based on Complementary Features. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5056–5068. [Google Scholar] [CrossRef]
  31. Elsersy, W.F.; Feizollah, A.; Anuar, N.B. The rise of obfuscated Android malware and impacts on detection methods. PeerJ Comput. Sci. 2022, 8, e907. [Google Scholar] [CrossRef]
  32. Kirat, D.; Vigna, G. Malgene: Automatic extraction of malware analysis evasion signature. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 769–780. [Google Scholar]
  33. Pereberina, A.; Kostyushko, A.; Tormasov, A. An approach to dynamic malware analysis based on system and application code split. J. Comput. Virol. Hacking Tech. 2022, 18, 231–241. [Google Scholar] [CrossRef]
  34. Ullah, F.; Ullah, S.; Srivastava, G.; Lin, J.C.W.; Zhao, Y. NMal-Droid: Network-based android malware detection system using transfer learning and CNN-BiGRU ensemble. Wirel. Netw. 2024, 30, 6177–6198. [Google Scholar] [CrossRef]
  35. Sihwail, R.; Omar, K.; Zainol Ariffin, K.A.; Al Afghani, S. Malware detection approach based on artifacts in memory image and dynamic analysis. Appl. Sci. 2019, 9, 3680. [Google Scholar] [CrossRef]
  36. Seyfari, Y.; Meimandi, A. A new approach to android malware detection using fuzzy logic-based simulated annealing and feature selection. Multimed. Tools Appl. 2023, 83, 10525–10549. [Google Scholar] [CrossRef]
  37. Orlova, V.; Goiko, V.; Alexandrova, Y.; Petrov, E. Potential of the dynamic approach to data analysis. E3s Web Conf. 2021, 258, 07012. [Google Scholar] [CrossRef]
  38. Bhatia, T.; Kaushal, R. Malware detection in android based on dynamic analysis. In Proceedings of the 2017 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), London, UK, 19–20 June 2017; IEEE: Washington, DC, USA, 2017; pp. 1–6. [Google Scholar]
  39. Jose, R.R.; Salim, A. Integrated static analysis for malware variants detection. In Inventive Computation Technologies, Proceedings of the ICICT 2019 Conference, Coimbatore, India, 29–30 August 2019; Springer: Berlin/Heidelberg, Germany, 2020; pp. 622–629. [Google Scholar]
  40. Sutter, T.; Kehrer, T.; Rennhard, M.; Tellenbach, B.; Klein, J. Dynamic Security Analysis on Android: A Systematic Literature Review. IEEE Access 2024, 12, 57261–57287. [Google Scholar] [CrossRef]
  41. Kancherla, K.; Mukkamala, S. Image visualization based malware detection. In Proceedings of the 2013 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Singapore, 16–19 April 2013; IEEE: Washington, DC, USA, 2013; pp. 40–44. [Google Scholar]
  42. Alblwi, A.; Makkawy, S.; Barner, K.E. D-DDPM: Deep Denoising Diffusion Probabilistic Models for Lesion Segmentation and Data Generation in Ultrasound Imaging. IEEE Access 2025, 13, 41194–41209. [Google Scholar] [CrossRef]
  43. Dhillon, A.; Verma, G.K. Convolutional neural network: A review of models, methodologies and applications to object detection. Prog. Artif. Intell. 2020, 9, 85–112. [Google Scholar] [CrossRef]
  44. Sun, M.H.; Kong, S.H.; Paek, D.H. A Survey on Deep Learning-Based Lane Detection Algorithms for Camera and LiDAR. IEEE Trans. Intell. Transp. Syst. 2025, 26, 7319–7342. [Google Scholar] [CrossRef]
  45. Sun, H.; Chen, M.; Weng, J.; Liu, Z.; Geng, G. Anomaly detection for in-vehicle network using CNN-LSTM with attention mechanism. IEEE Trans. Veh. Technol. 2021, 70, 10880–10893. [Google Scholar] [CrossRef]
  46. Allix, K.; Bissyandé, T.F.; Klein, J.; Le Traon, Y. Androzoo: Collecting millions of android apps for the research community. In Proceedings of the 13th International Conference on Mining Software Repositories, Austin, TX, USA, 14–22 May 2016; pp. 468–471. [Google Scholar]
  47. Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K.; Siemens, C. Drebin: Effective and explainable detection of android malware in your pocket. In Proceedings of the Ndss, San Diego, CA, USA, 23–26 February 2014; Volume 14, pp. 23–26. [Google Scholar]
  48. Freitas, S.; Duggal, R.; Chau, D.H. MalNet: A large-scale image database of malicious software. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–22 October 2022; pp. 3948–3952. [Google Scholar]
  49. Rezaei, T.; Manavi, F.; Hamzeh, A. A PE header-based method for malware detection using clustering and deep embedding techniques. J. Inf. Secur. Appl. 2021, 60, 102876. [Google Scholar] [CrossRef]
  50. Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011; pp. 1–7. [Google Scholar]
  51. Kunwar, P.; Aryal, K.; Gupta, M.; Abdelsalam, M.; Bertino, E. SoK: Leveraging Transformers for Malware Analysis. arXiv 2024, arXiv:2405.17190. [Google Scholar] [CrossRef]
  52. Kalash, M.; Rochan, M.; Mohammed, N.; Bruce, N.D.; Wang, Y.; Iqbal, F. Malware classification with deep convolutional neural networks. In Proceedings of the 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, 26–28 February 2018; IEEE: Washington, DC, USA, 2018; pp. 1–5. [Google Scholar]
  53. Panconesi, A.; Marian; Cukierski, W.; Committee, W.B.C. Microsoft Malware Classification Challenge (BIG 2015). Kaggle. 2015. Available online: https://kaggle.com/competitions/malware-classification (accessed on 3 March 2024).
  54. Wang, C.; Zhang, L.; Zhao, K.; Ding, X.; Wang, X. Advandmal: Adversarial training for android malware detection and family classification. Symmetry 2021, 13, 1081. [Google Scholar] [CrossRef]
  55. Ünver, H.M.; Bakour, K. Android malware detection based on image-based features and machine learning techniques. SN Appl. Sci. 2020, 2, 1299. [Google Scholar] [CrossRef]
  56. Daoudi, N.; Samhi, J.; Kabore, A.K.; Allix, K.; Bissyandé, T.F.; Klein, J. Dexray: A simple, yet effective deep learning approach to android malware detection based on image representation of bytecode. In Proceedings of the Deployable Machine Learning for Security Defense: Second International Workshop, MLHat 2021, Virtual Event, 15 August 2021; Proceedings 2. Springer: Berlin/Heidelberg, Germany, 2021; pp. 81–106. [Google Scholar]
  57. Wang, C.; Zhao, Z.; Wang, F.; Li, Q. A novel malware detection and family classification scheme for IoT based on DEAM and DenseNet. Secur. Commun. Netw. 2021, 2021, 6658842. [Google Scholar] [CrossRef]
  58. Darwaish, A.; Naït-Abdesselam, F. Rgb-based android malware detection and classification using convolutional neural network. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; IEEE: Washington, DC, USA, 2020; pp. 1–6. [Google Scholar]
  59. Ismail, S.J.I.; Rahardjo, B.; Juhana, T.; Musashi, Y. MalSSL–Self-Supervised Learning for Accurate and Label-Efficient Malware Classification. IEEE Access 2024, 12, 58823–58835. [Google Scholar] [CrossRef]
  60. Androguard Tool. Available online: https://code.google.com/archive/p/androguard/ (accessed on 8 January 2024).
  61. Cortesi, A. scurve (BinVis): A Library for Drawing Space-Filling Curves like the Hilbert Curve. 2015. Available online: https://github.com/cortesi/scurve (accessed on 8 January 2024).
  62. Ali, M.; Shiaeles, S.; Bendiab, G.; Ghita, B. MALGRA: Machine learning and N-gram malware feature extraction and detection system. Electronics 2020, 9, 1777. [Google Scholar] [CrossRef]
  63. Zhong, F.; Hu, Q.; Jiang, Y.; Huang, J.; Zhang, C.; Wu, D. Enhancing Malware Classification via Self-Similarity Techniques. IEEE Trans. Inf. Forensics Secur. 2024, 19, 7232–7244. [Google Scholar] [CrossRef]
  64. Almomani, I.; Alkhayer, A.; El-Shafai, W. An automated vision-based deep learning model for efficient detection of android malware attacks. IEEE Access 2022, 10, 2700–2720. [Google Scholar] [CrossRef]
  65. Mohammed, R.; Rawashdeh, J.; Abdullah, M. Machine learning with oversampling and undersampling techniques: Overview study and experimental results. In Proceedings of the 2020 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 7–9 April 2020; IEEE: Washington, DC, USA, 2020; pp. 243–248. [Google Scholar]
  66. Gosain, A.; Sardana, S. Handling class imbalance problem using oversampling techniques: A review. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; IEEE: Washington, DC, USA, 2017; pp. 79–85. [Google Scholar]
  67. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  68. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  69. Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; IEEE: Washington, DC, USA, 2018; pp. 839–847. [Google Scholar]
Figure 1. An illustration of the structure of an Android APK file, highlighting key components such as application bytecode, assets, resources, and the manifest file.
Figure 2. Comparison of average file sizes in DEX executables vs. MalVis representations across malware categories (with size reduction ratio in percentages).
Figure 3. A schematic illustration of the proposed framework architecture, organized into four distinct rows. The first row details the data collection and labeling process. The second row focuses on feature extraction. The third row constructs and generates RGB images using entropy and N-gram methods. The bottom row describes the training process, including visualization techniques, CNN models, ensemble methods, and both binary and multiclass classifications. The circled numbers (1–9) indicate the pipeline's steps, explained in detail in the corresponding sections.
Figure 4. Distribution of malware types and benign samples in MalVis.
Figure 5. Overview of constructing the MalVis (Classbyte) visualization method, resulting in RGB image representations using the Classbyte encoding in the green channel and encoding entropy in the red and blue channels. The circled numbers (1–3) represent the pipeline steps described in the text.
Figure 6. Overview of the MalVis (N-gram) visualization pipeline, illustrating how Shannon entropy populates the red and blue channels while bi-gram (N-gram) values fill the green channel to emphasize structural and contextual bytecode patterns. The numbered circles (1–3) correspond to the pipeline steps described in the text.
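As a rough illustration of the entropy channel described in the caption above, the per-window Shannon entropy of a Classes.dex byte stream can be computed and scaled to an 8-bit pixel value. This is a minimal Python sketch, not the paper's implementation; the 256-byte window and the linear 0–8 bit scaling are assumptions.

```python
import math
from collections import Counter

def window_entropy(data, window=256):
    """Shannon entropy (bits per byte, range 0-8) of each fixed-size window."""
    result = []
    for i in range(0, len(data), window):
        chunk = data[i:i + window]
        n = len(chunk)
        result.append(-sum((c / n) * math.log2(c / n)
                           for c in Counter(chunk).values()))
    return result

def entropy_to_pixel(h):
    """Map an entropy value in [0, 8] bits to an 8-bit channel intensity."""
    return round(h / 8.0 * 255)
```

A fully encrypted or packed region approaches 8 bits per byte and maps to a bright pixel, while padding or zeroed regions map to dark pixels, consistent with the high-entropy bands that Figure 10 attributes to AES-256 encryption.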
Figure 7. An example of a simple for-loop written in Java.
Figure 8. Translation of a Java for-loop into its equivalent JVM instructions in bytecode form after compilation.
Figure 9. Representation of the bi-gram approach applied to the Java instructions, capturing the semantic transitions between them.
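The transition capture shown in Figure 9 can be sketched as counting adjacent instruction pairs (bi-grams) and scoring each transition by its relative frequency. This is one illustrative reading, not the paper's exact encoding; the opcode names and the frequency normalization below are assumptions.

```python
from collections import Counter

def bigram_scores(instructions):
    """Score each adjacent instruction pair by how often that exact
    transition occurs in the sequence (relative frequency)."""
    pairs = list(zip(instructions, instructions[1:]))
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[p] / total for p in pairs]

# Example: the loop body repeats the iload -> iinc transition, so that
# bi-gram scores higher than the one-off transitions around it.
ops = ["iload", "iinc", "iload", "iinc", "return"]
```

Structured code yields a small set of recurring transitions (high scores), while injected random operations produce many one-off bi-grams, which is the low-score noise the N-gram channel makes visible in Figure 11.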
Figure 10. The impact of 30% AES-256 encryption on a Classes.dex file, as captured by the entropy encoder in the red and blue channels of MalVis representations.
Figure 11. The impact of injecting 30% randomized unstructured operations into a Classes.dex file, as captured by the N-gram encoder in the green channel of MalVis representations.
Figure 12. An illustration of the disruption introduced by the MalVis (Classbyte-encoded) green channel, which interferes with the patterns captured by the entropy encoder, compared with the MalVis (N-gram-encoded) representations.
Figure 13. Confusion matrices of CNN models trained on the imbalanced multiclass MalVis (N-gram encoded).
Figure 14. Confusion matrices for CNN models trained on a balanced multiclass MalVis dataset. The blue dashed box highlights the optimal ensemble method (Min Confidence Voting).
Figure 15. Grad-CAM and Grad-CAM++ visualization results for different malware types. Each row showcases a randomly selected sample with its ground truth type, displayed across three columns: original MalVis images (left), Grad-CAM overlays (middle), and Grad-CAM++ overlays (right). In the heatmap overlays, red and orange regions indicate areas of high model attention, while blue regions represent areas of low attention. The red and blue channels of the MalVis images capture high-entropy sections of a code, whereas the green channel highlights areas with abnormal coding structures.
Table 1. Summary of image-based malware datasets detailing the number of classes, dataset sizes, and availability. The row highlighted in blue (MalVis) corresponds to our proposed dataset.
| Dataset | # Classes | Dataset Size | Public | Private |
|---|---|---|---|---|
| MalVis | 10 | 1,300,822 | | |
| MalNet [48] | 696 | 1,262,024 | | |
| AndroDex [23] | 180 | 24,746 | | |
| Virus-MNIST [4] | 10 | 51,880 | | |
| MalImg [52] | 25 | 9458 | | |
| Microsoft [53] | 9 | 108,000 | | |
| IVMD-2013 [41] | 2 | 37,000 | | |
| AdvAndMal [54] | 12 | 5560 | | |
| Halil-2020 [55] | 2 | 29,100 | | |
Table 2. Comparison of visualization approaches using different CNN models. Abbreviations in the table include MNv2 (MobileNet-V2), DN201 (DenseNet201), RN50 (ResNet50), and INC-V3 (Inception-V3). Bold values highlight the highest score for each metric within the respective model.
| Approach | Model | Accuracy | F1-Score | Precision | Recall | MCC | R-AUC |
|---|---|---|---|---|---|---|---|
| Classbyte Encoder [21] | MNv2 | 91% | 85% | 79% | 92% | 80% | 96% |
| | DN201 | 94% | 89% | 89% | 88% | 85% | 97% |
| | RN50 | 93% | 86% | 89% | 83% | 81% | 96% |
| | INC-V3 | 94% | 89% | 92% | 85% | 85% | 97% |
| | VGG16 | 93% | 87% | 86% | 88% | 82% | 96% |
| MalNet Encoder [48] | MNv2 | 92% | 85% | 91% | 80% | 81% | 97% |
| | DN201 | 89% | 83% | 74% | 90% | 77% | 96% |
| | RN50 | 86% | 67% | 91% | 53% | 63% | 94% |
| | INC-V3 | 94% | 90% | 92% | 87% | 86% | 97% |
| | VGG16 | 93% | 88% | 91% | 84% | 84% | 97% |
| Entropy-based [26] | MNv2 | 93% | 88% | 90% | 85% | 84% | 97% |
| | DN201 | 95% | 91% | 90% | 92% | 88% | 98% |
| | RN50 | 93% | 86% | 90% | 82% | 82% | 97% |
| | INC-V3 | 94% | 90% | 91% | 89% | 86% | 97% |
| | VGG16 | 93% | 87% | 94% | 80% | 83% | 97% |
| MalVis (Classbyte) | MNv2 | 91% | 84% | 89% | 79% | 78% | 96% |
| | DN201 | 94% | 90% | 92% | 88% | 86% | 98% |
| | RN50 | 93% | 87% | 92% | 83% | 83% | 97% |
| | INC-V3 | 94% | 88% | 92% | 85% | 84% | 97% |
| | VGG16 | 92% | 86% | 87% | 85% | 81% | 96% |
| MalVis | MNv2 | 95% | 90% | 91% | 89% | 87% | 98% |
| | DN201 | 95% | 90% | 92% | 89% | 87% | 98% |
| | RN50 | 95% | 90% | 92% | 88% | 87% | 98% |
| | INC-V3 | 95% | 90% | 92% | 89% | 87% | 98% |
| | VGG16 | 94% | 89% | 89% | 90% | 86% | 97% |
The highlighted approach, MalVis (N-gram encoded), is the most effective method for visualizing Android malware classes.dex files.
Table 3. Performance results of different models on the MalVis imbalanced dataset.
| Model | A | F1 | P | R | MCC | ROC-AUC |
|---|---|---|---|---|---|---|
| MNv2 | 83% | 83% | 82% | 83% | 67% | 93% |
| DN201 | 82% | 81% | 81% | 82% | 65% | 95% |
| RN50 | 84% | 84% | 83% | 84% | 68% | 94% |
| INC-V3 | 80% | 78% | 78% | 80% | 59% | 93% |
| VGG16 | 82% | 81% | 81% | 82% | 65% | 91% |
Table 4. Performance results after undersampling approach on MalVis (N-gram encoded) dataset.
| Model | A | F1 | P | R | MCC | ROC-AUC |
|---|---|---|---|---|---|---|
| MNv2 | 61% | 61% | 62% | 61% | 57% | 89% |
| DN201 | 66% | 66% | 65% | 66% | 62% | 91% |
| RN50 | 65% | 64% | 64% | 65% | 61% | 90% |
| INC-V3 | 64% | 64% | 64% | 64% | 60% | 90% |
| VGG16 | 60% | 60% | 60% | 60% | 56% | 89% |
Table 5. Performance results of different ensemble methods on the MalVis multiclass dataset after undersampling evaluation.
| Ensemble Method | A | F1 | P | R | ROC-AUC |
|---|---|---|---|---|---|
| Average Voting | 66% | 65% | 65% | 66% | 81% |
| Majority Voting | 63% | 61% | 63% | 63% | 79% |
| Weighted Voting | 64% | 63% | 63% | 64% | 80% |
| Min Confidence Voting | 88% | 86% | 89% | 88% | 86% |
| Soft Voting | 66% | 65% | 65% | 66% | 81% |
| Median Voting | 64% | 63% | 64% | 64% | 80% |
| Rank-Based Voting | 63% | 63% | 64% | 63% | 79% |
| Stacking Ensemble | 83% | 83% | 83% | 83% | 90% |
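As background for the "Min Confidence Voting" row, one plausible interpretation of the rule — the paper does not spell it out in this excerpt — is to take the element-wise minimum of the models' class-probability vectors and predict the class with the highest worst-case support. The sketch below is that hypothetical reading, not a confirmed description of the authors' implementation.

```python
def min_confidence_vote(model_probs):
    """Combine several models' class-probability vectors by element-wise
    minimum, then predict the class whose worst-case probability is
    largest: a class wins only if every model supports it."""
    per_class_min = [min(ps) for ps in zip(*model_probs)]
    return max(range(len(per_class_min)), key=per_class_min.__getitem__)
```

For example, with probabilities [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], and [0.2, 0.7, 0.1] from three models, the per-class minima are [0.2, 0.3, 0.1], so class 1 is predicted even though two of the three models rank class 0 first.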

Share and Cite

MDPI and ACS Style

Makkawy, S.J.; De Lucia, M.J.; Barner, K.E. MalVis: Large-Scale Bytecode Visualization Framework for Explainable Android Malware Detection. J. Cybersecur. Priv. 2025, 5, 109. https://doi.org/10.3390/jcp5040109
