The extraction of representative features from volatile memory is followed by the final step of the methodology: classification of the memory features using AI models. In order to identify the most accurate solutions, various data engineering techniques and ML/DL models and architectures are tested. The derived architecture comprises a two-stage malware detection process capable of performing well on the source dataset as well as on data collected from a different setup (also using a drift-correction methodology), to ensure applicability in different environments. Furthermore, explainable AI techniques are adopted to support malware analysts’ understanding of the reasons behind the framework’s decisions.
3.3.1. Data Preprocessing
Several data pre-processing procedures were conducted before training and classification to guarantee consistency and model compatibility. Categorical malware family labels were first derived from hierarchical category strings using regular expressions. At the same time, checks for missing or corrupt feature values were implemented so that such cases could be removed. Duplicate records and high-cardinality features with excessive sparsity were eliminated to increase generalizability.
More specifically, before model training, we first merged the benign and malware subsets, resulting in 58,168 samples with 59 original features. We then removed 34 features (57.63%) that were either metadata unusable for real-time detection (e.g., the Year/Month in which the malware was created) or features whose extraction increases memory-dump time, yielding a retained set of 25 features (24 + Category or Class, depending on the case). After this feature-filtering step, rows with missing values in the retained feature set were removed (14 rows, 0.024%), and exact duplicate records (row-level duplicates over the full feature vector) were also removed, eliminating 1821 samples (3.131%). Overall, 1835 rows (3.155%) were excluded.
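As an illustration, the merging, label-derivation, and cleaning steps above can be sketched with pandas; the column names (Category, Year) and the category-string format are hypothetical stand-ins for the actual dataset schema.

```python
import pandas as pd

# Hypothetical sketch of the preprocessing described above; column names
# and the category-string format are illustrative assumptions.
def preprocess(benign: pd.DataFrame, malware: pd.DataFrame,
               drop_cols=("Year", "Month")) -> pd.DataFrame:
    df = pd.concat([benign, malware], ignore_index=True)      # merge subsets
    # Derive the top-level family label from a hierarchical category string,
    # e.g. "Trojan-emotet-..." -> "Trojan".
    df["Class"] = df["Category"].str.extract(r"^([A-Za-z]+)", expand=False)
    # Drop metadata that is unusable at run time (e.g., creation Year/Month).
    df = df.drop(columns=[c for c in drop_cols if c in df])
    df = df.dropna()                                          # missing values
    df = df.drop_duplicates()                                 # row-level duplicates
    return df.reset_index(drop=True)
```

A call such as `preprocess(benign_df, malware_df)` would then yield the cleaned, merged table used for model training.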
To avoid information leakage, the final feature set was fixed a priori based on operational constraints (features that increased acquisition/extraction latency or were not available at run time were excluded). Correlation analysis was also used as an additional exploratory redundancy inspection on the training data to better understand inter-feature dependencies (Figure 2). Tree-based approaches were trained on raw values (they are insensitive to feature scaling), whereas inputs to non-tree-based models were normalized with StandardScaler [39].
We then trained models via stratified 5-fold cross-validation (k = 5). In each fold, the model was trained on 4/5 of the data (80%) and evaluated on the remaining 1/5 (20%) [60,61,62].
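A minimal sketch of this protocol, with the scaler fitted inside each training fold only so that no test-fold statistics leak into preprocessing; the logistic-regression classifier and the synthetic data are stand-ins for the models and features actually compared.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 200 samples, 5 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accs = []
for train_idx, test_idx in skf.split(X, y):
    scaler = StandardScaler().fit(X[train_idx])        # fit on the 80% split only
    clf = LogisticRegression().fit(scaler.transform(X[train_idx]), y[train_idx])
    accs.append(clf.score(scaler.transform(X[test_idx]), y[test_idx]))
mean_acc = float(np.mean(accs))
```

Each fold thus reproduces the 80/20 train/evaluate split, and the per-fold accuracies are averaged into a single cross-validated score.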
Furthermore, as described in Section 3.3.5, we validated our methodology on additional data extracted from custom malware executions. During this validation, we observed a significant domain drift between the MemMal-D2024 dataset and the data collected from malware execution on a sandboxed VM. To mitigate this issue, we tested an affine-transformation methodology on the MemMal-D2024 data. The associated results are described in Section 4.2.5.
3.3.4. Two-Stage Host-Based Malware Detection
The initial approach was implemented on a combined dataset with four classes (Benign, Spyware, Ransomware, Trojan), as illustrated in Figure 3. The distribution plot shows a class imbalance between the Benign (29,298 cases), Spyware (10,020 cases), Ransomware (9791 cases), and Trojan (9059 cases) classes. For this implementation, several models were tested in a multi-model comparison. The results of this approach are provided in Section 4.2.1.
Based on the results of this initial approach, a two-stage host-based malware detection scheme was implemented to boost performance.
The proposed framework employs a two-stage workflow that combines a DL model optimized for tabular data with a heterogeneous ensemble of gradient boosting classifiers. This approach enables both high accuracy and interpretability, while maintaining low inference latency.
A single multi-class classifier would need to jointly solve two qualitatively different problems: firstly, the benign–malware separation, and, secondly, fine-grained family attribution among malware classes. In our setting, the first task is highly separable using memory-forensics artifacts, whereas the second is inherently harder, due to class imbalance and feature overlap between families. As a result, a single 4-class (Benign, Spyware, Ransomware, Trojan) model can be dominated by the benign–malware decision boundary and may sacrifice minority-family recall (
Figure 3).
The first stage focused on distinguishing benign from malicious memory snapshots that contain malware (
Figure 4). The implementation used the TabNet Classifier from the PyTorch [
59] TabNet library, which uses sequential attention to select relevant features at each decision step, providing inherent interpretability through its feature-mask mechanism. The model was trained on the MemMal-D2024 samples, whose malware creation metadata spans 2006–2021, with a stratified split between training and testing data. Prior to training, categorical malware family strings were label-encoded, unnecessary columns were dropped, and feature vectors were normalized using a standard scaler. The available Year/Month fields reflect malware creation metadata and are not used as predictive features; thus, they should not be interpreted as chronological acquisition timestamps for temporal-split evaluation. The TabNet model outputs class probabilities, to which the final binary decision threshold is applied in order to determine whether a sample proceeds to the second, multiclass stage of malware categorization.
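The gating step at the end of the first stage can be sketched as follows; the probabilities would come from the trained TabNet model’s predicted class probabilities, and the 0.5 threshold is an illustrative assumption rather than the tuned operating point.

```python
import numpy as np

# Stage-1 gating logic: samples whose malware probability meets the
# threshold are forwarded to the stage-2 family classifier; the rest
# are reported as benign and the pipeline terminates for them.
def route_to_stage2(p_malware: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a boolean mask of samples forwarded to the family classifier."""
    return p_malware >= threshold

probs = np.array([0.05, 0.62, 0.49, 0.93])   # illustrative TabNet outputs
mask = route_to_stage2(probs)                # -> [False, True, False, True]
```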
The second stage performed malware family classification for instances identified as malicious in the first stage. It integrated a Voting Classifier using LGBM, HGB, and XGB, with the three algorithms trained independently on the same pre-processed feature space for multiclass malware-family classification. The training followed the same split and pre-processing as the first classification stage, to enable a like-for-like comparison, and each model output a discrete family label (e.g., Ransomware, Spyware, Trojan) for subsequent evaluation. All quantitative evaluations of the model runs are reported in the section on implementation results.
The two-stage pipeline architecture enhances the detection accuracy and interpretability of host-based malware classification. The central concept is to divide the classification task into two distinct, sequential subtasks:
The first step, Classifier 1 (C1) (Figure 5), is modeled as a binary classification task that differentiates benign samples from malicious ones. This stage acts as a high-level filter that screens out non-threats before the malware samples move on to multiclass classification. As mentioned, a TabNet DL model was implemented using labeled benign and malware memory dumps (
Figure 4).
If a sample is classified as benign, the pipeline terminates. If the input sample is labeled as malware, it proceeds to Stage 2, utilizing Classifier 2 (C2) (
Figure 5).
The second step of the pipeline is initiated if a sample is detected as malware: the multiclass classifier C2 is activated to identify the specific malware family. The three malware types are Trojan, Ransomware, and Spyware, with the class distribution illustrated in
Figure 6. The two-step pipeline is illustrated in
Figure 5.
After a comprehensive comparison between several algorithms, the selected model for this step was a Voting Classifier ensemble combining the LGBM, XGB, and HGB models. The predicted class is the one with the highest class probability averaged across the three base models (soft voting).
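The soft-voting rule can be sketched as follows; the three probability matrices stand in for the outputs of the LGBM, XGB, and HGB base models, and the class indices are an illustrative encoding of the three families.

```python
import numpy as np

# Soft voting: the predicted family is the class with the highest
# probability averaged over the three base models.
def soft_vote(prob_matrices):
    avg = np.mean(prob_matrices, axis=0)   # shape (n_samples, n_classes)
    return np.argmax(avg, axis=1)

# Two samples, three classes (e.g. 0=Ransomware, 1=Spyware, 2=Trojan).
p_lgbm = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_xgb  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p_hgb  = np.array([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]])
labels = soft_vote([p_lgbm, p_xgb, p_hgb])   # -> [0, 2]
```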
3.3.5. Validation on Testbed Malware Data
In cybersecurity threat detection, the underlying data distributions often evolve over time, due to changing network patterns, new protocols, or user behavior, creating a phenomenon known as domain drift. Models trained on historical (source domain) data may therefore become less effective as the statistical properties of features change [
64]. Furthermore, a model trained in an environment configured in a certain way may perform very well in that environment yet fail when applied in a different one. In the host-based malware detection case, a different environment may involve hosts with different hardware, OS versions, running processes/services, etc. Such differences heavily influence memory feature distributions and, therefore, model performance. It is nevertheless crucial to ensure that the AI models can also perform well in real-world deployments. For that purpose, external validation with data collected from a source different from the training dataset is essential.
For the AI pipeline’s prediction evaluation on independent data from a different environment, a laboratory testbed was set up in which three different attack scenarios were applied, leveraging custom Trojan, Spyware, and Ransomware malware that do not appear in the source dataset. A Windows Server 2016 host was configured to run typical enterprise services, including an HTTP/web service and a Microsoft SQL Server (MSSQL) database. This server stands in for a realistic production machine: IIS or another web server can serve web pages or APIs over HTTP, while MSSQL provides backend database services. The server runs standard Windows services under service accounts (or built-in accounts) as needed by MSSQL, and these background services launch automatically on boot. Furthermore, a Debian machine was used to carry out the attacker’s actions.
The first malware scenario represents a Trojan use case. A backdoor Windows executable was built with the msfvenom tool. The malicious file was served from the attacker’s machine and downloaded to the target Windows server. Upon successful download, the malware was executed to open a reverse TCP Meterpreter session to the attacker’s machine. Once the session was open, migration to a legitimate Windows process (such as winlogon.exe) followed, in order to achieve stability and stealthiness.
The second malware scenario is an extension of the Trojan scenario but focuses on Spyware activities. An attacker, having gained access to the target system through a reverse TCP Meterpreter session, can try to gain elevated SYSTEM privileges via the getsystem Meterpreter command. After elevating privileges, the attacker can gather OS and system information or even dump credentials from LSASS memory. In this Spyware scenario, the kiwi Meterpreter extension was loaded, which contains commands to perform Mimikatz-style credential dumping.
The final malware scenario represents a Ransomware attack. First, the Ransomware payload was built and served from the attacker’s machine using a Python HTTP server. For the Ransomware implementation, PSRansom was used: a Ransomware simulation tool designed to demonstrate how Ransomware operates by encrypting files in a target directory using AES-256 encryption, communicating with a Command & Control (C2) server to exfiltrate encryption keys and data, and creating a ransom note. The Ransomware was downloaded and executed on the Windows server, and, as a result, critical data on the server were encrypted.
The validation scenarios were designed to cover a representative set of MITRE ATT&CK tactics and techniques across different phases of adversarial behavior. UC1 (Trojan) includes techniques associated with the Resource Development tactic, namely, T1587.001 (Develop Capabilities: Malware) and T1608.001 (Stage Capabilities: Upload Malware), followed by Command and Control through T1105 (Ingress Tool Transfer). The scenario further incorporates the Execution tactic via T1059.001 (Command and Scripting Interpreter: PowerShell) and achieves Persistence through T1055 (Process Injection).
UC2 (Spyware) emphasizes post-compromise activities and spans multiple tactics. It begins with Privilege Escalation using T1134.001 (Access Token Manipulation: Token Impersonation/Theft), followed by Discovery through T1082 (System Information Discovery). The scenario concludes with Credential Access using T1003.001 (OS Credential Dumping: LSASS Memory).
UC3 (Ransomware) focuses on later-stage attack behavior, incorporating Command and Control via T1105 (Ingress Tool Transfer) and Execution through T1059.001 (PowerShell). The scenario culminates in the Impact tactic, represented by T1486 (Data Encrypted for Impact), which reflects the primary objective of Ransomware operations.
Based on those malware scenarios, a validation dataset was created by collecting memory dump features from normal operation of the Windows server, as well as from several executions of each malware scenario. Specifically, the final dataset included 20 benign samples from the Windows server, 20 samples from Trojan execution after the migration of the Meterpreter session to the legitimate Windows process, 20 samples from the Spyware execution when the credential dumping commands were executed, and 34 Ransomware samples during encryption.
Evaluating the binary TabNet and multiclass voting models on the aforementioned dataset revealed that the two malware detection models do not generalize well beyond their training environment, performing very poorly in other domains.
Drift-aware pre-processing techniques that realign feature distributions can mitigate this problem, ensuring that ML–based intrusion detection systems continue to perform reliably [
65].
One effective approach for distribution alignment is based on minimizing the Wasserstein distance, which measures the optimal transport cost between two probability distributions. Recent research demonstrates the applicability of Wasserstein-based domain adaptation in cybersecurity, showing that minimizing this distance between source and target domains can improve intrusion detection performance [
66]. The 1-D Wasserstein distance $W_1$ between two one-dimensional probability distributions $P$ and $Q$, with cumulative distribution functions $F_P$ and $F_Q$, is defined as follows:

$$W_1(P, Q) = \int_{-\infty}^{+\infty} \left| F_P(x) - F_Q(x) \right| \, dx$$
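For equally weighted empirical samples, this definition can be evaluated directly from the two empirical CDFs; the sketch below is a plain NumPy version of what scipy.stats.wasserstein_distance computes.

```python
import numpy as np

# Empirical 1-D Wasserstein distance between two samples, computed as the
# integral of the absolute difference between their empirical CDFs.
def wasserstein_1d(u: np.ndarray, v: np.ndarray) -> float:
    all_vals = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_vals)                 # widths of the CDF step intervals
    # Empirical CDF of each sample evaluated on the pooled support.
    cdf_u = np.searchsorted(np.sort(u), all_vals[:-1], side="right") / len(u)
    cdf_v = np.searchsorted(np.sort(v), all_vals[:-1], side="right") / len(v)
    return float(np.sum(np.abs(cdf_u - cdf_v) * deltas))

# Two point masses shifted by 5 units -> distance 5.
d = wasserstein_1d(np.array([0.0, 1.0]), np.array([5.0, 6.0]))   # -> 5.0
```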
Building on these principles, an affine transformation method (using scale and shift) provides a lightweight and practical way to align the distributions of numerical features. By optimizing the transformation to minimize the Wasserstein distance between the historical (source) and current (target) distributions, source-domain training data can be reconfigured to align with the target domain, as seen in
Figure 7,
Figure 8 and
Figure 9. This specific approach leverages both linear and log-space scaling and shifting to directly correct the drift of each feature, maintaining the validity of model assumptions and improving robustness against both natural and adversarial distribution shifts [
67].
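A minimal per-feature sketch of such a scale-and-shift correction follows. Here the scale and shift are obtained in closed form by matching the target mean and standard deviation, which is a simpler stand-in for the Wasserstein-minimizing optimization described above; the log-space variants are omitted.

```python
import numpy as np

# Fit an affine map (scale, shift) that sends the source feature
# distribution onto the target one by matching first/second moments.
def fit_affine(source: np.ndarray, target: np.ndarray):
    scale = np.std(target) / (np.std(source) + 1e-12)
    shift = np.mean(target) - scale * np.mean(source)
    return scale, shift

rng = np.random.default_rng(1)
src = rng.normal(0.0, 1.0, size=2000)     # source (historical) feature
tgt = rng.normal(5.0, 2.0, size=2000)     # drifted target feature
scale, shift = fit_affine(src, tgt)
aligned = scale * src + shift             # source re-expressed in the target domain

# Empirical W1 for equal-size samples: mean gap between sorted values.
w_before = float(np.mean(np.abs(np.sort(src) - np.sort(tgt))))
w_after = float(np.mean(np.abs(np.sort(aligned) - np.sort(tgt))))
```

The transformation is applied per numerical feature, and the drop from `w_before` to `w_after` shows the distribution alignment achieved.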
The results of implementing this method, along with comparisons of performance with and without its use, are presented in
Section 4.2.5.
As a comparative approach, a transformer-based domain adaptation method similar to that described in [
66] was also evaluated. Each feature was treated as a token and embedded using a lightweight transformer encoder to capture inter-feature dependencies. The encoder was trained to map both source and target samples into a shared latent representation space [
64]. The target data used for training the encoder were excluded from testing to prevent data leakage. The training goal of the encoder was to minimize the sliced Wasserstein distance between the latent domain distributions, thereby promoting geometric alignment, while maintaining computational efficiency. The malware detection and classification models were then trained using the latent domain representations of the source data. Additionally, for a more mainstream comparison, CORAL domain adaptation methodology, as explained in [
68], was also examined.
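For reference, the CORAL baseline can be sketched in a few lines of NumPy: the source features are whitened with their own covariance and then re-colored with the target covariance, so that the second-order statistics of the two domains match. Features are assumed to be approximately zero-mean (or standardized beforehand), and the regularization constant is an illustrative choice.

```python
import numpy as np

# Minimal CORAL (CORrelation ALignment) sketch: align source second-order
# statistics to the target domain via whitening and re-coloring.
def coral(Xs: np.ndarray, Xt: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    # Matrix square roots via eigendecomposition (covariances are symmetric PSD).
    def sqrtm(C, inv=False):
        w, V = np.linalg.eigh(C)
        w = np.clip(w, eps, None)
        p = -0.5 if inv else 0.5
        return (V * w**p) @ V.T

    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)   # whiten, then re-color

rng = np.random.default_rng(2)
Xs = rng.normal(size=(500, 3))                          # source domain
A = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 1.5]])
Xt = rng.normal(size=(500, 3)) @ A.T                    # correlated target domain
Xs_aligned = coral(Xs, Xt)
```

After alignment, a classifier trained on `Xs_aligned` sees feature correlations matching those of the target domain.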
The evaluation of both approaches was performed using the same test protocol and target-domain testbed dataset as the ones employed for the affine transformation method in
Section 4.2.5, the results of which are demonstrated in
Table 3 and
Table 4. As shown, applying transformer-based domain adaptation with such a small available target dataset proved detrimental: the source-domain accuracy of the proposed two-stage pipeline dropped below 50%, while the target-domain accuracy saw only marginal improvement compared to that of the original models. This behavior indicates that the learned mapping distorted task-relevant feature geometry. Applying CORAL domain adaptation, on the other hand, yielded a 7% improvement in target-domain multiclass classification performance but dropped target-domain binary model performance to 50%. As a result, neither approach was pursued further in subsequent experiments.