1. Introduction
The rapid growth of malware remains one of the most critical threats to modern cybersecurity infrastructure, with approximately 230,000 new malware samples identified daily [1]. Of particular concern is the growing prevalence of obfuscated malware, which employs sophisticated evasion techniques, including polymorphism, metamorphism, code encryption, and redundant code insertion, to dynamically alter its structure and behavior at runtime [2]. These techniques render conventional detection approaches, such as signature-based and heuristic methods, largely ineffective, as they depend on static, predefined patterns that fail to generalize to previously unseen or dynamically mutating threats [3]. The consequences extend beyond individual systems: successful obfuscated malware attacks have been linked to large-scale data breaches, ransomware incidents, and the compromise of critical national infrastructure, underscoring the urgency of developing more adaptive and robust detection mechanisms [4,5,6].
Memory-based analysis has emerged as a promising method for detecting obfuscated malware, as it captures behavioral artifacts at runtime that persist even when disk-based traces are deliberately concealed or erased. By analyzing memory dumps, this approach can expose patterns of malicious execution that static analysis cannot access. However, existing memory-based detection frameworks are constrained by several significant limitations. Most rely on handcrafted feature engineering, which demands extensive domain expertise and does not scale efficiently with large or heterogeneous datasets [7]. Furthermore, current frameworks report high accuracy mainly on conventional malware samples but give little attention to systematic model optimization, limiting their robustness against rapidly evolving malware families. The increasing prevalence of memory-resident malware, which operates exclusively within volatile memory to evade both static and dynamic analysis, further compounds these challenges [2]. There is therefore a clear need for detection systems capable of autonomously learning complex behavioral patterns from raw memory data while maintaining high detection performance.
Deep learning (DL) has demonstrated considerable potential in addressing the limitations of traditional malware detection by enabling automatic extraction of high-level features from raw, high-dimensional data without manual feature engineering [8]. Deep Neural Networks (DNNs) are effective at capturing structural and static characteristics of malware, while Recurrent Neural Networks (RNNs) are well-suited to modeling sequential and temporal behavioral patterns inherent in malware execution traces [9]. Despite these capabilities, the practical performance of DL-based detection systems is highly sensitive to hyperparameter configuration, including learning rates, network architecture, neuron counts, and dropout rates. Suboptimal hyperparameter selection can substantially degrade model accuracy and efficiency. Particle Swarm Optimization (PSO), a nature-inspired metaheuristic algorithm, has shown strong potential for efficient hyperparameter tuning in complex optimization landscapes [10]. Nevertheless, the integration of PSO-optimized hybrid DL architectures within a memory-based malware detection framework remains underexplored in the literature.
To address these gaps, this study proposes a memory-based hybrid deep learning framework for the detection of obfuscated malware, optimized using Particle Swarm Optimization. The framework integrates DNNs for structural feature analysis with RNNs for capturing sequential dependencies among memory-derived features, forming a complementary architecture capable of comprehensive behavioral profiling. PSO is applied for feature selection and hyperparameter tuning. A preprocessing pipeline, incorporating data cleaning, normalization, and encoding, ensures the integrity and compatibility of raw memory dump data with the hybrid architecture. The framework is evaluated on a comprehensive malware dataset using standard classification metrics, including accuracy, precision, recall, F1 score, and ROC-AUC.
The main contributions of this paper are summarized as follows:
1. A novel memory-based hybrid DNN–RNN architecture is proposed for the detection of obfuscated malware. The DNN component learns nonlinear structural relationships among memory-derived features, while the RNN component captures sequential behavioral dependencies that persist in memory execution traces even when obfuscation techniques alter surface-level file characteristics. This complementary design enables simultaneous analysis of structural and temporal behavioral signals, improving robustness against dynamically mutating malware.
2. PSO is incorporated as an automated hyperparameter optimization mechanism for the hybrid DNN-RNN model. Unlike manual or ad hoc tuning, PSO navigates the high-dimensional hyperparameter search space efficiently, improving model convergence, detection accuracy, and generalization to unseen malware variants without requiring extensive expert intervention.
3. A comprehensive empirical evaluation is conducted on the MemMal-D2024 dataset, comparing the proposed hybrid DNN-RNN architecture against standalone DNN and RNN baselines under an identical preprocessing and validation protocol. This controlled comparison isolates the contribution of each architectural component and quantifies the added value of PSO-based optimization for memory-based obfuscated malware detection.
The remainder of this paper is organized as follows:
Section 2 reviews related work on machine and deep learning-based malware detection and hyperparameter optimization;
Section 3 describes the proposed methodology;
Section 4 details the experimental setup and evaluation metrics;
Section 5 presents and discusses the results;
Section 6 presents the practical implications of the proposed framework; and
Section 7 concludes this paper with directions for future research.
2. Related Work
This section reviews existing research on malware detection across three progressive themes: machine learning-based approaches, deep learning-based approaches, and PSO-augmented detection systems. Each theme is assessed in terms of its contributions and limitations, progressively establishing the research gap addressed by the proposed framework.
2.1. Machine Learning-Based Malware Detection
Machine learning approaches have been widely applied to malware detection, demonstrating strong performance across diverse datasets. Several studies have explored a variety of classifiers and feature selection strategies, including parallel ensemble classifiers [11], network traffic-based features [12], API call pattern analysis [13,14], and natural language processing-inspired feature extraction [15], achieving detection accuracies ranging from 94% to 99%. Studies targeting the memory-based dataset specifically have reported particularly high accuracies. Hossain and Islam [7] proposed an ensemble-based framework combining gradient boosting classifiers with SMOTE balancing and chi-squared feature selection, achieving 99% detection accuracy in both binary and multi-class scenarios. Smith et al. [16] evaluated seven classifiers, including Decision Trees, Random Forest, and AdaBoost, achieving an average accuracy of 99%, while Talukder et al. [17] assessed RF, DT, MLP, and KNN models, with RF reaching 99.99%. Similarly, Louk and Tama [18] demonstrated that ensemble models such as XGBoost and Random Forest achieve accuracies exceeding 99%, and Dener et al. [19] applied Logistic Regression to attain 99.94%. Ghazi and Raghava [20] further applied wrapper-based feature selection with Random Forest on the same dataset, reaching 99.99% accuracy for obfuscated malware detection.
Despite these results, ML-based approaches share a fundamental limitation: They rely heavily on handcrafted feature engineering, which demands domain expertise. Such approaches are also inherently constrained in their ability to capture complex, non-linear behavioral patterns characteristic of sophisticated malware. Furthermore, the reviewed memory-based studies do not emphasize model optimization, limiting their adaptability.
2.2. Deep Learning-Based Malware Detection
Deep learning addresses the core limitation of ML by enabling automatic feature extraction from raw data. A range of architectures has been proposed, including CNN-based image classification and hybrid CNN-RNN models [21,22,23]; RNN-based sequential analysis, including LSTM and hybrid recurrent architectures [24,25,26]; memory forensics combined with ensemble classification [27]; and hybrid recurrent architectures for behavioral API sequence analysis [28], with reported accuracies generally ranging from 95% to 99.87% across benchmark datasets. Of particular relevance to this study, Kang et al. [24] utilized an LSTM-based model with word2vec feature representation to capture sequential opcode and API call patterns for malware identification, achieving an accuracy of 97.59%, while Wu et al. [25] showed that BiLSTM-GNN hybrid architectures can model complex API call dependencies, though with reduced recall against obfuscation techniques. Shaukat et al. [21] demonstrated that combining CNN with SVM for image-based feature extraction achieves strong detection performance, while Waqar et al. [26] showed that hybrid BLSTM-GRU architectures are highly effective for real-time malware detection in AIoT environments, suggesting that hybrid approaches can improve detection robustness. More recently, deep CNN frameworks combining static and dynamic analysis through memory dump image representations have demonstrated improved generalization against obfuscation techniques [29]. Maniriho et al. [30] proposed MeMalDet, a memory analysis-based malware detection framework that uses deep autoencoders and a stacked ensemble under temporal evaluation settings. Their work introduces MemMal-D2024, an improved memory-analysis dataset that extends earlier memory-based malware datasets with timestamp attributes to support temporal evaluation.
However, several limitations remain in existing DL-based malware detection approaches. First, model performance is highly sensitive to hyperparameter configurations—such as learning rate, network architecture, neuron counts, and batch size—yet many studies rely on manual or ad hoc tuning rather than optimization techniques. This limits the performance of models across diverse and evolving malware datasets. Second, while prior work has explored image-based, sequence-based, and hybrid architectures, limited attention has been given to detecting memory-based obfuscated malware using optimized deep learning frameworks.
2.3. PSO-Augmented Malware Detection
Particle Swarm Optimization (PSO) has been widely adopted as an effective optimization technique to enhance the performance of machine learning and deep learning models in malware detection. Abbasi et al. [31] used PSO to automate feature selection over ransomware API call and registry behaviors, achieving 97.33% accuracy with Logistic Regression, while Hossain et al. [32] combined PSO-based feature selection with SVM and Random Forest for Android ransomware detection, achieving 81.58% on the CICAndMal2017 dataset. Sharma and Agrawal [33] paired Binary PSO feature selection with a DNN classifier, reaching 94.92% accuracy on the Drebin dataset and demonstrating the value of combining PSO with deep learning architectures. Alazab et al. [34] applied PSO for feature subset selection in a deep learning-based Android malware detection system, with a PSO-optimized Random Forest model achieving a True-Positive Rate of 91.6%, outperforming conventional feature selection methods. Al-Andoli et al. [35] proposed an ensemble-based parallel DL classifier using a hybrid PSO-BP optimization method to tune DNN parameters for malware detection, achieving up to 100% accuracy across five benchmark datasets and demonstrating the effectiveness of PSO in optimizing DL model parameters beyond simple feature selection. Adebayo and Aziz [36] integrated PSO with Apriori Association Rule mining, achieving 98.17% accuracy for Android malware detection.
While these studies confirm PSO’s measurable impact on detection performance, a critical gap remains: none of the reviewed studies target memory-based obfuscated malware detection, and none integrate PSO optimization within a hybrid DNN-RNN architecture.
2.4. Summary and Research Gaps
Table 1 summarizes the key studies most directly relevant to this work, restricted to those using the memory-based dataset to provide the most focused comparison. While machine learning approaches achieve high accuracy on memory-based malware detection, they rely on handcrafted features and lack adaptability. Deep learning methods address feature extraction but remain sensitive to hyperparameter configurations, often without optimization, limiting robustness. Although PSO-based approaches have demonstrated effectiveness in improving feature selection and model performance, their integration with advanced deep learning architectures for memory-based malware detection remains limited. These limitations collectively highlight a critical research gap: the absence of a unified framework that combines automated feature learning, sequential modeling, and optimization for the robust detection of obfuscated memory-resident malware. To address this, the proposed framework integrates Deep Neural Networks and Recurrent Neural Networks with Particle Swarm Optimization and evaluates its detection performance on a memory-based malware dataset.
3. Methodology
3.1. Problem Definition
This work addresses the detection of obfuscated malware using memory-based behavioral evidence. Obfuscation can alter surface-level characteristics of malicious programs (e.g., packing, renaming, control-flow transformations), allowing signature-based and shallow static detectors to miss threats. Our goal is to learn discriminative patterns from memory snapshots that capture execution-time artifacts and behavioral traces, making detection more resilient to obfuscation. Memory analysis is useful here because malicious programs may still leave execution-time artifacts in memory, such as abnormal process behavior, loaded modules, handles, injected-code indicators, API hooks, and other memory-resident traces.
However, memory-based malware detection remains challenging for three main reasons. First, memory-derived features can be high-dimensional and may contain redundant or weakly informative attributes. Second, the relationships among memory artifacts may be nonlinear, making them difficult to capture using simple linear models. Third, deep learning models are sensitive to architectural and training hyperparameters, such as learning rate, hidden-layer size, and dropout rate. Poorly selected features or hyperparameters may reduce model stability and generalization.
We formulate malware detection as a supervised binary classification problem. Given a memory-derived feature vector $\mathbf{x} \in \mathbb{R}^{d}$, the model predicts a label $y \in \{0, 1\}$, where 0 denotes benign and 1 denotes malicious. The objective is to learn a decision function $f: \mathbb{R}^{d} \rightarrow \{0, 1\}$ that accurately distinguishes benign from malicious memory samples while reducing redundant features and improving model configuration through PSO-based optimization.
3.2. Proposed Approach
This study proposes a PSO-optimized deep learning framework for memory-based malware detection using the MemMal-D2024 dataset. The framework is designed to address the challenge of detecting obfuscated malware, where malicious programs may evade static signature-based detection while still producing detectable memory-resident artifacts during execution. The proposed approach learns discriminative patterns from memory-derived features and evaluates whether PSO-based feature selection and hyperparameter optimization can improve model performance and stability.
As shown in Figure 1, the framework consists of three main stages: (i) data preprocessing, (ii) deep learning classification with PSO-based optimization, and (iii) evaluation and prediction. These stages correspond directly to the workflow shown in the figure, from preparing the MemMal-D2024 dataset to generating the final benign or malware prediction.
3.2.1. Stage 1: Data Preprocessing
The first stage prepares the MemMal-D2024 feature set for model training and evaluation. Labels are encoded into numerical form to support binary classification. The feature transformation step standardizes the numerical memory-derived features to a common scale, which improves the stability of gradient-based learning. Duplicate and missing-value handling are performed before model training to improve data quality. Feature selection is then applied to retain informative memory-derived attributes while reducing redundant inputs. The main predictive pipeline uses the original selected memory features without PCA to preserve interpretability; PCA is considered only as an optional ablation setting rather than as the primary predictive representation.
3.2.2. Stage 2: Deep Learning Classification with PSO-Based Optimization
After preprocessing, the dataset is split into training and test partitions within the repeated cross-validation protocol. The training data are used to train the DNN, RNN, and hybrid DNN + RNN models. The DNN learns nonlinear relationships among memory-derived features, while the RNN is used as a feature-dependency learner over a fixed ordered feature representation rather than as a temporal execution model. The hybrid model combines both components to capture complementary representations. PSO is integrated into this stage to support feature selection and hyperparameter optimization, including the learning rate, hidden-layer sizes, and dropout rate. All PSO-based optimization is performed using training data only within each fold to avoid data leakage.
3.2.3. Stage 3: Evaluation and Prediction
The optimized and non-optimized models are evaluated on held-out test data using accuracy, precision, recall, specificity, F1 score, ROC-AUC, and PR-AUC. Classical baselines are also evaluated under the same protocol to provide a fair comparison with lighter machine learning models. The final trained detector produces a binary prediction for each memory-feature instance, classifying it as either benign or malware.
3.3. Deep Learning Models
To capture complementary aspects of malware behavior, we evaluate three neural architectures.
3.3.1. Deep Neural Network (DNN)
A feed-forward DNN learns nonlinear decision boundaries over memory-derived feature vectors. The DNN primarily captures structural relationships among features. The DNN model consists of two fully connected hidden layers with ReLU activation and dropout regularization. The default hidden size was 128, and the default dropout rate was 0.2. The output layer contains two neurons corresponding to the benign and malicious classes.
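As an illustration of the DNN described above, the following NumPy sketch implements only the forward pass: two fully connected ReLU layers of width 128, inverted dropout with rate 0.2 during training, and a two-neuron output layer. It is a minimal sketch, not the authors' implementation; the helper names `init_dnn` and `dnn_forward`, the He initialization, and the input width of 20 (assumed to match the 20 PSO-selected features) are assumptions, and training with Adam is omitted.

```python
import numpy as np


def init_dnn(n_features, hidden=128, n_classes=2, seed=0):
    """Hypothetical helper: He-initialised weights for a two-hidden-layer MLP."""
    rng = np.random.default_rng(seed)

    def layer(n_in, n_out):
        return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

    return [layer(n_features, hidden), layer(hidden, hidden), layer(hidden, n_classes)]


def dnn_forward(params, X, dropout=0.2, training=False, rng=None):
    """Return logits of shape (n_samples, 2): column 0 = benign, column 1 = malicious."""
    rng = rng if rng is not None else np.random.default_rng()
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # fully connected layer + ReLU
        if training and dropout > 0:
            keep = rng.random(h.shape) >= dropout  # inverted dropout: rescale kept units
            h = h * keep / (1.0 - dropout)
    W_out, b_out = params[-1]
    return h @ W_out + b_out


X = np.random.default_rng(1).normal(size=(4, 20))  # 4 samples, 20 assumed features
logits = dnn_forward(init_dnn(20), X)
print(logits.shape)  # (4, 2)
```

At inference time dropout is disabled (`training=False`), so repeated calls on the same input return identical logits.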
3.3.2. Recurrent Neural Network (RNN)
Although RNNs are commonly used for temporal or sequential data, the dataset used in this study consists of tabular memory-derived features. Therefore, the RNN component is used as a feature-dependency learner over a fixed ordered representation of memory-derived attributes. This can be useful because memory-forensics features are not independent; feature groups derived from process listings, loaded modules, handles, services, process-view inconsistencies, and memory-anomaly indicators may interact in ways that help distinguish benign and malicious samples. The recurrent layer allows information to be propagated across feature positions and may capture dependencies among related memory artifacts. The RNN model uses a simple recurrent layer followed by a fully connected output layer. The default RNN hidden size was 64, and the output layer maps the hidden representation to the two binary classes.
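The following NumPy sketch shows one way to read a tabular feature vector as a sequence: each of the ordered features is fed to a simple (Elman-style) recurrent cell as a one-dimensional step, and the final hidden state of width 64 is mapped to the two classes. The tanh recurrence, the one-value-per-step input shape, and the helper names are assumptions for illustration; the text does not specify these details.

```python
import numpy as np


def init_rnn(hidden=64, n_classes=2, seed=0):
    """Hypothetical helper: weights for one simple recurrent layer plus an output layer."""
    rng = np.random.default_rng(seed)
    return {
        "W_x": rng.normal(0.0, 0.1, (1, hidden)),  # each step receives one feature value
        "W_h": rng.normal(0.0, 0.1, (hidden, hidden)),
        "b_h": np.zeros(hidden),
        "W_out": rng.normal(0.0, 0.1, (hidden, n_classes)),
        "b_out": np.zeros(n_classes),
    }


def rnn_forward(p, X):
    """Scan the ordered feature positions of X (n_samples, d); return (n_samples, 2) logits."""
    h = np.zeros((X.shape[0], p["W_h"].shape[0]))
    for t in range(X.shape[1]):  # t indexes feature position, not time
        x_t = X[:, t:t + 1]  # shape (n_samples, 1)
        h = np.tanh(x_t @ p["W_x"] + h @ p["W_h"] + p["b_h"])
    return h @ p["W_out"] + p["b_out"]


X = np.random.default_rng(1).normal(size=(4, 20))
p = init_rnn()
print(rnn_forward(p, X).shape)  # (4, 2)
```

Because the recurrence is order-dependent, permuting the feature columns changes the output, so the fixed feature order is effectively part of the model definition.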
3.3.3. Hybrid DNN + RNN
The hybrid model integrates both DNN and RNN components to jointly learn structural characteristics and dependencies among ordered memory features. Conceptually, one branch focuses on feature interactions (DNN) while another captures dependencies across the ordered feature representation (RNN); their representations are fused and passed to a final classifier layer. This design aims to strengthen detection under evasion by combining complementary signals.
The classification module employs a hybrid deep learning model that combines a DNN and an RNN. The DNN component is used to model complex nonlinear relationships among memory features, while the RNN component is intended to capture sequential dependencies and pattern evolution that may be useful in representing malware behavior in memory. The rationale for this hybridization is that DNNs provide strong feature abstraction capabilities, whereas RNNs can improve sensitivity to temporal or ordered structure in the data. The use of a DNN–RNN hybrid architecture is motivated by the need for robustness against malware obfuscation. By jointly learning structural and sequential signatures of malicious behavior, the model can become less dependent on superficial patterns that are easily altered by obfuscation techniques. Hybrid deep architectures have been reported to improve predictive performance and adaptability in related applications [37], and the same principle is adopted here to support improved anomaly representation, better generalization across malware variants, and increased resilience to code-level transformations.
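A compact NumPy sketch of the hybrid forward pass follows. The DNN branch (two ReLU layers of width 128) and the RNN branch (a simple recurrent scan over the ordered features, hidden size 64) process the same input, and their final representations are concatenated before a linear classifier. Concatenation is an assumed fusion operator, since the text only states that the representations are fused; the weights are random and untrained, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """He-initialised (weights, bias) pair for a fully connected layer."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)


def init_hybrid(n_features, dnn_hidden=128, rnn_hidden=64, n_classes=2):
    return {
        "dnn": [dense(n_features, dnn_hidden), dense(dnn_hidden, dnn_hidden)],
        "rnn": (rng.normal(0.0, 0.1, (1, rnn_hidden)),           # input-to-hidden
                rng.normal(0.0, 0.1, (rnn_hidden, rnn_hidden)),  # hidden-to-hidden
                np.zeros(rnn_hidden)),
        "head": dense(dnn_hidden + rnn_hidden, n_classes),
    }


def hybrid_forward(params, X):
    # DNN branch: nonlinear interactions over the whole feature vector
    d = X
    for W, b in params["dnn"]:
        d = np.maximum(d @ W + b, 0.0)
    # RNN branch: recurrent scan over the ordered feature positions
    W_x, W_h, b_h = params["rnn"]
    r = np.zeros((X.shape[0], W_h.shape[0]))
    for t in range(X.shape[1]):
        r = np.tanh(X[:, t:t + 1] @ W_x + r @ W_h + b_h)
    fused = np.concatenate([d, r], axis=1)  # assumed fusion: concatenation
    W_o, b_o = params["head"]
    return fused @ W_o + b_o


params = init_hybrid(n_features=20)
print(hybrid_forward(params, np.ones((3, 20))).shape)  # (3, 2)
```

With the default sizes the classifier head sees a 192-dimensional fused vector (128 from the DNN branch plus 64 from the RNN branch).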
3.4. Particle Swarm Optimization (PSO)
To enhance model performance, Particle Swarm Optimization (PSO) is incorporated as a hyperparameter optimization mechanism for the hybrid DNN–RNN model. PSO is well-suited to high-dimensional search spaces and offers an efficient alternative to exhaustive manual tuning. Previous cybersecurity studies have demonstrated strong performance gains when PSO is combined with advanced learners, including PSO-optimized XGBoost models [38].
In the present work, PSO is used to optimize parameters such as the learning rate, the hidden-layer sizes, and the dropout rate. This PSO-guided optimization is intended to improve convergence behavior and detection accuracy while increasing robustness to obfuscated malware patterns [22,39].
We enhance the hybrid DNN + RNN by optimizing critical training and architectural hyperparameters using Particle Swarm Optimization (PSO). Each particle encodes a candidate hyperparameter configuration:

$$\mathbf{s} = \left(\eta,\; p,\; h_{\mathrm{RNN}},\; h_{\mathrm{DNN}}\right),$$

where $\eta$ is the learning rate, $p$ is the dropout rate, $h_{\mathrm{RNN}}$ is the RNN hidden size, and $h_{\mathrm{DNN}}$ is the DNN hidden size. These hyperparameters are not treated as separate competing objectives. Instead, each vector $\mathbf{s}$ represents one complete candidate configuration for the hybrid DNN + RNN model. PSO evaluates each candidate by training the corresponding model on the internal training split and selecting the configuration that achieves the highest validation accuracy on the internal validation split.
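PSO iteratively updates particle velocities and positions [40,41]:

$$\mathbf{v}_i^{t+1} = w\,\mathbf{v}_i^{t} + c_1 r_1 \left(\mathbf{p}_i - \mathbf{s}_i^{t}\right) + c_2 r_2 \left(\mathbf{g} - \mathbf{s}_i^{t}\right),$$

$$\mathbf{s}_i^{t+1} = \mathbf{s}_i^{t} + \mathbf{v}_i^{t+1},$$

where $w$ is the inertia weight, $c_1$ and $c_2$ are the cognitive and social coefficients, $r_1, r_2 \in [0, 1]$ are uniformly sampled random numbers, $\mathbf{p}_i$ is particle $i$'s best known position, and $\mathbf{g}$ is the global best position.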
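The velocity and position updates described above can be sketched in NumPy as a global-best PSO minimizer. Because the paper's objective is validation accuracy, one would minimize its negative. The coefficient values w = 0.7 and c1 = c2 = 1.5, the clipping of positions to the search box, and the function name are illustrative assumptions rather than the settings used in the experiments; the quadratic objective in the usage line is a stand-in for training and validating a model.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=30, n_iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: minimise f over the box [lower, upper]; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    dim = lower.size
    s = rng.uniform(lower, upper, size=(n_particles, dim))  # positions s_i
    v = np.zeros_like(s)                                    # velocities v_i
    p_best = s.copy()                                       # personal bests p_i
    p_val = np.array([f(x) for x in s])
    g_best = p_best[p_val.argmin()].copy()                  # global best g
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - s) + c2 * r2 * (g_best - s)
        s = np.clip(s + v, lower, upper)  # keep particles inside the search box (assumed)
        vals = np.array([f(x) for x in s])
        improved = vals < p_val
        p_best[improved], p_val[improved] = s[improved], vals[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, p_val.min()


# Stand-in objective: in the paper this would be the negative validation accuracy.
best, best_val = pso_minimize(lambda x: float(np.sum((x - 0.3) ** 2)),
                              [-1.0, -1.0], [1.0, 1.0], n_iters=100)
print(best, best_val)
```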
The objective function for PSO is the validation performance of the DNN + RNN model (i.e., accuracy) while maintaining stable generalization and low overfitting. For PSO-based feature selection, the algorithm used 30 particles and 20 iterations, resulting in 600 feature-subset evaluations per run/fold. The inertia weight was , the cognitive coefficient was , and the social coefficient was . The procedure selected the top 20 features based on the particle scores, using XGBoost performance on an internal validation split as the fitness function.
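A minimal sketch of score-based PSO feature selection consistent with the description above: each particle holds one score per feature in [0, 1], a particle is decoded into its top-k features, and the fitness of that subset is evaluated once per particle per iteration, so 30 particles and 20 iterations give exactly 600 fitness evaluations. The `fitness` callable stands in for the XGBoost validation score; the toy fitness in the usage example, the coefficient values, and the decoding rule are assumptions, since the exact scoring procedure is not specified.

```python
import numpy as np


def top_k(scores, k):
    """Decode a particle: indices of its k highest-scoring features (sorted)."""
    return np.sort(np.argsort(scores)[::-1][:k])


def pso_feature_selection(fitness, n_features, k=20, n_particles=30, n_iters=20,
                          w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    s = rng.random((n_particles, n_features))  # per-feature scores in [0, 1]
    v = np.zeros_like(s)
    p_best, p_fit = s.copy(), np.full(n_particles, -np.inf)
    g_best, evaluations = s[0].copy(), 0
    for _ in range(n_iters):
        fit = np.array([fitness(top_k(x, k)) for x in s])  # higher is better
        evaluations += n_particles
        better = fit > p_fit
        p_best[better], p_fit[better] = s[better], fit[better]
        g_best = p_best[p_fit.argmax()].copy()
        r1, r2 = rng.random(s.shape), rng.random(s.shape)
        v = w * v + c1 * r1 * (p_best - s) + c2 * r2 * (g_best - s)
        s = np.clip(s + v, 0.0, 1.0)
    return top_k(g_best, k), p_fit.max(), evaluations


# Toy stand-in fitness: rewards subsets containing the (hypothetical) informative features 0-4.
informative = set(range(5))


def toy_fitness(subset):
    return len(informative & set(subset.tolist()))


selected, best_fit, n_evals = pso_feature_selection(toy_fitness, n_features=40)
print(n_evals)  # 600
```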
Table 2 reports the PSO hyperparameter search space used in the experiments. The optimized parameters include the learning rate, RNN hidden size, DNN hidden size, and dropout rate. Continuous parameters were encoded directly, while discrete parameters were selected using index-based encoding. For comparison, Random Search (RS) used the same search space as PSO and the same number of evaluations, with 25 randomly sampled configurations per run/fold. Each configuration was evaluated for 5 epochs using the same internal validation split.
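One way to realize this mixed encoding is to keep continuous parameters as raw coordinates and treat discrete parameters as rounded, clamped indices into a list of allowed values. The search-space values below are placeholders rather than the contents of Table 2, and the helper names are hypothetical; the Random Search baseline simply samples its 25 configurations from the same space.

```python
import random

# Placeholder search space (hypothetical values, not those reported in Table 2).
SPACE = {
    "learning_rate": (1e-4, 1e-2),  # continuous, encoded directly
    "dropout": (0.0, 0.5),          # continuous, encoded directly
    "rnn_hidden": [32, 64, 128],    # discrete, index-encoded
    "dnn_hidden": [64, 128, 256],   # discrete, index-encoded
}


def decode(position):
    """Map a particle position (lr, dropout, rnn_idx, dnn_idx) to a configuration."""
    lr, p, i_rnn, i_dnn = position

    def pick(options, idx):
        j = min(max(int(round(idx)), 0), len(options) - 1)  # round, then clamp the index
        return options[j]

    lo_lr, hi_lr = SPACE["learning_rate"]
    lo_p, hi_p = SPACE["dropout"]
    return {
        "learning_rate": min(max(lr, lo_lr), hi_lr),
        "dropout": min(max(p, lo_p), hi_p),
        "rnn_hidden": pick(SPACE["rnn_hidden"], i_rnn),
        "dnn_hidden": pick(SPACE["dnn_hidden"], i_dnn),
    }


def random_search_configs(n=25, seed=0):
    """Random Search baseline: sample n configurations from the same space."""
    rng = random.Random(seed)
    return [decode((rng.uniform(*SPACE["learning_rate"]),
                    rng.uniform(*SPACE["dropout"]),
                    rng.randrange(len(SPACE["rnn_hidden"])),
                    rng.randrange(len(SPACE["dnn_hidden"]))))
            for _ in range(n)]


print(decode((0.001, 0.2, 1.4, 2.6)))
# {'learning_rate': 0.001, 'dropout': 0.2, 'rnn_hidden': 64, 'dnn_hidden': 256}
```

Clamping keeps every decoded configuration valid even when a particle's velocity carries an index coordinate outside the allowed range.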
3.5. Training Procedure
All models are trained using the same preprocessing, feature pipeline, and evaluation protocol for comparability. The neural models were trained using an Adam optimizer, cross-entropy loss, batch size of 64, base learning rate of 0.001, maximum 30 epochs, early-stopping patience of 30, and default dropout rate of 0.2. The PSO-optimized variant selects hyperparameters that improve convergence, reduce loss, and enhance generalization across unseen data.
4. Experimental Design
The evaluation is designed to examine whether integrating PSO with deep learning models, specifically DNNs and RNNs, improves malware detection performance and addresses limitations commonly observed in traditional machine learning approaches for cybersecurity tasks. More specifically, the evaluation investigates the extent to which PSO contributes to optimizing the DNN–RNN architecture and training process and whether the resulting hybrid PSO-based deep learning framework provides superior accuracy and efficiency for malware detection relative to conventional machine learning methods.
4.1. Research Questions
The experimental design is guided by the following questions:
1. How does feature analysis affect dataset quality and class separability?
2. Which architecture best captures memory-based behavioral signals for distinguishing benign from malicious samples?
3. How does integrating PSO with the hybrid DNN + RNN model affect malware detection performance?
4.2. Dataset
A robust and well-characterized dataset is essential for evaluating malware detection models under realistic conditions. In this study, we use the MemMal-D2024 dataset, an improved memory-analysis-based malware detection dataset for Windows systems. MemMal-D2024 is associated with the MeMalDet framework and was introduced for memory-analysis-based malware detection [30]. The dataset provides memory-derived feature files for benign and malware samples, including combined benign samples, combined malware samples, and yearly malware samples from 2006 to 2021. In total, the version used in this study contains 58,168 instances, including 29,298 benign samples and 28,870 malware samples.
Each instance is described by 60 features and is associated with two labels: (i) a binary label indicating whether the sample is benign or malicious, and (ii) a multiclass label specifying the malware family for malicious samples. Among the malicious samples, the reported distribution is approximately 32.5% Trojan Horse, 33.67% Spyware, and 33.8% Ransomware.
Table 3 summarizes the dataset composition and key characteristics used in our experiments. By relying on memory-derived features, MemMal-D2024 provides a suitable benchmark for assessing detection methods targeting malware that employs obfuscation and other evasion strategies, aligning with the objective of this work to develop PSO-optimized deep learning models for accurate memory-based malware detection.
4.3. Data Preprocessing
To ensure reliable learning and fair comparison across models, we apply a standardized preprocessing pipeline prior to training. The pipeline converts categorical fields into numeric form (when applicable), normalizes feature scales, and reduces dimensionality to mitigate redundancy and improve computational efficiency. To reduce the risk of data leakage, all preprocessing and model-selection steps were performed within each training fold only. The test fold was kept unseen during preprocessing, feature selection, hyperparameter tuning, and model training, and it was used only for final evaluation.
4.3.1. Data Numericalization and Label Encoding
Most machine learning and deep learning models require numerical inputs. Although the dataset feature set is primarily numeric, categorical fields (e.g., class labels or any non-numeric attributes, if present) must be mapped to integers. We therefore apply label encoding to transform categorical values into numeric identifiers. For the binary detection task, labels are encoded as two classes (benign vs. malicious).
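Label encoding amounts to a fixed category-to-integer map. The sketch below sorts the distinct values before numbering them, which is also how scikit-learn's `LabelEncoder` orders its classes; the label strings `"Benign"` and `"Malware"` are assumed examples, not necessarily the dataset's exact values.

```python
def label_encode(values):
    """Map each distinct category to an integer id, numbering categories in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


codes, mapping = label_encode(["Malware", "Benign", "Malware"])
print(codes)    # [1, 0, 1]
print(mapping)  # {'Benign': 0, 'Malware': 1}
```

With these labels, sorted order happens to give benign = 0 and malicious = 1, matching the convention in Section 3.1; if the raw labels sort differently, an explicit mapping would be needed.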
4.3.2. Missing Values and Removing Duplicates
The dataset was also examined for missing values and duplicate records. The preprocessing results indicated 8611 duplicate rows, which were removed to eliminate redundancy and reduce the risk of bias during training. In addition, several columns (e.g., psxview.not_in_session, svcscan.nservices, and others) contained zero-only values and were considered for removal because of their lack of variance. A column-wise uniqueness check was then performed to identify attributes with no variation across all samples. Columns with only one unique value do not contribute discriminative information and can degrade efficiency. Therefore, such columns were flagged for removal to reduce computational complexity and improve downstream learning performance.
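The two cleaning checks above can be expressed in a few lines of plain Python; in pandas the equivalents are `DataFrame.drop_duplicates()` and selecting the columns where `DataFrame.nunique()` equals 1. The sketch assumes rows are lists of already-numeric values and that duplicates are judged on the full row including the label; the column name `feature_a` is hypothetical.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving the original order."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def constant_columns(rows, names):
    """Names of columns that take a single value across all rows (zero variance)."""
    return [name for j, name in enumerate(names) if len({row[j] for row in rows}) <= 1]


names = ["feature_a", "psxview.not_in_session", "label"]  # feature_a is hypothetical
rows = [[3, 0, 1], [3, 0, 1], [7, 0, 0]]
clean = drop_duplicates(rows)
print(len(rows) - len(clean))          # 1
print(constant_columns(clean, names))  # ['psxview.not_in_session']
```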
4.3.3. Feature Scaling
Following numericalization, we standardize each feature to stabilize optimization and prevent variables with larger numeric ranges from dominating gradient updates. We use z-score standardization:

$$z = \frac{x - \mu}{\sigma},$$

where $x$ denotes the original feature value, and $\mu$ and $\sigma$ are the mean and standard deviation of the corresponding feature computed on the training data. Normalization improves convergence behavior and is commonly adopted in malware detection pipelines [42].
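The z-score standardization described above, with μ and σ estimated on the training partition only, can be sketched in NumPy as follows. Like scikit-learn's `StandardScaler`, it uses the population standard deviation; replacing a zero σ with 1 is an assumed guard for any constant column that survives cleaning.

```python
import numpy as np


def fit_standardizer(X_train):
    """Estimate per-feature mean and (population) standard deviation on training rows only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid division by zero for constant features
    return mu, sigma


def standardize(X, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics from the training partition."""
    return (X - mu) / sigma


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])
mu, sigma = fit_standardizer(X_train)
print(standardize(X_test, mu, sigma))  # [[0. 2.]]
```

The test row is transformed with the training statistics, so a value outside the training range (40.0 here) simply maps to a z-score beyond ±1 rather than leaking information back into μ and σ.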
4.3.4. Dimensionality Reduction
High-dimensional representations can increase model complexity, training time, and the risk of overfitting.
To reduce noise and redundancy while retaining the dominant behavioral signal, we apply Principal Component Analysis (PCA) as an exploratory feature transformation. PCA projects the standardized feature matrix $X$ onto a lower-dimensional subspace:

$$Z = XW,$$

where $Z$ denotes the transformed feature matrix and $W$ contains the top-$k$ eigenvectors of the covariance matrix of $X$. The value of $k$ is selected to preserve most of the explained variance while improving computational efficiency and reducing overfitting risk. To support RQ1, PCA was used only as an exploratory dimensionality-reduction analysis and not as the primary predictive pipeline. The purpose of PCA in this study is to examine whether the memory-derived feature space contains redundant structure and whether the dominant variance directions preserve separability between benign and malicious samples. This helps characterize the dataset and assess the quality of the feature representation. However, because PCA transforms original memory artifacts into abstract orthogonal components, it reduces direct interpretability. Therefore, the main predictive experiments are conducted using the original memory-derived features without PCA, while PCA is retained only for feature-space analysis.
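For the exploratory analysis, the projection described above can be computed from the singular value decomposition of the centred data, whose right singular vectors are the eigenvectors of the covariance matrix. In the NumPy sketch below, k is the smallest number of components whose cumulative explained-variance ratio reaches a threshold; the 95% threshold and the helper names are assumptions, as the text only says that most of the explained variance is preserved.

```python
import numpy as np


def pca_fit(X_train, var_threshold=0.95):
    """Return (mean, W, explained-variance ratios) with W holding the top-k eigenvectors."""
    mean = X_train.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    ratios = S**2 / np.sum(S**2)  # explained-variance ratio of each component
    k = min(int(np.searchsorted(np.cumsum(ratios), var_threshold)) + 1, len(S))
    return mean, Vt[:k].T, ratios[:k]


def pca_transform(X, mean, W):
    """Project onto the retained components: Z = (X - mean) W."""
    return (X - mean) @ W


# Rank-1 toy data: every row is a multiple of (1, 2, 3), so one component suffices.
X = np.outer(np.arange(10.0), [1.0, 2.0, 3.0])
mean, W, ratios = pca_fit(X)
print(W.shape)  # (3, 1)
```

On standardized training data the mean is already close to zero, so `(X - mean) @ W` reduces to the $Z = XW$ form used in the text.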
The final cleaned dataset used for classification contained 49,557 records and 55 columns, as shown in Table 4. Class distribution remained relatively balanced, with 26,614 benign samples (class 0) and 22,943 malicious samples (class 1), which reduces the likelihood of severe class-imbalance effects on model training and evaluation. Because the main predictive pipeline uses the original memory-derived features without PCA, the retained features remain semantically meaningful, enabling security analysts to examine which memory artifacts contribute to detection decisions.
4.3.5. Cross-Validation Protocol
After preprocessing, model evaluation is conducted using repeated stratified k-fold cross-validation. Specifically, we use 5-fold stratified cross-validation repeated two times, resulting in 10 train–test evaluations. Stratification preserves the class distribution in each fold. For every fold, standardization (and, in the exploratory analysis, PCA) is fitted only on the training partition and then applied to the corresponding test partition to avoid information leakage.
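A NumPy sketch of repeated stratified 5-fold splitting with two repeats, which yields the 10 train–test pairs described above, is shown below; scikit-learn's `RepeatedStratifiedKFold(n_splits=5, n_repeats=2)` provides the same protocol. The round-robin assignment within each class and the seed are assumptions. In use, any fitted preprocessing step (for example, the standardization statistics) would be estimated on `X[train_idx]` only and then applied to `X[test_idx]`.

```python
import numpy as np


def repeated_stratified_kfold(y, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs; 5 splits x 2 repeats gives 10 evaluations."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        fold_of = np.empty(len(y), dtype=int)
        for cls in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == cls))  # shuffle within each class
            fold_of[idx] = np.arange(len(idx)) % n_splits   # deal class members round-robin
        for k in range(n_splits):
            yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


y = [0] * 60 + [1] * 40
splits = list(repeated_stratified_kfold(y))
print(len(splits))  # 10
```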
4.4. Evaluation Protocol and Baselines
To ensure reproducibility and consistency, the proposed framework is evaluated using repeated stratified k-fold cross-validation with five splits and two repeats (Section 4.3.5). The primary evaluation setting is binary malware detection (benign versus malicious), although the same methodology can be extended to multiclass malware-family classification. In each split, the trained model is tested on the held-out fold, and its predictions are analyzed using standard classification metrics and diagnostic curves.
We evaluate the proposed models under a consistent experimental protocol. The main neural architectures include the following:
1. DNN, which learns nonlinear representations from the original memory-derived features;
2. RNN, which is a feature-dependency learner over the fixed ordered feature representation;
3. Hybrid DNN + RNN, which combines feedforward representation learning with recurrent feature-dependency modeling;
4. Hybrid DNN + RNN with PSO, which integrates PSO-based optimization for feature selection and hyperparameter tuning.
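The list describes the RNN as a feature-dependency learner over a fixed ordered feature representation. The NumPy forward pass below is a minimal sketch of one way such a hybrid could be wired. It assumes a single ReLU layer for the DNN branch, a single-layer Elman RNN that reads one feature per step, and fusion by concatenation before a sigmoid output. The layer sizes, initialization, fusion strategy, and the names `init_hybrid_params` and `hybrid_forward` are illustrative assumptions, not the authors' PyTorch architecture, and training is omitted.

```python
import numpy as np


def init_hybrid_params(n_features, dnn_hidden=32, rnn_hidden=16, seed=0):
    """Randomly initialize weights for a toy hybrid DNN + RNN (sizes are hypothetical)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_d": rng.normal(0.0, scale, (n_features, dnn_hidden)),      # DNN branch
        "b_d": np.zeros(dnn_hidden),
        "W_x": rng.normal(0.0, scale, (1, rnn_hidden)),               # RNN input: one feature per step
        "W_h": rng.normal(0.0, scale, (rnn_hidden, rnn_hidden)),      # RNN recurrence
        "b_h": np.zeros(rnn_hidden),
        "W_o": rng.normal(0.0, scale, (dnn_hidden + rnn_hidden, 1)),  # fused output layer
        "b_o": np.zeros(1),
    }


def hybrid_forward(X, p, dropout=0.0, training=False, rng=None):
    """Return P(malicious) for each row of X (shape: batch x n_features)."""
    # DNN branch: one fully connected ReLU layer over the whole feature vector.
    h_dnn = np.maximum(X @ p["W_d"] + p["b_d"], 0.0)
    # RNN branch: an Elman RNN that reads the fixed feature order as a sequence,
    # one scalar feature per step; the final hidden state summarizes it.
    h = np.zeros((X.shape[0], p["W_h"].shape[0]))
    for t in range(X.shape[1]):
        h = np.tanh(X[:, t:t + 1] @ p["W_x"] + h @ p["W_h"] + p["b_h"])
    fused = np.concatenate([h_dnn, h], axis=1)
    if training and dropout > 0.0:
        # Inverted dropout: active only during training, identity at inference.
        rng = rng if rng is not None else np.random.default_rng()
        keep = rng.random(fused.shape) >= dropout
        fused = fused * keep / (1.0 - dropout)
    logits = fused @ p["W_o"] + p["b_o"]
    return 1.0 / (1.0 + np.exp(-logits.ravel()))
```

Because the RNN consumes features in a fixed column order, reordering the columns changes its output. This is one reason the recurrent branch is best read as modeling dependencies among ordered features rather than chronological behavior.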
To assess whether the added complexity of deep learning is justified, we also include lighter baselines (Logistic Regression, Random Forest, and XGBoost) under the same evaluation protocol. All models use the same cleaned dataset, the same preprocessing procedure, and the original memory-derived features without PCA.
4.5. Evaluation Metrics
Performance is assessed using classification metrics derived from the confusion-matrix counts: true positives ($TP$), true negatives ($TN$), false positives ($FP$), and false negatives ($FN$). In this context, $TP$ denotes malicious samples correctly classified as malicious, $TN$ denotes benign samples correctly classified as benign, $FP$ denotes benign samples incorrectly classified as malicious, and $FN$ denotes malicious samples incorrectly classified as benign. The confusion matrix provides a compact and interpretable view of both correct predictions and error patterns.
4.5.1. Accuracy
Accuracy measures the proportion of correctly classified samples:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Although accuracy provides a useful overall indication of model performance, it may be insufficient on its own in malware detection settings, particularly when the practical costs of false positives and false negatives differ.
4.5.2. Precision
Precision quantifies the proportion of predicted malicious samples that are actually malicious:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
This metric is especially important when false alarms are costly, as it reflects the trustworthiness of positive predictions.
4.5.3. Recall (Sensitivity)
Recall measures the proportion of actual malicious samples correctly detected by the model:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
In malware detection, recall is critical because low recall implies that malicious samples are being missed.
4.5.4. F1-Score
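To balance precision and recall, the F1 score is computed as the harmonic mean of the two metrics:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$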
The F1 score is particularly informative when performance must be assessed under competing objectives, such as minimizing both false positives and false negatives.
4.5.5. Specificity
Specificity measures the proportion of actual benign samples correctly identified as benign by the model:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
In malware detection, high specificity is essential to minimize false alarms, ensuring that legitimate processes are not incorrectly flagged as malicious.
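The five threshold-dependent metrics defined in Sections 4.5.1 to 4.5.5 can all be computed from the four confusion counts, as in the following minimal sketch. The function names are hypothetical, malicious is assumed to be the positive class (label 1), and reporting an undefined ratio (0/0) as 0.0 is an assumed convention, not one stated in the text.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` (malicious) as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1, and specificity from confusion counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for 0/0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }
```

For hypothetical counts TP = 90, TN = 85, FP = 15, and FN = 10, this gives accuracy 0.875, precision of about 0.857, recall 0.9, F1 of about 0.878 (equivalently 2TP / (2TP + FP + FN) = 180/205), and specificity 0.85.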
4.5.6. AUC-ROC and Precision–Recall Curve
In addition to threshold-dependent metrics, the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC-ROC) are used to assess discrimination performance across a range of classification thresholds. The ROC curve plots the true positive rate (recall) against the false positive rate, $FP/(FP + TN)$, which equals one minus specificity, thereby illustrating the trade-off between detection sensitivity and false alarm rate. The AUC-ROC summarizes this behavior independently of any single threshold: an AUC of 1.0 indicates perfect discrimination, an AUC of 0.5 indicates random guessing, and an AUC below 0.5 indicates performance worse than random guessing. These measures are particularly useful in malware detection because they capture how classification behavior changes with threshold selection.
We report AUC-ROC to summarize separability across decision thresholds. We also analyze the precision–recall curve, which is especially informative when class proportions are not perfectly balanced.
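AUC-ROC has an equivalent probabilistic reading: it is the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one, with ties counted as one half. The sketch below computes it through the Mann–Whitney rank-sum statistic. It assumes that higher scores mean "more likely malicious" and uses hypothetical function names; in practice, a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks of a 1-D array, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)  # rank of the last member of each tie group
    return (ends - (counts - 1) / 2.0)[inverse]


def roc_auc(y_true, scores):
    """AUC-ROC via the Mann-Whitney U statistic (labels: 1 = malicious, 0 = benign)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = int((y_true == 1).sum())
    n_neg = int((y_true == 0).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC requires both benign and malicious samples")
    ranks = average_ranks(scores)
    # U counts (malicious, benign) pairs ordered correctly; ties contribute 1/2.
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four malicious–benign pairs are ordered correctly, so the AUC is 0.75.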
4.6. Experimental Setup
The proposed method is implemented and evaluated in Python 3.9.0 on a 64-bit system with an Intel(R) Core(TM) i5-10300H CPU @ 2.50 GHz, an NVIDIA GTX 1650 GPU (4 GB GDDR6), and 16 GB of RAM. The software environment includes standard scientific libraries (e.g., NumPy 2.0.2, Pandas 2.2.2, and Matplotlib 3.10.0) and a deep learning framework (i.e., PyTorch 2.7.1); when needed, GPU-enabled cloud resources are used to accelerate training and hyperparameter optimization.
5. Results and Discussion
This section reports the empirical results with respect to the research questions. We (i) evaluate the impact of feature analysis on data quality and separability, (ii) compare three deep architectures (DNN, RNN, and hybrid DNN + RNN), and (iii) quantify the contribution of Particle Swarm Optimization (PSO) to detection performance on a memory-analysis benchmark designed to reflect obfuscation-resilient signals.
5.1. RQ1: How Does Feature Analysis Affect Dataset Quality and Class Separability?
Principal Component Analysis (PCA) was applied as a dimensionality-reduction technique to summarize the feature space while preserving the dominant variance in the dataset. PCA transforms the original feature space into a smaller set of orthogonal principal components, each capturing a portion of the total variance. PCA was applied to the cleaned feature matrix, and to determine an appropriate number of components, the cumulative explained variance ratio was computed and visualized in Figure 2.
Figure 2a shows that the resulting curve rises sharply in the early components and begins to plateau after approximately 20 components, indicating that a relatively small number of principal components preserves most of the data variance. This observation supports efficient dimensionality reduction without substantial information loss.
Figure 2b reports the explained variance ratios for the first 20 components, confirming that variance is concentrated in the early components. By construction, the first component captures the largest share of variance. The second contributes a substantial additional share, and later components provide progressively smaller increments. This supports the use of PCA as an exploratory dimensionality-reduction analysis of the dataset feature space.
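The quantities plotted in Figure 2 can be obtained from the singular values of the centered feature matrix, as in this minimal sketch. Two choices here are assumptions rather than details from the text: the columns are standardized before PCA, and the component count is chosen by a hypothetical 95% cumulative-variance threshold, whereas the study reads k from the curve. The function names are also hypothetical.

```python
import numpy as np


def explained_variance_ratio(X, standardize=True):
    """Explained-variance ratio of each principal component of X (rows = samples)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    if standardize:
        sd = Xc.std(axis=0)
        Xc = Xc / np.where(sd > 0, sd, 1.0)  # leave constant columns unscaled
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2                             # proportional to component variances
    return var / var.sum()


def n_components_for(ratio, threshold=0.95):
    """Smallest k whose cumulative explained variance reaches `threshold`."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio))                # guard against rounding at threshold=1.0
```

`np.cumsum(explained_variance_ratio(X))` corresponds to the cumulative curve in Figure 2a, and the leading entries of the ratio array correspond to the bars in Figure 2b.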
For visual comparison, PCA, t-SNE, and LDA were also used to project the high-dimensional dataset into low-dimensional representations, as shown in Figure 3.
Figure 3a shows that PCA provided moderate separation between benign and malicious samples, indicating preservation of discriminative variance.
Figure 3b shows that t-SNE produced clearer nonlinear cluster separation, which is useful for exploratory analysis.
Figure 3c shows that LDA, being supervised, exhibited the strongest class separability; however, this should be interpreted as evidence of feature-space separability rather than direct evidence of real-world robustness. Overall, the projections confirm that the extracted feature space is highly discriminative for binary malware detection.
5.2. RQ2: Which Architecture Best Captures Memory-Based Behavioral Signals for Distinguishing Benign from Malicious Samples?
We report the performance of the evaluated models in Table 5. Because the results are near-saturated and the differences between models are small, the table reports the mean and standard deviation across the repeated cross-validation folds, so that marginal differences can be judged against fold-to-fold variability. Classical machine learning baselines are also included to assess whether the added complexity of deep learning provides a measurable benefit over lighter models.
Overall, all models achieve high performance, indicating that the memory-derived features are highly discriminative for distinguishing benign and malicious samples. Among the neural architectures, DNN + RNN achieves the highest F1 score, followed by DNN and RNN, although the absolute differences are small. As shown in Section 5.4, the improvement of DNN + RNN over DNN is statistically significant in terms of F1 score, whereas the difference between DNN and RNN is not. The hybrid model therefore provides a measurable but modest improvement over the standalone neural models.
The classical baselines also achieve highly competitive results. Random Forest and XGBoost obtain performance comparable to the deep learning models, confirming that the dataset is highly separable under the reported evaluation setting.
Table 6 reports the runtime comparison across the main models. Although Random Forest achieves the highest aggregate accuracy and F1 score, its inference time is substantially higher than that of the neural models. In particular, Random Forest requires 0.1108 s on average for inference, compared with 0.0075 s for DNN + RNN, making it roughly 15 times slower. The hybrid model therefore offers a more favorable inference-time profile, which may be useful in deployment settings where rapid prediction is important.
The RNN baseline performs competitively, but the dataset consists of tabular memory-derived features rather than explicit temporal execution traces. Therefore, the RNN should be interpreted as a feature-dependency learner over a fixed ordered feature representation rather than as a model of chronological malware behavior. The DNN baseline also performs strongly, indicating that the original memory-derived features contain highly discriminative structural patterns. The hybrid DNN + RNN achieves the best performance among the neural models, but the improvement is modest and should not be overstated given the similarly strong performance of the classical baselines.
5.3. RQ3: How Does Integrating PSO into the Hybrid Model Affect Malware Detection Performance?
The PSO-optimized hybrid model demonstrates clear convergence behavior across all three diagnostic plots in Figure 4.
Figure 4a shows that training loss begins at approximately 0.0025 and decreases steadily, stabilizing near zero by epoch 10 with minor residual fluctuations thereafter. Test loss is consistently lower than training loss throughout training, beginning near zero from epoch 1 and remaining almost flat across subsequent epochs. As discussed, this pattern is expected under dropout regularization, which is active during training but disabled at inference, and does not indicate overfitting or data leakage given the careful partitioning of train and test folds.
Figure 4b shows the same dynamics for accuracy: Test accuracy remains near 100% from the early epochs, while training accuracy exhibits early instability before converging toward the test curve. This transient gap is consistent with the stochastic effect of dropout and mini-batch sampling during early training, where gradient updates are noisy before the optimizer settles. Importantly, training accuracy converges toward test accuracy rather than diverging from it, suggesting that the model is not overfitting.
Figure 4c presents the AUC behavior: both train and test AUC remain approximately 1.000 across all 30 epochs, with no observable degradation. Taken together, the three curves indicate stable optimization and robust convergence under the reported evaluation setting.
Table 7 compares the PSO-based configurations across the three evaluated neural architectures. The results show that (DNN + RNN) + PSO achieves the highest F1 score among the PSO-based neural configurations, followed by DNN + PSO and RNN + PSO. However, as shown in Section 5.4, its improvement over the non-optimized DNN + RNN model is not statistically significant. PSO should therefore be interpreted primarily as an automated tuning mechanism rather than as a source of substantial predictive improvement.
Table 8 compares PSO with random search (RS) for tuning the hybrid DNN + RNN model. Both tuning strategies achieve near-saturated performance. The PSO-tuned hybrid model achieves a slightly higher F1 score than the RS-tuned hybrid model, while RS achieves slightly higher specificity and precision. However, the difference between the two tuning methods is not statistically significant, as shown in Section 5.4. PSO therefore provides an automated tuning mechanism with competitive performance, but its advantage over simpler RS is limited in the present setting.
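Random search, the baseline tuner in Table 8, draws each candidate configuration independently from the search box and keeps the best. Unlike a swarm-based search, it shares no information between trials. The sketch below is a minimal illustration of that idea. It assumes that hyperparameters are encoded as a continuous box and that the objective to minimize is something like one minus validation accuracy. The function name, the encoding, and the trial budget are hypothetical. For a fair comparison with another tuner, both would normally be given the same number of objective evaluations.

```python
import numpy as np


def random_search_minimize(f, lower, upper, n_trials=1000, seed=0):
    """Evaluate f at n_trials points drawn uniformly from the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    candidates = rng.uniform(lower, upper, size=(n_trials, lower.size))
    values = np.array([f(c) for c in candidates])  # trials are independent of each other
    best = int(np.argmin(values))
    return candidates[best], float(values[best])
```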
Table 9 reports the final PSO-selected values corresponding to the solution vector in Equation (1), including the learning rate, dropout parameter, RNN hidden size, DNN hidden size, and resulting accuracy. Seven of the ten PSO runs achieved a best validation accuracy of 1.0000, while the remaining three achieved values between 0.9998 and 0.9999. This indicates that several hyperparameter configurations achieved near-saturated validation performance within the tested search space.
5.4. Statistical Significance Analysis
To determine whether the observed performance differences are statistically meaningful, we performed pairwise statistical tests using the F1 scores obtained across the repeated runs/folds. Since the same data splits were used across models, the comparisons were conducted as paired tests. Specifically, we used the Wilcoxon signed-rank test with a significance level of , which is suitable for comparing paired performance scores without assuming normality. We also report the effect size to indicate the magnitude of the difference between paired model performances.
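For ten paired folds, the Wilcoxon signed-rank null distribution can be enumerated exactly, because there are only 2^10 = 1024 sign patterns. The sketch below is a minimal exact two-sided version for paired per-fold F1 scores. The text does not state which effect-size measure was used, so the matched-pairs rank-biserial correlation returned here is an assumption. The function name is hypothetical, and a library routine such as `scipy.stats.wilcoxon` would typically be used in practice.

```python
import itertools

import numpy as np


def wilcoxon_signed_rank_exact(a, b):
    """Two-sided exact Wilcoxon signed-rank test for paired scores (e.g., per-fold F1).

    Zero differences are dropped, and tied |differences| receive average ranks.
    Returns (W_plus, p_value, rank_biserial_r). All 2**n sign patterns are
    enumerated, so this is only practical for small n (such as 10 folds).
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[d != 0]
    n = d.size
    if n == 0:
        return 0.0, 1.0, 0.0
    # Average ranks of |d| (ties share the mean of the ranks they span).
    _, inverse, counts = np.unique(np.abs(d), return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inverse]
    w_plus = ranks[d > 0].sum()
    total = ranks.sum()
    # Under H0, each rank is equally likely to carry a + or - sign.
    signs = np.array(list(itertools.product((0, 1), repeat=n)))
    null_w = signs @ ranks
    center = total / 2.0
    # Two-sided: sign patterns at least as far from the null center as observed.
    p_value = np.mean(np.abs(null_w - center) >= abs(w_plus - center) - 1e-9)
    rank_biserial = (2.0 * w_plus - total) / total  # (W+ - W-) / (W+ + W-)
    return float(w_plus), float(p_value), float(rank_biserial)
```

If one model beats the other on all ten folds, W+ equals 55 and only the all-positive and all-negative sign patterns are as extreme. The exact p-value is then 2/1024, about 0.002, and the rank-biserial correlation is 1.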
The results show that the hybrid DNN + RNN model significantly improves over the standalone DNN model (, effect size ) and the standalone RNN model (, effect size ). In contrast, the difference between DNN and RNN is not statistically significant (, effect size ). This indicates that the hybrid architecture provides a statistically supported improvement over the individual neural architectures, although the absolute performance difference remains small.
For the optimized models, (DNN + RNN) + PSO significantly outperforms RNN + PSO (, effect size ), but its difference from DNN + PSO is not statistically significant (, effect size ). More importantly, the difference between the non-optimized DNN + RNN model and (DNN + RNN) + PSO is not statistically significant (, effect size ). Similarly, the difference between (DNN + RNN) + PSO and (DNN + RNN) + RS is not statistically significant (, effect size ). These results suggest that PSO provides competitive automated tuning, but it does not produce a statistically significant improvement over either the non-optimized hybrid model or random search in the reported setting.
Finally, Random Forest achieves a slightly higher mean F1 score than (DNN + RNN) + PSO; however, this difference is also not statistically significant (, effect size ). Therefore, the strongest classical baseline and the PSO-optimized hybrid model show comparable predictive performance, while the runtime results indicate that the hybrid model offers substantially faster inference.
6. Practical Implications
The proposed PSO-optimized hybrid DNN + RNN framework has several practical implications for real-world enterprise-level deployment scenarios. This section discusses how the framework can be integrated into existing security infrastructures, its applicability in enterprise-level environments, and the trade-offs that practitioners should consider when adopting it.
6.1. Integration with Existing Security Systems
The framework is designed to operate on memory dump data, making it compatible with existing Endpoint Detection and Response (EDR) tools and digital forensic platforms that already collect memory snapshots as part of their standard monitoring workflows. Security Operations Centers (SOCs) could incorporate the framework as an additional detection layer alongside traditional signature-based tools, enabling analysts to flag suspicious processes that evade conventional methods. Since the model produces binary classification outputs with very low false-positive rates, it can be integrated into automated alert pipelines without significantly increasing the workload of the SOC team.
6.2. Mitigation of False Alerts in SOC Environments
The high specificity achieved by our framework is particularly valuable in large-scale enterprise environments, where false alarms are costly in terms of the workload of the SOC team. For practical deployment, the high precision of our proposed framework ensures that automated incident response triggers—such as node isolation or process suspension—can be executed with high confidence, thereby reducing the manual verification burden on security analysts.
6.3. Cost–Benefit Trade-Off of PSO Optimization
The use of PSO for hyperparameter tuning reduces the need for manual configuration, which is both time-consuming and dependent on expert knowledge. In practice, this means that security teams without deep machine learning expertise can obtain a well-tuned model without extensive manual experimentation. The additional training cost introduced by PSO is a one-time investment. However, Section 5.4 shows no statistically significant gain over either the non-optimized hybrid model or random search, so its main benefit is automation rather than higher accuracy. Organizations with limited computational resources may therefore prefer lighter optimization alternatives, such as random search (RS) or Bayesian optimization, as discussed in Section 7.
6.4. Computational Overhead and Edge Deployment
The hybrid DNN + RNN model introduces additional inference cost compared to a single architecture, and the PSO optimization loop adds one-time training overhead. For real-time SOC pipelines, inference costs remain tractable on commodity GPU hardware; however, deployment on resource-constrained endpoints would require model compression. A compressed model, using pruning, quantization, or knowledge distillation, could preserve most of the detection performance while substantially reducing memory and compute footprint, making on-host detection on edge endpoints practical.
6.5. Scalability to Enterprise Workloads
At enterprise scale, memory dumps are generated continuously across thousands of endpoints. The framework supports horizontal scaling because inference is stateless per sample. Preprocessing parameters (such as standardization statistics, and PCA components if PCA is used) can be fitted offline and serialized for reuse, and inference can then be performed on shared GPU workers within the SOC backend. Periodic retraining (e.g., weekly) is recommended to incorporate newly collected samples and track malware evolution. PSO should be rerun only when retraining reveals performance drift, which keeps the optimization budget bounded.
7. Conclusions and Future Directions
This study proposed a memory-based hybrid deep learning framework for detecting obfuscated malware, combining DNNs and RNNs optimized using PSO. By jointly learning nonlinear feature representations and dependencies among ordered memory-derived features, the framework targets key limitations of traditional detection approaches, particularly their difficulty generalizing to new and rapidly evolving malware. Experiments on the MemMal-D2024 dataset showed that the PSO-optimized hybrid model achieved 99.98% accuracy. It outperformed the standalone neural baselines in mean performance and matched the strongest classical baseline (Random Forest) with substantially faster inference, which makes it a promising candidate for SOC pipelines.
Despite these advances, important deficiencies in the broader problem of memory-based obfuscated-malware detection remain unresolved. First, evaluation on a single benchmark—even one designed for obfuscated and memory-resident threats—is not sufficient evidence of generalization across malware families and packers seen in the wild. Second, the framework, like other learned detectors, is exposed to concept drift as obfuscation techniques evolve, and its behavior on zero-day variants is not yet characterized. Third, the strong class separation shown in the LDA projection may make the detection task appear easier compared with conditions encountered in real-world deployments. These are limitations of the problem and the available data, not only of the model.
Looking ahead, several future research directions remain open. These include exploring alternative hyperparameter optimization strategies such as Genetic Algorithms and Bayesian Optimization, applying model compression techniques such as pruning and knowledge distillation to support deployment on resource-constrained devices, and integrating explainable AI methods such as SHAP and LIME to improve model transparency and trust in operational settings. Extending the framework to support multiclass malware family classification and real-time detection scenarios would further strengthen its practical applicability. Overall, this work provides a strong and reliable foundation for advancing memory-based malware detection in modern cybersecurity environments.
Author Contributions
Conceptualization, A.A. (Amal Alazba), K.A., S.A. and A.A. (Abdulmajeed Alameer); methodology, A.A. (Amal Alazba), K.A., S.A. and A.A. (Abdulmajeed Alameer); software, K.A. and A.A. (Abdulmajeed Alameer); validation, S.A. and A.A. (Abdulmajeed Alameer); formal analysis, A.A. (Amal Alazba) and A.A. (Abdulmajeed Alameer); investigation, A.A. (Amal Alazba), K.A., S.A. and A.A. (Abdulmajeed Alameer); resources, A.A. (Amal Alazba); data curation, K.A.; writing—original draft preparation, A.A. (Amal Alazba), K.A., S.A. and A.A. (Abdulmajeed Alameer); writing—review and editing, A.A. (Amal Alazba), S.A. and A.A. (Abdulmajeed Alameer); visualization, K.A. and A.A. (Abdulmajeed Alameer); supervision, A.A. (Amal Alazba); project administration, A.A. (Amal Alazba). All authors have read and agreed to the published version of the manuscript.
Funding
This research project was supported by a grant from the Ongoing Research Funding program (ORF-2026-1320), King Saud University, Riyadh, Saudi Arabia.
Data Availability Statement
Acknowledgments
The authors would like to thank Ongoing Research Funding Program, (ORF-2026-1320), King Saud University, Riyadh, Saudi Arabia for financial support.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Tayyab, U.U.H.; Khan, F.B.; Durad, M.H.; Khan, A.; Lee, Y.S. A Survey of the Recent Trends in Deep Learning Based Malware Detection. J. Cybersecur. Priv. 2022, 2, 800–829.
- Kim, J.Y.; Cho, S.B. Obfuscated Malware Detection Using Deep Generative Model based on Global/Local Features. Comput. Secur. 2022, 112, 102501.
- Naseer, M.; Rusdi, J.F.; Shanono, N.M.; Salam, S.; Muslim, Z.B.; Abu, N.A.; Abadi, I. Malware Detection: Issues and Challenges. J. Phys. Conf. Ser. 2021, 1807, 012011.
- Gandhi, V.; Gajjar, T. Enhancing Fraud Detection in Financial Transactions through Cyber Security Measures. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2024, 10, 364–371.
- Saeed, M.A.H. Malware in Computer Systems: Problems and Solutions. IJID Int. J. Inform. Dev. 2020, 9, 1–8.
- Jameel, N.F.M.; Jawhar, M.M.T. A Survey on Malware Attacks Analysis and Detected. Int. Res. J. Innov. Eng. Technol. 2023, 7, 32–40.
- Hossain, M.A.; Islam, M.S. Enhanced detection of obfuscated malware in memory dumps: A machine learning approach for advanced cybersecurity. Cybersecurity 2024, 7, 16.
- Sahin, M.; Bahtiyar, S. A Survey on Malware Detection with Deep Learning. In Proceedings of the 13th International Conference on Security of Information and Networks, SIN 2020, New York, NY, USA, 4–7 November 2021; pp. 1–6.
- Redhu, A.; Choudhary, P.; Srinivasan, K.; Das, T.K. Deep learning-powered malware detection in cyberspace: A contemporary review. Front. Phys. 2024, 12, 1349463.
- Tayebi, M.; El Kafhali, S. Deep Neural Networks Hyperparameter Optimization Using Particle Swarm Optimization for Detecting Frauds Transactions. In Proceedings of the Advances on Smart and Soft Computing; Saeed, F., Al-Hadhrami, T., Mohammed, E., Al-Sarem, M., Eds.; Springer: Singapore, 2022; pp. 507–516.
- Garg, S.; Baliyan, N. A novel parallel classifier scheme for vulnerability detection in Android. Comput. Electr. Eng. 2019, 77, 12–26.
- Wang, S.; Chen, Z.; Yan, Q.; Yang, B.; Peng, L.; Jia, Z. A mobile malware detection method using behavior features in network traffic. J. Netw. Comput. Appl. 2019, 133, 15–25.
- Bahtiyar, Ş.; Yaman, M.B.; Altıniğne, C.Y. A multi-dimensional machine learning approach to predict advanced malware. Comput. Netw. 2019, 160, 118–129.
- Lu, X.; Jiang, F.; Zhou, X.; Yi, S.; Sha, J.; Pietro, L. ASSCA: API sequence and statistics features combined architecture for malware detection. Comput. Netw. 2019, 157, 99–111.
- Karbab, E.B.; Debbabi, M. MalDy: Portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports. Digit. Investig. 2019, 28, S77–S87.
- Smith, D.; Khorsandroo, S.; Roy, K. Supervised and Unsupervised Learning Techniques Utilizing Malware Datasets. In Proceedings of the 2023 IEEE 2nd International Conference on AI in Cybersecurity (ICAIC), Houston, TX, USA, 7–9 February 2023; pp. 1–7.
- Talukder, M.A.; Hasan, K.F.; Islam, M.M.; Uddin, M.A.; Akhter, A.; Yousuf, M.A.; Alharbi, F.; Moni, M.A. A Dependable Hybrid Machine Learning Model for Network Intrusion Detection. arXiv 2023, arXiv:2212.04546.
- Louk, M.H.L.; Tama, B.A. Tree-Based Classifier Ensembles for PE Malware Analysis: A Performance Revisit. Algorithms 2022, 15, 332.
- Dener, M.; Ok, G.; Orman, A. Malware Detection Using Memory Analysis Data in Big Data Environment. Appl. Sci. 2022, 12, 8604.
- Ghazi, M.R.; Raghava, N.S. Machine Learning Based Obfuscated Malware Detection in the Cloud Environment with Nature-Inspired Feature Selection. In Proceedings of the 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT), Aligarh, India, 26–27 November 2022; pp. 1–5.
- Shaukat, K.; Luo, S.; Varadharajan, V. A novel deep learning-based approach for malware detection. Eng. Appl. Artif. Intell. 2023, 122, 106030.
- Venkatraman, S.; Alazab, M.; Vinayakumar, R. A hybrid deep learning image-based analysis for effective malware detection. J. Inf. Secur. Appl. 2019, 47, 377–389.
- Zhu, H.; Wei, H.; Wang, L.; Xu, Z.; Sheng, V.S. An effective end-to-end android malware detection method. Expert Syst. Appl. 2023, 218, 119593.
- Kang, J.; Jang, S.; Li, S.; Jeong, Y.S.; Sung, Y. Long short-term memory-based Malware classification method for information security. Comput. Electr. Eng. 2019, 77, 366–375.
- Wu, Y.; Shi, J.; Wang, P.; Zeng, D.; Sun, C. DeepCatra: Learning flow- and graph-based behaviours for Android malware detection. IET Inf. Secur. 2023, 17, 118–130.
- Waqar, M.; Fareed, S.; Kim, A.; Ur, S.; Imran, M.; Yaseen, M. Malware Detection in Android IoT Systems Using Deep Learning. Comput. Mater. Contin. 2022, 74, 4399–4415.
- Mystakidis, A.; Kalogiannnis, G.; Vakakis, N.; Altanis, N.; Milousi, K.; Somarakis, I.; Mihalachi, G.; Mazi, M.; Sotos, D.; Voulgaridis, A.; et al. XAI-Driven Malware Detection from Memory Artifacts: An Alert-Driven AI Framework with TabNet and Ensemble Classification. AI 2026, 7, 66.
- Vladov, S.; Vysotska, V.; Varlakhov, V.; Nazarkevych, M.; Bolvinov, S.; Piadyshev, V. Innovative Method for Detecting Malware by Analysing API Request Sequences Based on a Hybrid Recurrent Neural Network for Applied Forensic Auditing. Appl. Syst. Innov. 2025, 8, 156.
- Ashawa, M.; McGregor, R.; Owoh, N.; Osamor, J.; Adejoh, J. Static and Dynamic Malware Analysis Using CycleGAN Data Augmentation and Deep Learning Techniques. Appl. Sci. 2025, 15, 9830.
- Maniriho, P.; Mahmood, A.N.; Chowdhury, M.J.M. MeMalDet: A memory analysis-based malware detection framework using deep autoencoders and stacked ensemble under temporal evaluations. Comput. Secur. 2024, 142, 103864.
- Abbasi, M.S.; Al-Sahaf, H.; Mansoori, M.; Welch, I. Behavior-based ransomware classification: A particle swarm optimization wrapper-based approach for feature selection. Appl. Soft Comput. 2022, 121, 108744.
- Hossain, M.S.; Hasan, N.; Samad, M.A.; Shakhawat, H.M.; Karmoker, J.; Ahmed, F.; Fuad, K.F.M.N.; Choi, K. Android Ransomware Detection From Traffic Analysis Using Metaheuristic Feature Selection. IEEE Access 2022, 10, 128754–128763.
- Sharma, R.M.; Agrawal, C.P. A BPSO and Deep Learning Based Hybrid Approach for Android Feature Selection and Malware Detection. In Proceedings of the 2022 IEEE 11th International Conference on Communication Systems and Network Technologies (CSNT), Indore, India, 23–24 April 2022; pp. 628–634.
- Alazab, M.; Alazab, M.; Shalaginov, A.; Mesleh, A.; Awajan, A. Intelligent mobile malware detection using permission requests and API calls. Future Gener. Comput. Syst. 2020, 107, 509–521.
- Al-Andoli, M.N.; Sim, K.S.; Tan, S.C.; Goh, P.Y.; Lim, C.P. An Ensemble-Based Parallel Deep Learning Classifier with PSO-BP Optimization for Malware Detection. IEEE Access 2023, 11, 76330–76346.
- Adebayo, O.S.; Abdul Aziz, N. Improved Malware Detection Model with Apriori Association Rule and Particle Swarm Optimization. Secur. Commun. Netw. 2019, 2019, 2850932.
- Sudhamathi, T.; Perumal, K. A novel hybrid DNN-RNN framework for precise crop yield prediction. Int. J. Syst. Assur. Eng. Manag. 2024, 1–13.
- Sheikhi, S.; Kostakos, P. Safeguarding cyberspace: Enhancing malicious website detection with PSO-optimized XGBoost and firefly-based feature selection. Comput. Secur. 2024, 142, 103885.
- Fu, X.; Jiang, C.; Li, C.; Li, J.; Zhu, X.; Li, F. A hybrid approach for Android malware detection using improved multi-scale convolutional neural networks and residual networks. Expert Syst. Appl. 2024, 249, 123675.
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
- Shi, Y.; Eberhart, R. A modified particle swarm optimizer. In Proceedings of the Evolutionary Computation Proceedings, Anchorage, AK, USA, 4–9 May 1998; Volume 890, pp. 69–73.
- Gao, X.; Hu, C.; Shan, C.; Liu, B.; Niu, Z.; Xie, H. Malware classification for the cloud via semi-supervised transfer learning. J. Inf. Secur. Appl. 2020, 55, 102661.
Figure 1.
Overview of the proposed memory-based malware detection framework. The pipeline consists of data preprocessing, train–test partitioning, deep learning classification using DNN, RNN, and hybrid DNN + RNN models, PSO-based optimization, evaluation, and final binary prediction as benign or malware.
Figure 2.
PCA variance analysis: (a) cumulative explained variance ratio versus number of principal components and (b) individual explained variance ratios for the first 20 principal components.
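Panel (a) is the running sum of the per-component ratios shown in panel (b). A minimal sketch of how the number of components needed to reach a variance threshold is read from such ratios; the ratios and the 0.95 threshold below are hypothetical and are not values taken from Figure 2, and the helper names are illustrative.

```python
from itertools import accumulate


def cumulative_explained_variance(ratios):
    """Running sum of per-component ratios: panel (b) values -> panel (a) curve."""
    return list(accumulate(ratios))


def components_for_threshold(ratios, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio reaches `threshold`."""
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= threshold:
            return k
    raise ValueError("threshold not reached by the given ratios")


# Hypothetical ratios in decreasing order, as PCA returns them (not read from Figure 2).
ratios = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
print(cumulative_explained_variance(ratios))
print(components_for_threshold(ratios, 0.95))  # 5 components: 0.40 + 0.25 + 0.15 + 0.10 + 0.06 = 0.96
```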
Figure 3.
Class separability analysis for MemMal-D2024: (a) PCA projection using the first two principal components, (b) t-SNE two-dimensional projection, and (c) LDA one-dimensional mirrored projection.
Figure 4.
Training and testing curves for the PSO-optimized hybrid DNN + RNN model over 30 epochs: (a) loss, (b) accuracy, and (c) AUC.
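Panel (c) tracks AUC per epoch. The figure does not state how AUC was computed; a minimal sketch of the standard rank-based (Mann-Whitney) definition, in which AUC is the fraction of (malware, benign) pairs where the malware sample receives the higher score. The helper name and the toy scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = malware, 0 = benign; scores are predicted malware probabilities.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 3 of 4 pairs ordered correctly -> 0.75
```

The pairwise loop costs O(P·N) and is meant only to make the definition explicit; for test splits of several thousand samples per class, a sort-based rank formula is the practical choice.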
Table 1.
Comparison of related work and the proposed approach.
| Authors | Feature Selection | Detection Technique | Optimization Technique |
|---|---|---|---|
| Ghazi and Raghava [20] | Wrapper-based MA | RF | — |
| Smith et al. [16] | — | DT, RF, AB, KNN, SGD, ET, GNB | — |
| Talukder et al. [17] | — | RF, DT, MLP, KNN | — |
| Louk and Tama [18] | — | XGB, RF | — |
| Dener et al. [19] | — | LR | — |
| Maniriho et al. [30] | Autoencoder | Deep autoencoders, stacked ensemble | — |
| Our approach | PCA, t-SNE, LDA | Hybrid DNN + RNN | PSO |
Table 2.
PSO hyperparameter search space and encoding.
| Hyperparameter | Search Range | Scale |
|---|---|---|
| Learning rate | to | Log |
| RNN hidden size | | Discrete |
| DNN hidden size | | Discrete |
| Dropout rate | 0.05 to 0.50 | Linear |
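A minimal sketch of how a particle in the unit cube can be decoded into the four hyperparameters of Table 2 (log scale for the learning rate, discrete sets for the hidden sizes, linear scale for dropout) and searched with global-best PSO using the velocity update v ← w·v + c1·r1·(p_i − x) + c2·r2·(g − x) of Kennedy and Eberhart with the inertia weight w of Shi and Eberhart. Several pieces are assumptions: the learning-rate bounds are placeholders, the hidden-size sets are taken from the values that appear in Table 9, the coefficients w, c1, c2 are common textbook choices, and `toy_objective` is a hypothetical stand-in for 1 − validation accuracy of a trained model. The names `decode`, `pick`, and `pso` are illustrative.

```python
import math
import random

LR_BOUNDS = (1e-4, 1e-2)        # placeholder log-scale range (assumption)
RNN_SIZES = (32, 64, 128)       # assumed from the values seen in Table 9
DNN_SIZES = (64, 128, 256)      # assumed from the values seen in Table 9
DROPOUT_BOUNDS = (0.05, 0.50)   # from Table 2


def pick(choices, u):
    """Map u in [0, 1] to one element of a discrete set."""
    return choices[min(int(u * len(choices)), len(choices) - 1)]


def decode(position):
    """Turn a particle position in [0, 1]^4 into a hyperparameter dict."""
    u_lr, u_rnn, u_drop, u_dnn = position
    lo, hi = (math.log10(b) for b in LR_BOUNDS)
    return {
        "learning_rate": 10 ** (lo + u_lr * (hi - lo)),  # log-scale interpolation
        "rnn_hidden": pick(RNN_SIZES, u_rnn),
        "dropout": DROPOUT_BOUNDS[0] + u_drop * (DROPOUT_BOUNDS[1] - DROPOUT_BOUNDS[0]),
        "dnn_hidden": pick(DNN_SIZES, u_dnn),
    }


def pso(objective, dim=4, n_particles=20, iters=60, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO with inertia weight, minimizing `objective` over [0, 1]^dim."""
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [objective(x) for x in xs]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])   # pull toward own best
                            + c2 * r2 * (gbest[d] - xs[i][d]))     # pull toward swarm best
                xs[i][d] = min(max(xs[i][d] + vs[i][d], 0.0), 1.0)  # stay inside the box
            f = objective(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f


def toy_objective(position):
    """Hypothetical stand-in for 1 - validation accuracy; minimized at 0.3 in every dimension."""
    return sum((p - 0.3) ** 2 for p in position)


best, best_f = pso(toy_objective)
print(decode(best), best_f)
```

Searching in the unit cube and decoding afterwards keeps the PSO update identical for continuous and discrete hyperparameters; the log-scale mapping gives each order of magnitude of learning rate equal coverage.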
Table 3.
Dataset composition and details (MemMal-D2024).
| Attribute | Description/Details |
|---|---|
| Total records | 58,168 instances |
| Class distribution | 29,298 benign and 28,870 malicious samples |
| Features | 60 memory-derived features extracted from Windows memory artifacts |
| Labels | Binary label (benign vs. malicious) and multiclass malware-category label |
Table 4.
Dataset (MemMal-D2024) details after preprocessing.
| Attribute | Description/Details |
|---|---|
| Total records | 49,557 memory dump samples |
| Class distribution | Benign (26,614) and malicious (22,943) |
| Number of features | 55 memory-derived features |
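Comparing Tables 3 and 4, preprocessing removed 58,168 − 49,557 = 8,611 records (29,298 − 26,614 = 2,684 benign and 28,870 − 22,943 = 5,927 malicious) and 60 − 55 = 5 features. The tables do not say which steps caused this reduction. Duplicate-row removal and dropping constant columns are two common causes, so the sketch below shows those two steps purely as an assumption, not as the authors' procedure; the column names and the `label` column are hypothetical.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each identical row (feature values plus label)."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def drop_constant_columns(rows, names, protect=("label",)):
    """Remove columns that take a single value across all rows; `protect` columns are kept."""
    keep = [j for j, name in enumerate(names)
            if name in protect or len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], [names[j] for j in keep]


# Hypothetical toy table: f2 is constant and the first two rows are identical.
names = ["f1", "f2", "f3", "label"]
rows = [[1, 0, 5, "benign"], [1, 0, 5, "benign"], [2, 0, 7, "malware"]]
deduped = drop_duplicate_rows(rows)
clean_rows, clean_names = drop_constant_columns(deduped, names)
print(clean_names, clean_rows)
```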
Table 5.
Final performance comparison across evaluated models.
| Model | Accuracy | Specificity | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Logistic Regression | 99.95% ± 0.02 | 0.9996 ± 0.0002 | 0.9996 ± 0.0002 | 0.9994 ± 0.0003 | 0.9995 ± 0.0002 |
| Random Forest | 99.99% ± 0.01 | 0.9999 ± 0.0001 | 0.9999 ± 0.0001 | 0.9999 ± 0.0001 | 0.9999 ± 0.0001 |
| XGBoost | 99.99% ± 0.01 | 0.9999 ± 0.0001 | 0.9998 ± 0.0001 | 0.9999 ± 0.0002 | 0.9999 ± 0.0002 |
| RNN | 99.93% ± 0.07 | 0.9992 ± 0.0008 | 0.9991 ± 0.0009 | 0.9994 ± 0.0005 | 0.9992 ± 0.0007 |
| DNN | 99.94% ± 0.04 | 0.9991 ± 0.0005 | 0.9990 ± 0.0006 | 0.9996 ± 0.0004 | 0.9993 ± 0.0004 |
| DNN + RNN | 99.98% ± 0.02 | 0.9998 ± 0.0002 | 0.9998 ± 0.0003 | 0.9999 ± 0.0001 | 0.9998 ± 0.0002 |
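Table 5 reports five metrics as mean ± standard deviation. A minimal sketch of the standard confusion-matrix definitions, assuming malware is the positive class (so specificity is the true-negative rate on benign samples) and that ± denotes the sample standard deviation across folds or runs; neither convention is stated in the table. The helper names and the counts in the example are hypothetical.

```python
import statistics


def binary_metrics(tp, tn, fp, fn):
    """Table 5 metrics from one confusion matrix, with malware as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity: share of malware that is detected
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),  # share of benign samples correctly passed
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def mean_std(values):
    """Mean and sample standard deviation across folds/runs, reported as 'mean ± std'."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical confusion matrix for one fold.
print(binary_metrics(tp=90, tn=95, fp=5, fn=10))
```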
Table 6.
Runtime comparison across the main evaluated models.
| Model | Train Time (s) | Inference Time (s) |
|---|---|---|
| Logistic Regression | 0.10 ± 0.01 | 0.0032 ± 0.0064 |
| XGBoost | 0.34 ± 0.03 | 0.0016 ± 0.0047 |
| Random Forest | 1.18 ± 0.07 | 0.1108 ± 0.0144 |
| RNN | 91.56 ± 0.86 | 0.0095 ± 0.0078 |
| DNN | 87.70 ± 1.02 | 0.0048 ± 0.0073 |
| DNN + RNN | 115.05 ± 0.97 | 0.0075 ± 0.0076 |
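Table 6 does not specify the hardware, the batch size, or whether inference time is per sample or for the whole test split. A minimal sketch of one way to collect mean ± standard deviation timings with the standard library, using a hypothetical `time_call` helper. For millisecond-scale inference, timing many calls per repeat and dividing reduces timer noise, which can otherwise make the standard deviation exceed the mean, as it does for several inference entries in Table 6.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, number=100):
    """Per-call wall-clock seconds for fn(*args), as (mean, sample std) over `repeats`.

    Each repeat times `number` back-to-back calls and divides, so very short
    calls are not dominated by timer resolution. Requires repeats >= 2.
    """
    per_call = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(number):
            fn(*args)
        per_call.append((time.perf_counter() - start) / number)
    return statistics.mean(per_call), statistics.stdev(per_call)


# Stand-in workload; in the paper's setting fn would be a trained model's predict call.
mean_s, std_s = time_call(sum, list(range(10_000)))
print(f"{mean_s:.6f} ± {std_s:.6f} s per call")
```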
Table 7.
Performance comparison of PSO-optimized architectures.
| Model | Accuracy | Specificity | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| DNN + PSO | 99.96% ± 0.06 | 0.9997 ± 0.0004 | 0.9996 ± 0.0004 | 0.9995 ± 0.0009 | 0.9995 ± 0.0006 |
| RNN + PSO | 99.92% ± 0.06 | 0.9993 ± 0.0006 | 0.9992 ± 0.0007 | 0.9990 ± 0.0009 | 0.9991 ± 0.0007 |
| (DNN + RNN) + PSO | 99.98% ± 0.02 | 0.9997 ± 0.0003 | 0.9996 ± 0.0003 | 0.9999 ± 0.0001 | 0.9998 ± 0.0002 |
Table 8.
Comparison between PSO and random search (RS) for tuning the hybrid DNN + RNN model.
| Model | Accuracy | Specificity | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| (DNN + RNN) + PSO | 99.98% ± 0.02 | 0.9997 ± 0.0003 | 0.9996 ± 0.0003 | 0.9999 ± 0.0001 | 0.9998 ± 0.0002 |
| (DNN + RNN) + RS | 99.97% ± 0.04 | 0.9998 ± 0.0003 | 0.9997 ± 0.0003 | 0.9997 ± 0.0006 | 0.9997 ± 0.0004 |
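Table 8 does not state the random-search budget. A fair comparison usually gives RS the same number of objective evaluations as PSO (particles × iterations, plus the initial swarm). A minimal sketch, assuming the same unit-cube encoding of the four hyperparameters as in Table 2 and a hypothetical `toy_objective` standing in for 1 − validation accuracy; the particle and iteration counts in the budget are illustrative.

```python
import random


def random_search(objective, dim=4, budget=100, seed=0):
    """Evaluate `budget` uniformly random points in [0, 1]^dim and keep the best."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        x = [rng.random() for _ in range(dim)]
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


def toy_objective(position):
    """Hypothetical stand-in for 1 - validation accuracy."""
    return sum((p - 0.3) ** 2 for p in position)


# Budget matched to a hypothetical PSO run: 20 particles x (60 iterations + initial evaluation).
x, f = random_search(toy_objective, budget=20 * 61)
print(x, f)
```

Because each sample is drawn independently, random search never exploits earlier results; with only four hyperparameters and a nearly saturated objective, that gap can be small, which is consistent with the close scores in Table 8.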
Table 9.
Selected PSO hyperparameters across repeated runs/folds.
| Run | Learning Rate | RNN Hidden Size | Dropout | DNN Hidden Size | Val. Acc. |
|---|---|---|---|---|---|
| 1 | 0.001593 | 128 | 0.059263 | 256 | 1.0000 |
| 2 | 0.000646 | 32 | 0.084498 | 256 | 1.0000 |
| 3 | 0.000717 | 32 | 0.148054 | 256 | 1.0000 |
| 4 | 0.003921 | 64 | 0.188730 | 64 | 1.0000 |
| 5 | 0.000423 | 128 | 0.069289 | 128 | 0.9998 |
| 6 | 0.002583 | 32 | 0.165197 | 64 | 0.9999 |
| 7 | 0.003838 | 32 | 0.439120 | 128 | 1.0000 |
| 8 | 0.002694 | 128 | 0.325125 | 64 | 0.9999 |
| 9 | 0.000975 | 32 | 0.164963 | 128 | 1.0000 |
| 10 | 0.001459 | 64 | 0.121378 | 64 | 1.0000 |
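The configurations selected by PSO in Table 9 differ considerably from run to run, while validation accuracy stays at or above 0.9998. A short script that tallies the table's own values makes this spread explicit; it introduces no new data, and the variable names are illustrative.

```python
from collections import Counter
import statistics

# Values transcribed from Table 9 (runs 1 to 10).
learning_rate = [0.001593, 0.000646, 0.000717, 0.003921, 0.000423,
                 0.002583, 0.003838, 0.002694, 0.000975, 0.001459]
rnn_hidden = [128, 32, 32, 64, 128, 32, 32, 128, 32, 64]
dropout = [0.059263, 0.084498, 0.148054, 0.188730, 0.069289,
           0.165197, 0.439120, 0.325125, 0.164963, 0.121378]
dnn_hidden = [256, 256, 256, 64, 128, 64, 128, 64, 128, 64]
val_acc = [1.0000, 1.0000, 1.0000, 1.0000, 0.9998,
           0.9999, 1.0000, 0.9999, 1.0000, 1.0000]

print("learning rate:", min(learning_rate), "to", max(learning_rate),
      f"(x{max(learning_rate) / min(learning_rate):.1f})")
print("dropout:", min(dropout), "to", max(dropout))
print("RNN hidden sizes:", Counter(rnn_hidden))
print("DNN hidden sizes:", Counter(dnn_hidden))
print("validation accuracy:", min(val_acc), "to", max(val_acc),
      "mean", round(statistics.mean(val_acc), 5))
```

The learning rate varies by roughly a factor of nine and dropout by more than a factor of seven across runs, yet validation accuracy barely moves; this suggests the validation objective is close to saturated, so the particular configuration PSO settles on is only weakly constrained by validation accuracy alone.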
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.