This section presents the experimental results of the comparative study of the methods proposed in this paper. We aimed to illustrate the impact of randomizing each technique through statistical analysis and to examine the relationships among p-values. Each dataset is split into training, validation, and test sets: the models are trained on the training set, validated on the validation set, and evaluated for predictive performance on the test set. All experiments were executed on an HP ZBook 15 G6 mobile workstation (HP Inc., Palo Alto, CA, USA). The system is equipped with an Intel® Core™ i7-9850H processor (12 logical cores), 32 GB of RAM, and two GPUs: the integrated Intel® UHD Graphics 630 (CFL GT2) and an NVIDIA Quadro T2000. The operating system used for all experiments was Ubuntu 24.04.3 LTS running Linux kernel 6.14.0-35, with GNOME 46 and the X11 windowing system. The implementation environment included Python 3.10, Scikit-learn 1.3.2, NumPy 1.24.2, Pandas 2.0, and XGBoost 1.7.6. All metaheuristic feature-selection algorithms (GA, PSO, and VNS) were executed in CPU mode, while the RandomForest and XGBoost models were trained using multi-threaded CPU execution.
4.2. Computational Performance Evaluation
This section aims to evaluate the effects of parallel ensemble learning (RF and XGBoost) on the scalability and efficiency of VNS, GA, and PSO in feature selection. The following questions will be addressed:
For multidimensional data, Table 10 summarizes the feature reduction achieved by the intrusion detection models, starting from 84 original features, with fitness accuracies shown in Figure 12 and Figure 13. The PSO-RF model achieved the highest reduction, selecting 40 features (52.38% reduction). VNS-XGBoost followed closely, selecting 42 features (50.00% reduction). VNS-RF reduced the feature set to 43 features (48.81% reduction), while GA-XGBoost selected 45 features (46.43% reduction). Both GA-RF and PSO-XGBoost selected 46 features (45.24% reduction). These results indicate that all models reduced the feature set by approximately 45–52%, with PSO-RF being the most efficient in minimizing the number of features, followed closely by VNS-XGBoost, which enhances computational efficiency while maintaining intrusion detection performance.
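The reduction percentages quoted for Table 10 follow directly from the original and selected feature counts. The short sketch below reproduces that arithmetic; the helper name `reduction_percent` and the dictionary layout are illustrative assumptions and are not taken from the authors' code.

```python
def reduction_percent(original: int, selected: int) -> float:
    """Percentage of features removed, rounded to two decimals."""
    if original <= 0 or not 0 <= selected <= original:
        raise ValueError("selected must lie between 0 and original")
    return round(100.0 * (original - selected) / original, 2)


ORIGINAL_FEATURES = 84  # original feature count stated for the multidimensional data

# Selected-feature counts as reported in the text for Table 10.
selected_counts = {
    "PSO-RF": 40,
    "VNS-XGBoost": 42,
    "VNS-RF": 43,
    "GA-XGBoost": 45,
    "GA-RF": 46,
    "PSO-XGBoost": 46,
}

reductions = {
    name: reduction_percent(ORIGINAL_FEATURES, count)
    for name, count in selected_counts.items()
}

# Print models from largest to smallest reduction.
for name, pct in sorted(reductions.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {selected_counts[name]:3d} features  {pct:6.2f}% reduction")
```

The same function applies to the image-encoded data by setting the original count to 256.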
Table 11 summarizes the feature reduction achieved by the intrusion detection models using the CSE-CIC-IDS2018 dataset, with an original feature count of 256 (derived from 16 × 16 image representations). The PSO-XGBoost model achieved the highest reduction, selecting 130 features (49.22% reduction). The GA-RF model followed closely, selecting 131 features (48.83% reduction). PSO-RF selected 135 features (47.27% reduction), while GA-XGBoost, VNS-RF, and VNS-XGBoost each selected 137 features (46.48% reduction). These results indicate that all models reduced the feature set by approximately 46–49%, with PSO-XGBoost being the most efficient in minimizing the number of features while maintaining performance. For IDS2017, GA-XGBoost achieves the highest reduction (50.39%, selecting 127 features), closely followed by VNS-XGBoost (50.00%, 128 features). This demonstrates their effectiveness in identifying compact, high-performing feature subsets by pairing GA’s evolutionary search and VNS’s systematic neighborhood exploration, respectively, with XGBoost’s gradient boosting (n_estimators = 100, learning_rate = 0.1).
In terms of runtime, the VNS-RF model recorded the shortest training time at 1 min 6 s, while VNS-XGBoost had a slightly longer training time of 1 min 14 s. For optimization time, VNS-XGBoost was the fastest at 12 min 15 s, followed by VNS-RF at 20 min 53 s. In contrast, the GA-XGBoost model had the longest optimization time at 6 h 11 min 4 s, despite a relatively short training time of 1 min 6.5 s. The PSO-XGBoost and PSO-RF models required 5 h 24 min 26 s and 3 h 22 min 14 s for optimization, respectively, with training times of 1 min 56.2 s and 1 min 52.6 s. The GA-RF model took 3 h 1 min 6 s for optimization and 1 min 20.8 s for training. These results highlight that VNS-based models, particularly VNS-XGBoost, are significantly more efficient in terms of optimization time, making them well suited to practical deployment in intrusion detection systems.
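To make the gap in optimization time concrete, the reported durations can be converted to seconds and expressed relative to the fastest configuration. The sketch below does this for the optimization times quoted in the text; the parser `to_seconds` and the variable names are illustrative assumptions.

```python
import re

_UNIT_SECONDS = {"h": 3600, "min": 60, "s": 1}


def to_seconds(duration: str) -> float:
    """Convert a duration such as '6 h 11 min 4 s' into seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(h|min|s)\b", duration):
        total += float(value) * _UNIT_SECONDS[unit]
    return total


# Optimization times as reported in the text.
optimization_times = {
    "VNS-XGBoost": "12 min 15 s",
    "VNS-RF": "20 min 53 s",
    "GA-RF": "3 h 1 min 6 s",
    "PSO-RF": "3 h 22 min 14 s",
    "PSO-XGBoost": "5 h 24 min 26 s",
    "GA-XGBoost": "6 h 11 min 4 s",
}

seconds = {name: to_seconds(text) for name, text in optimization_times.items()}
fastest = min(seconds.values())

# Ratio of each model's optimization time to the fastest one.
slowdown = {name: value / fastest for name, value in seconds.items()}

for name, ratio in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{name:12s} {seconds[name] / 60:8.2f} min  {ratio:5.1f}x the fastest")
```

Under these figures, the GA- and PSO-based searches take roughly 15 to 30 times longer than VNS-XGBoost.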
For multidimensional data, Table 12 compares the training and optimization times of six hybrid models that combine metaheuristic optimizers (GA, PSO, VNS) with ensemble classifiers (RandomForest, XGBoost). VNS-based models show the fastest optimization times (20.88 min for RandomForest and 12.25 min for XGBoost), while GA and PSO take significantly longer (up to more than 6 h). Training times are relatively short across all models, with VNS having a longer training time due to its iterative neighborhood search, PSO being faster due to lightweight position updates, and XGBoost being slower than RandomForest due to boosting overhead. For image-converted data in Table 13, VNS-XGBoost achieves the fastest optimization time at 8 min 58 s, followed closely by VNS-RandomForest at 9 min 56 s, making them significantly more efficient at feature selection than the other models. GA-RandomForest and PSO-RandomForest require much longer optimization times (1 h 45 min 20 s and 1 h 3 min 40 s, respectively), while GA-XGBoost and PSO-XGBoost are the slowest, taking 3 h 22 min and 3 h 26 min, respectively. At inference, the RandomForest-based models (GA-RandomForest, PSO-RandomForest, VNS-RandomForest) exhibit higher inference times (0.07290, 0.07488, and 0.13096 s, respectively) and greater peak memory usage (0.02–0.03 MB) than the XGBoost-based models, owing to the computational overhead of evaluating multiple decision trees (n_estimators = 100, n_jobs = −1). All models share a consistent initial memory footprint (0.00095 MB), indicating similar preprocessing overheads. These results underscore the superior inference efficiency of XGBoost-based models, particularly GA-XGBoost, for real-time applications on high-dimensional image data, in line with their high accuracy (e.g., 0.99930 for GA-XGBoost in Table 1). In contrast, Random Forest models trade off inference speed for robustness.
Table 14 compares the inference time and memory usage per single sample for intrusion detection models trained on the multidimensional CIC-IDS2017 dataset. The PSO-XGBoost model demonstrated the lowest inference time at 0.0031 s and a peak memory usage of 10.24 KB, making it the most computationally efficient. The VNS-XGBoost model performed closely, with an inference time of 0.0042 s and a peak memory usage of 10.24 KB. In contrast, GA-XGBoost had a higher inference time (0.0220 s) and peak memory usage (81.92 KB). The RandomForest-based models (GA-RF, PSO-RF, VNS-RF) exhibited higher inference times, ranging from 0.0612 s (PSO-RF) to 0.0773 s (GA-RF), with peak memory usage between 20.48 KB (VNS-RF) and 30.72 KB (GA-RF, PSO-RF). All models had approximately 10 KB of initial memory, except for GA-XGBoost, which had 81.92 KB. These results indicate that PSO-XGBoost and VNS-XGBoost are the most efficient in terms of speed and memory, making them highly suitable for real-time intrusion detection applications.
Table 15 compares the inference time and memory usage per single sample for the models on the CSE-CIC-IDS2018 dataset. All models have an identical initial memory usage of 0.00095 MB, indicating minimal baseline memory requirements. For peak memory usage, GA-XGBoost, PSO-XGBoost, and VNS-XGBoost are the most memory-efficient, each using 0.01 MB, while GA-RF uses the most at 0.04 MB, followed by VNS-RF (0.03 MB) and PSO-RF (0.02 MB). In terms of inference time, PSO-XGBoost is the fastest at 0.00270 s, followed by GA-XGBoost (0.00495 s), while VNS-RF is the slowest at 0.09035 s, with GA-RF (0.07743 s) and VNS-XGBoost (0.03281 s) also relatively slow. Overall, XGBoost-based models, particularly PSO-XGBoost, demonstrate superior efficiency in inference time and memory usage, making them well suited to real-time applications, whereas RandomForest-based models, especially VNS-RF, require more time and memory during inference.
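The text does not state how the per-sample inference time and the initial and peak memory were measured. One plausible way to obtain such figures, shown here purely as an assumption, is to wrap a single prediction call with Python's standard `time.perf_counter` and `tracemalloc`. The function `toy_predict` and the `WEIGHTS` matrix are hypothetical stand-ins for a trained model's `predict` method.

```python
import time
import tracemalloc


def profile_single_prediction(predict, sample):
    """Return (elapsed_seconds, initial_mb, peak_mb) for one predict(sample) call."""
    tracemalloc.start()
    initial_bytes, _ = tracemalloc.get_traced_memory()  # memory traced before the call
    start = time.perf_counter()
    predict(sample)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()  # peak since tracing started
    tracemalloc.stop()
    megabyte = 1024 * 1024
    return elapsed, initial_bytes / megabyte, peak_bytes / megabyte


# Hypothetical stand-in for a trained classifier: a linear scorer over 3 classes.
WEIGHTS = [
    [0.2, -0.1, 0.4, 0.0],
    [-0.3, 0.5, 0.1, 0.2],
    [0.1, 0.1, -0.2, 0.6],
]


def toy_predict(sample):
    """Return the index of the highest-scoring class for one sample."""
    scores = [sum(x * w for x, w in zip(sample, row)) for row in WEIGHTS]
    return scores.index(max(scores))


elapsed, initial_mb, peak_mb = profile_single_prediction(toy_predict, [1.0, 0.5, -0.2, 0.3])
print(f"inference: {elapsed:.6f} s, initial: {initial_mb:.5f} MB, peak: {peak_mb:.5f} MB")
```

Note that `tracemalloc` only tracks allocations made through Python's memory allocator, so memory allocated inside native libraries may not be fully captured by this approach.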
4.3. Multi-Class Performance Evaluation
In this section, we evaluate the model’s performance across various intrusion categories and analyze the effects of evolutionary feature selection on false positives and false negatives for each class. Emphasizing:
Table 16 compares statistical performance across models trained on multidimensional data using Pearson correlation, Chi-square, and ANOVA tests. All models show strong Pearson correlation (0.94–0.99) with significant p-values (p = 0.0000), indicating strong linear relationships. Chi-square statistics are very high (up to 953k for VNS-XGBoost) with p = 0.0000, rejecting independence. For ANOVA, PSO-RandomForest and VNS-RandomForest show significant differences (p = 0.0000), while XGBoost-based models are non-significant (p > 0.88), suggesting better stability. VNS-XGBoost has the highest Pearson coefficient (0.9986) and Chi-square statistic, highlighting its robustness.
Table 17 compares the statistical analyses for intrusion detection models on the CSE-CIC-IDS2018 dataset, evaluating Random Forest (RF) and XGBoost models optimized with GA, PSO, and VNS. The VNS-XGBoost model achieved the highest Pearson correlation coefficient (1.0000) and the highest Chi-Square statistic (147,346.67), both with p-values of 0.0000, indicating an essentially perfect correlation between predictions and true labels and confirming strong significance. However, it yielded the lowest ANOVA F-statistic (0.0000) and a p-value of 0.9976, suggesting no significant differences in the means. Similarly, for CIC-IDS2017, VNS-XGBoost achieves the strongest statistical performance, with a near-perfect Pearson coefficient (0.9999) and a Chi-Square value of 956,385.88, both with p-values of 0.0000, indicating a highly significant linear and categorical association between predictions and true labels. This is corroborated by an ANOVA F-value of 0.0000 (p-value = 1.0000), suggesting no considerable variance in errors across classes. The other models (GA-RF, GA-XGBoost, PSO-RF, PSO-XGBoost, VNS-RF) showed closely competitive performance, with high Pearson coefficients (0.9974–0.9997) and Chi-Square values (146,515.72–147,201.70), reflecting robust accuracy and stability, with VNS-XGBoost slightly outperforming the rest.
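To clarify how a Pearson coefficient near 1 can coexist with an ANOVA F-statistic near 0, the sketch below computes both quantities on a toy example. It assumes, as one plausible reading of the text, that the tests compare integer-encoded true labels with predicted labels; the exact grouping used in the study is not stated, and the toy label vectors are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def anova_f(*groups):
    """One-way ANOVA F-statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy integer-encoded labels (hypothetical): one of eight predictions is wrong.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 2, 3, 2]

r = pearson_r(y_true, y_pred)
f_stat = anova_f(y_true, y_pred)
print(f"Pearson r = {r:.4f}, ANOVA F = {f_stat:.4f}")
```

When predictions track the true labels closely, the two groups have almost the same mean, so the between-group variance (and hence F) approaches zero while the correlation approaches one.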
Table 18 presents the corrected Type 2 errors (false negatives, or missed detections) for intrusion detection models across eight classes: Benign, DoS, Portscan, DDoS, Infiltration, Brute Force, Web Attack, and Botnet. The VNS-XGBoost model achieved the lowest total Type 2 errors with 212 missed detections, primarily in the Infiltration (94) and Portscan (112) classes, with zero errors in DDoS, Brute Force, Web Attack, and Botnet, demonstrating superior detection capability. The PSO-XGBoost model followed with 424 errors, with notable misses in Infiltration (179) and Portscan (171). The GA-XGBoost model had 447 errors, also struggling with Infiltration (189) and Portscan (174). In contrast, the RandomForest-based models (GA-RF, PSO-RF, VNS-RF) exhibited significantly higher errors, with totals of 3582, 3519, and 3822, respectively, driven largely by poor detection in the Infiltration class (2816–2875 errors) and moderate errors in DoS (290–352) and Botnet (76–345). These results highlight VNS-XGBoost’s exceptional performance in minimizing missed detections, particularly for critical attack classes, making it highly effective for intrusion detection on numeric data.
Table 19 summarizes the false positives (misclassified samples from other classes) for intrusion detection models across eight classes in image converted data: Benign, DoS, Portscan, DDoS, Infiltration, Brute Force, Web Attack, and Botnet. The VNS-XGBoost model achieved the lowest total false positives with 212 errors, primarily in the Infiltration (112) and Portscan (91) classes, with minimal errors in Benign (5), DoS (2), Web Attack (2), and zero in DDoS, Brute Force, and Botnet, indicating high precision in classification. The PSO-XGBoost model followed with 424 false positives, with notable errors in Infiltration (164) and Portscan (162). The GA-XGBoost model recorded 447 false positives, also struggling with Infiltration (168) and Portscan (173). The RandomForest-based models (GA-RF, PSO-RF, VNS-RF) showed significantly higher false positives, totaling 3582, 3519, and 3822, respectively, primarily driven by misclassifications in the Portscan (2642–2654) and Benign (771–1060) classes. These results underscore VNS-XGBoost’s superior performance in minimizing false positives, making it highly effective for reducing false alarms in intrusion detection systems using numeric data.
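The identical totals in Table 18 and Table 19 (for example, 212 for VNS-XGBoost in both) are what one expects when per-class false negatives and false positives are read from the same multiclass confusion matrix: every misclassified sample is a missed detection for its true class and a false alarm for its predicted class. The sketch below illustrates this with a small hypothetical three-class matrix; the numbers are invented and are not taken from the study.

```python
def per_class_errors(confusion):
    """Return (false_negatives, false_positives) per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    false_negatives = [sum(confusion[i]) - confusion[i][i] for i in range(n)]
    false_positives = [
        sum(confusion[row][j] for row in range(n)) - confusion[j][j]
        for j in range(n)
    ]
    return false_negatives, false_positives


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
toy_confusion = [
    [50, 2, 1],
    [4, 40, 0],
    [0, 3, 30],
]

fn, fp = per_class_errors(toy_confusion)
print("false negatives per class:", fn, "total:", sum(fn))
print("false positives per class:", fp, "total:", sum(fp))
```

The per-class distributions differ, but both totals equal the number of off-diagonal entries.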
4.4. Discussion and Limitations
Detecting malicious traffic on IoT-based smart devices is becoming increasingly challenging due to the diversity of network traffic and the continually evolving nature of attacks. Traditional intrusion detection systems (IDSs) have been used to combat the growing number of sophisticated cyberattacks. However, traditional models struggle to capture cross-modal relationships and the subtle perturbations introduced by such attacks, leading to several fundamental problems, including low detection effectiveness against unknown network threats, a high false-positive rate (FPR), and excessive resource consumption. By integrating spatial and inter-feature correlation analysis into intelligent cybersecurity workflows that leverage both tabular and image-encoded traffic, this work lays the groundwork for robust, real-time protection mechanisms in AI-driven IoT and energy-aware infrastructure systems. PSO, GA, and VNS are used to improve model performance and reduce dimensional redundancy. The three feature optimizers are paired with two classifiers to produce six configurations: GA-RF, PSO-RF, VNS-RF, GA-XGB, PSO-XGB, and VNS-XGB. This hybrid system is particularly suitable for energy-optimized smart grids and smart city IoT networks, where adversarial attacks on sensor telemetry or control signals may result in energy waste, service disruptions, or false alerts.
GA combined with RandomForest is a robust evolutionary approach to feature selection, integrating the bio-inspired optimization principles of GA with the ensemble-learning strengths of RandomForest to address multi-class classification on datasets such as CIC-IDS2017 (78 numeric features after preprocessing) and image-encoded data (256 features from flattened 16 × 16 images), as shown in Table 6 and Table 7. When combined with XGBoost, GA is instead paired with an advanced gradient-boosting framework. The results for the different modalities in Section 4.1 demonstrate that the choice of data modality significantly affects intrusion detection robustness and accuracy. In comparisons between tabular and image-based representations of identical feature subsets, the image-based modality consistently performed better in terms of accuracy, F1 score, and Kappa score. Because the image encoding arranges tabular information spatially, the ensemble classifiers can capture latent inter-feature connections and correlation patterns that are otherwise difficult to characterize. The image modality efficiently transforms high-dimensional IoT traffic data into structured matrices, thereby improving the distinction between attack patterns and benign events.
Table 10 and Table 11 show that combining VNS with XGBoost achieved a good balance between detection accuracy and computational complexity. Compared with the original dataset, the VNS-based feature optimization reduced the feature dimensionality by approximately 50% while maintaining, and even enhancing, overall model performance. This reduction directly lowered training time and memory consumption, demonstrating VNS’s ability to conduct localized searches over several neighborhoods, effectively escaping local optima and finding compact, discriminative feature sets.
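As an illustration of how a neighborhood-based search can escape local optima during feature selection, the following is a minimal sketch of a generic basic VNS over binary feature masks. It is not the authors' implementation: the function names, the parameters `k_max` and `max_iters`, and the toy fitness function (which stands in for classifier accuracy from RF or XGBoost) are all assumptions made for this example.

```python
import random


def local_search(fitness, mask):
    """First-improvement hill climbing over single-bit flips of the mask."""
    best = mask[:]
    best_fit = fitness(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            best[i] = not best[i]  # try toggling feature i
            fit = fitness(best)
            if fit > best_fit:
                best_fit = fit
                improved = True
            else:
                best[i] = not best[i]  # undo the flip
    return best, best_fit


def vns_select(fitness, n_features, k_max=3, max_iters=20, seed=0):
    """Basic VNS: shake in neighborhood k, run local search, then move or widen."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_fit = fitness(current)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            # Shaking: flip k randomly chosen bits of the current solution.
            candidate = current[:]
            for idx in rng.sample(range(n_features), k):
                candidate[idx] = not candidate[idx]
            candidate, candidate_fit = local_search(fitness, candidate)
            if candidate_fit > current_fit:
                current, current_fit = candidate, candidate_fit
                k = 1  # improvement found: return to the smallest neighborhood
            else:
                k += 1  # no improvement: try a larger neighborhood
    return current, current_fit


# Hypothetical fitness: reward informative features, penalize subset size.
INFORMATIVE = {0, 3, 5, 8}


def toy_fitness(mask):
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_fit = vns_select(toy_fitness, n_features=12)
selected = [i for i, keep in enumerate(best_mask) if keep]
print("selected features:", selected, "fitness:", round(best_fit, 2))
```

In a real pipeline the fitness call would train and validate a classifier on the masked feature set, which is why the number of fitness evaluations dominates the optimization time.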
Table 14 and Table 15 compare inference efficiency and memory usage across models. XGBoost-based models are significantly faster (0.00315–0.02197 s) than Random Forest models (0.06118–0.07733 s), with PSO-XGBoost being the quickest. Memory usage is low overall: XGBoost models have slightly higher initial memory (up to 0.07086 MB), and their peak memory ranges from 0.01 to 0.08 MB, compared with 0.02–0.03 MB for RF, with only GA-XGBoost at the upper end of that range. VNS-XGBoost and PSO-XGBoost achieve the best balance, combining fast inference (≤0.00421 s) and minimal peak memory (0.01 MB). This suggests that XGBoost hybrids, especially with PSO or VNS, optimize both speed and resource efficiency. The convergence curves for the CIC-IDS2017 dataset (Figure 12 and Figure 13) showed a notable upward trend, with the first solution achieving an accuracy of 0.986 and steadily increasing to 0.998 after approximately 17.5 evaluation iterations. This improvement demonstrates the versatility of VNS in high-dimensional search spaces and its ability to extract highly discriminative characteristics from various IoT traffic patterns. With little redundancy, the algorithm was able to separate significant feature subsets, as evidenced by the final configuration’s low Type II error and false-positive rate of 4. The IDS2018 dataset, on the other hand, demonstrated greater stability. After 14 evaluation cycles, the accuracy plateaued at 0.990, indicating that fewer neighborhood expansions were needed to achieve optimal performance on this dataset, which had more uniform traffic distributions and fewer minority classes. The consistency of results across datasets confirms the stability and dependability of the single-objective VNS optimization strategy. In addition, the convergence behavior demonstrates that the search procedure retains significant generalization capacity across dataset years while adapting successfully to the inherent complexity of each dataset.
VNS proved appropriate for real-time or resource-constrained IoT scenarios, stabilizing faster and with fewer evaluations than other metaheuristics such as GA and PSO, which often require many iterations to converge. XGBoost’s parallel tree-boosting architecture significantly improved computational scalability and robustness against class imbalance. The results also show that neighborhood parameter initialization affects VNS performance, potentially requiring adjustments to preserve convergence stability across diverse datasets. The combination of VNS and XGBoost outperformed the alternative configurations, yielding a computationally light and highly accurate intrusion detection model with a near-ideal trade-off between accuracy, runtime efficiency, and feature reduction.
The results for RQ3 in Section 4.3 indicate that the proposed VNS–XGBoost framework performs well in multiclass intrusion detection, achieving high accuracy and generalization on both the IDS2017 and IDS2018 datasets (Table 16 and Table 17). The low p-values (≤0.0284) for the RF models lead us to reject the null hypothesis, indicating that RF models differ significantly in their ability to handle different types of attacks. By contrast, the VNS-XGBoost model’s ANOVA p-value of 0.9588 suggests that its 50% feature subset retains the most discriminative and non-redundant variables, yielding a model that does not favor any particular class (Benign, DoS, DDoS, etc.). In combination with the error tables (Table 6 and Table 7), this demonstrates that RF models struggle with some classes (such as Infiltration), leading to substantial variability in their performance. The VNS-RandomForest F-statistic of 141.5094 likewise indicates that performance varies significantly across groups. VNS-XGBoost has the highest p-value (0.9588) among the XGBoost models, which is a significant strength: its performance is uniform across classes (Benign, DoS, DDoS, etc.). An intrusion detection system that performs consistently across multiple classes is fundamentally more reliable and trustworthy.
The model successfully differentiated all classes of IoT attacks, maintaining balanced precision and recall across classes, as shown in Table 18 and Table 19. The error analysis, encompassing Type 2 errors (missed detections) and false positives (misclassified samples), reveals a persistent and striking performance difference between the machine learning algorithms under evaluation. The main conclusion is that XGBoost-based models (GA-XGBoost, PSO-XGBoost, and VNS-XGBoost) outperform Random Forest (RF)-based models. The RF models consistently produced totals of roughly 3500–3800 Type 2 errors and false positives, whereas the XGBoost models reduced these totals to 212–447. For this dataset and classification problem, the boosting mechanism of XGBoost is considerably more effective than the bagging mechanism of Random Forest. With VNS-XGBoost, the Type 2 error count dropped by more than 90% relative to the RF-based configurations (212 versus 3519–3822), demonstrating that the method can minimize missed detections that could otherwise compromise network integrity. The VNS-XGBoost model is the most robust and reliable of the three XGBoost models, owing to its extremely high Pearson r (0.9986, a near-perfect association) and very high ANOVA p-value (0.9588, indicating uniform, low-variance performance).
Combining XGBoost with Variable Neighborhood Search yields the most balanced and efficient classification across the sample set. RF-based models such as VNS-RF (3822 total FN) are prone to Type 2 errors, which are the most serious kind of mistake from a security standpoint: when the number of FNs is high, the model is essentially blind to thousands of attacks. The largest contributor to FNs for the RF models was the failure to detect infiltration attempts, which would allow an attacker to enter the network without triggering an alarm. With 212 FNs, VNS-XGBoost had the lowest FN count of all the candidates. This significantly reduces the probability of a major security breach because fewer threats go undetected. The parallelized tree ensemble in XGBoost handles heterogeneous data distributions with minimal overfitting, and the local search in VNS refines discriminative feature subsets for each attack class. The spatial feature representation improved the model’s ability to distinguish between overlapping attack patterns and to detect minute variations across visually similar intrusion profiles. However, there are a few limitations. The model’s recall was likely erratic across closely related classes, such as DoS and DDoS, because the samples share specific statistical and temporal characteristics. Further, since feature importance and class boundaries may change dynamically in large, streaming IoT systems, model interpretability and scalability may be limited. Although VNS is computationally efficient, it still requires manual parameter adjustments to maintain neighborhood variety and to avoid convergence stagnation.
Despite these limitations, the results show that the VNS–XGBoost pipeline produces an exceptionally accurate and computationally efficient intrusion detection system (IDS) that supports strong multiclass detection and cross-dataset generalization without the complexity of deep learning architectures. The results of this study demonstrate that the model can be applied to IoT intrusion-detection deployments where speed, explainability, and adaptability are important.