Innovating Intrusion Detection Classification Analysis for an Imbalanced Data Sample
Abstract
1. Introduction
- How do IDC assessments direct the classifiers’ analysis domains related to cybersecurity aspects?
- What kind of output(s) dominates IDS classification exploring the dataset?
- What implications do these findings have for IDS policy and strategy?
2. Related Works
- Use metrics that can reveal the model’s performance, particularly on the minority set. Such metrics include recall, precision, F1-score, and the ROC curve with its AUC.
- Resampling is used to balance the class distribution. SMOTE, which stands for synthetic minority oversampling technique, adds synthetic minority samples, whereas undersampling reduces the number of majority samples; undersampling is effective, but it causes data loss.
- Algorithm-based tuning assigns higher weights to the minority set to minimize bias. It is a major approach that offers cost-sensitive learning and class-weighted models, and it reduces the influence of imbalance. Algorithms such as random forest and XGBoost are among the most effective ones.
- Data augmentation is an effective technique used to enlarge the minority set, for example by rotating or flipping image samples, to balance the dataset.
- Anomaly detection is used to handle the minority class by treating its samples as anomalies.
- Domain understanding is useful, given that the minority class is rare.
- Experimental methods are used to examine different resampling methods, models, and measures.
- Cross-validation (CV)-based methods use different combinations/permutations of data splits to validate the system’s performance and achieve generalization.
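The class-weighting idea in the list above can be illustrated with a minimal sketch. It assumes the common "balanced" heuristic, weight = n_samples / (n_classes × class_count); the helper name `balanced_class_weights` and the toy labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors cost more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 0 = normal traffic (majority), 1 = threat (minority).
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

With a 90/10 split, the minority class is weighted nine times more heavily than the majority class, which is the effect a cost-sensitive or class-weighted model relies on.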
3. Materials and Methods
3.1. Overview of the Dataset
3.2. Preprocessing
3.3. Visualization
3.4. Core Classification Formulas
- Linear classifiers separate classes by using a linear decision boundary/hyperplane, using the following form:
- The decision rule is
- Logistic regression:
- Linear discriminant analysis (LDA) uses class means of (μ0, μ1) and shared covariance, ∑:
- Linear support vector machine (SVM) is used to maximize margin based on constraints for
- Nonlinear classifiers separate data using a decision function and take the form of
- Kernel SVM replaces linear dot product with kernel K(Xi, Xj):
- For the Decision Tree, split space recursively using thresholds:
- The k-Nearest Neighbor (k-NN) method assigns a class based on a majority vote among the k nearest neighbors:
- Neural Networks (MLP) use layers of weighted sums plus nonlinear activation functions:
- Output, i.e., softmax for multiclass, is calculated by
- A model learns mapping as follows:
- Decision rule: y = 1 if P(y = 1 | x) ≥ 0.5.
- SMOTE generates synthetic samples for the minority class of an imbalanced dataset. For each minority sample, xi, perform the following:
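For reference, the models listed above are commonly written in the following standard textbook forms. The notation ($\mathbf{w}$, $b$, $\alpha_i$, $\phi$, $z_c$, $\lambda$, $\mathbf{x}_{nn}$) is an assumption made for this sketch and may differ from the article's own notation; $\mathbf{x}_{nn}$ denotes one of the k nearest minority neighbors of $\mathbf{x}_i$.

```latex
\begin{gather*}
f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b
  \qquad \text{(linear decision function)} \\
\hat{y} = 1 \ \text{if}\ f(\mathbf{x}) \ge 0, \quad \hat{y} = 0 \ \text{otherwise}
  \qquad \text{(decision rule)} \\
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}
  \qquad \text{(logistic regression)} \\
\mathbf{w} = \Sigma^{-1}(\mu_1 - \mu_0)
  \qquad \text{(LDA direction with shared covariance } \Sigma\text{)} \\
\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad \text{s.t.} \quad y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \ \ \forall i
  \qquad \text{(linear SVM, } y_i \in \{-1, +1\}\text{)} \\
f(\mathbf{x}) = \sum_{i} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b
  \qquad \text{(kernel SVM)} \\
\hat{y} = \arg\max_{c} \sum_{j \in N_k(\mathbf{x})} \mathbb{1}[y_j = c]
  \qquad \text{(k-NN majority vote)} \\
\mathbf{h}^{(l)} = \phi\!\left(W^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right)
  \qquad \text{(MLP layer)} \\
P(y = c \mid \mathbf{x}) = \frac{e^{z_c}}{\sum_{j} e^{z_j}}
  \qquad \text{(softmax output)} \\
\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda\,(\mathbf{x}_{nn} - \mathbf{x}_i), \quad \lambda \in [0, 1]
  \qquad \text{(SMOTE interpolation)}
\end{gather*}
```

Under these forms, the logistic decision rule P(y = 1 | x) ≥ 0.5 is equivalent to f(x) ≥ 0, which is why logistic regression yields a linear decision boundary.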
3.5. Classification Metrics
4. Experiments and Results
4.1. Confusion Matrix
- TP (true positive): Threat correctly detected.
- TN (true negative): Normal traffic correctly classified.
- FP (false positive): Normal traffic flagged as a threat (false alarm).
- FN (false negative): Threat missed, classified as normal.
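The four outcomes above can be counted directly from paired labels and predictions. The following is a minimal sketch in plain Python; the function names and the toy labels are hypothetical, and the positive class (1) is assumed to mean "threat".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # threat correctly detected
            else:
                fp += 1  # normal traffic flagged as a threat
        else:
            if actual == positive:
                fn += 1  # threat missed
            else:
                tn += 1  # normal traffic correctly classified
    return tp, tn, fp, fn


def summary_metrics(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy example: 3 threats among 10 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = summary_metrics(*counts)
print(counts, metrics)
```

In this toy case accuracy is 0.8 while recall on the threat class is only about 0.67, which illustrates why accuracy alone can be misleading on imbalanced data.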
4.2. Accuracy
4.3. Precision, Recall, and F1-Score
4.4. ROC Curve and AUC
4.5. Precision–Recall Curve
4.6. Logarithmic Loss (Log Loss)
- $y_{i,c}$ is the true label indicator. It is 1 if sample $i$ belongs to class $c$, and 0 otherwise.
- $p_{i,c}$ is the model’s predicted probability that sample $i$ belongs to class $c$. It is obtained from the classifier’s predict_proba() method.
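As a minimal sketch of how these two quantities combine, the snippet below computes the standard multiclass log loss, $-\frac{1}{N}\sum_{i}\sum_{c} y_{i,c}\log p_{i,c}$, in plain Python. Because the indicator is 1 only for the true class, only the probability assigned to the true class contributes. The clipping constant `eps` and the toy probabilities are assumptions for illustration.

```python
import math


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of true class indices.
    proba:  list of per-sample probability lists (one entry per class),
            in the shape a predict_proba()-style method would return.
    eps:    clipping bound that avoids log(0) for overconfident predictions.
    """
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical predictions for three samples and two classes.
y_true = [0, 1, 1]
proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
loss = log_loss(y_true, proba)
print(round(loss, 4))
```

The third sample, where the true class received only 0.4, contributes most of the loss; confident wrong predictions are penalized heavily.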
4.7. Complete Example
4.7.1. Confusion Matrix
4.7.2. Accuracy
4.7.3. Precision, Recall, and F1-Score
4.7.4. ROC Curve and AUC
4.7.5. Precision–Recall Curve
4.7.6. Logarithmic Loss (Log Loss)
4.8. Handling Imbalanced Data
- Understand the problem of data imbalance.
- Learn various strategies to address imbalance (undersampling, oversampling, and hybrid).
- Implement common techniques like RandomOverSampler and SMOTE.
- Understand the importance of applying these techniques after splitting data into train/test sets.
- Step 1: Load data and check for imbalance.
- Step 2: Split the data.
- Step 3: Resample to address imbalance.
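The three steps can be sketched in plain Python as follows. This is an illustrative outline under stated assumptions, not the article's implementation: the data are synthetic, the helper names `stratified_split` and `random_oversample` are hypothetical, and random oversampling stands in for whichever resampler is chosen. The key point is that only the training portion is resampled, so the test set keeps the original class distribution.

```python
import random
from collections import Counter


def stratified_split(rows, test_fraction=0.2, seed=47):
    """Split (features, label) rows so each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    train, test = [], []
    for group in by_class.values():
        group = group[:]
        rng.shuffle(group)
        n_test = int(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def random_oversample(rows, seed=47):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Step 1: load data and check for imbalance (synthetic 90/10 toy data).
data = [([float(i)], 0) for i in range(90)] + [([float(i)], 1) for i in range(90, 100)]
print("all:", Counter(label for _, label in data))

# Step 2: split the data before any resampling.
train, test = stratified_split(data)

# Step 3: resample the training set only; the test set stays untouched.
train_balanced = random_oversample(train)
print("train after resampling:", Counter(label for _, label in train_balanced))
print("test:", Counter(label for _, label in test))
```

Resampling before the split would let duplicated or synthetic copies of the same minority samples appear in both parts, which inflates the test scores.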
4.8.1. Random Oversampling
4.8.2. SMOTE
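A simplified sketch of the SMOTE interpolation step is given below in plain Python. It is not the imbalanced-learn implementation: the function name `smote_sketch` and the toy points are hypothetical, and the defaults `k=5` and `seed=47` are chosen only to echo the k_neighbors and random_state values listed in the parameter table. Each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbors.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=47):
    """Create n_new synthetic minority points by neighbor interpolation.

    minority: list of feature vectors (lists of floats); needs at least 2 points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x_i = rng.choice(minority)
        # k nearest minority neighbors of x_i by Euclidean distance, excluding x_i.
        neighbors = sorted(
            (p for p in minority if p is not x_i),
            key=lambda p: math.dist(p, x_i),
        )[:k]
        x_nn = rng.choice(neighbors)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_nn)])
    return synthetic


# Hypothetical two-feature minority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
new_points = smote_sketch(minority, n_new=4)
for point in new_points:
    print([round(v, 3) for v in point])
```

Because every new point is a convex combination of two existing minority samples, the synthetic data stay inside the region already occupied by the minority class instead of being exact duplicates.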
4.8.3. Random Undersampling (Use with Caution)
4.9. Sampling Dataset
4.10. Advanced Feature Engineering
- Step 1: Copy the DataFrame.
- Step 2: Select features for polynomial transformation.
- Step 3: Generate polynomial features.
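The steps above can be sketched without any external library. In this illustration, plain lists stand in for the DataFrame, the degree is assumed to be 2, and the selected column indices and the helper name `polynomial_features` are hypothetical. For two inputs a and b, the generated terms are 1, a, b, a², ab, and b².

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2, include_bias=True):
    """Expand one row into all monomials of its values up to the given degree."""
    features = [1.0] if include_bias else []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features


# Step 1: copy the data so the original rows stay unchanged.
original = [[2.0, 3.0, 7.0], [1.0, 4.0, 5.0]]
working = [row[:] for row in original]

# Step 2: select the columns to transform (indices chosen for illustration).
selected = [0, 1]

# Step 3: generate polynomial features for the selected columns.
expanded = [polynomial_features([row[i] for i in selected]) for row in working]
print(expanded)
```

The number of generated columns grows quickly with the number of selected features and the degree, which is one reason to restrict the transformation to a small subset of columns.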
4.11. Advanced Dimensionality Reduction
5. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Uddin, A.; Aryal, S.; Bouadjenek, M.R.; Al-Hawawreh, M.; Talukder, A. Hierarchical classification for intrusion detection system: Effective design and empirical analysis. Ad Hoc Netw. 2025, 178, 103982.
- [2] Alotaibi, M.; Mengash, H.A.; Alqahtani, H.; Al-Sharafi, A.M.; Yahya, A.E.; Alotaibi, S.R.; Khadidos, A.O.; Yafoz, A. Hybrid GWQBBA model for optimized classification of attacks in Intrusion Detection System. Alex. Eng. J. 2025, 116, 9–19.
- [3] Suarez-Roman, M.; Tapiador, J. Attack structure matters: Causality-preserving metrics for Provenance-based Intrusion Detection Systems. Comput. Secur. 2025, 157, 104578.
- [4] Shana, T.B.; Kumari, N.; Agarwal, M.; Mondal, S.; Rathnayake, U. Anomaly-based intrusion detection system based on SMOTE-IPF, Whale Optimization Algorithm, and ensemble learning. Intell. Syst. Appl. 2025, 27, 200543.
- [5] Araujo, I.; Vieira, M. Enhancing intrusion detection in containerized services: Assessing machine learning models and an advanced representation for system call data. Comput. Secur. 2025, 154, 104438.
- [6] Devi, M.; Nandal, P.; Sehrawat, H. Federated learning-enabled lightweight intrusion detection system for wireless sensor networks: A cybersecurity approach against DDoS attacks in smart city environments. Intell. Syst. Appl. 2025, 27, 200553.
- [7] Le, T.-T.; Shin, Y.; Kim, M.; Kim, H. Towards unbalanced multiclass intrusion detection with hybrid sampling methods and ensemble classification. Appl. Soft Comput. 2024, 157, 111517.
- [8] He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- [9] Othman, S.M.; Ba-Alwi, F.M.; Alsohybe, N.T.; Al-Hashida, A.Y. Intrusion detection model using machine learning algorithm on Big Data environment. J. Big Data 2018, 5, 34.
- [10] Rao, Y.N.; Babu, K.S. An Imbalanced Generative Adversarial Network-Based Approach for Network Intrusion Detection in an Imbalanced Dataset. Sensors 2023, 23, 550.
- [11] Sun, Z.; Zhang, J.; Zhu, X.; Xu, D. How Far Have We Progressed in the Sampling Methods for Imbalanced Data Classification? An Empirical Study. Electronics 2023, 12, 4232.
- [12] El-Gayar, M.M.; Alrslani, F.A.F.; El-Sappagh, S. Smart Collaborative Intrusion Detection System for Securing Vehicular Networks Using Ensemble Machine Learning Model. Information 2024, 15, 583.
- [13] Gong, W.; Yang, S.; Guang, H.; Ma, B.; Zheng, B.; Shi, Y.; Li, B.; Cao, Y. Multi-order feature interaction-aware intrusion detection scheme for ensuring cyber security of intelligent connected vehicles. Eng. Appl. Artif. Intell. 2024, 135, 108815.
- [14] Gou, W.; Zhang, H.; Zhang, R. Multi-Classification and Tree-Based Ensemble Network for the Intrusion Detection System in the Internet of Vehicles. Sensors 2023, 23, 8788.
- [15] Moulahi, T.; Zidi, S.; Alabdulatif, A.; Atiquzzaman, M. Comparative Performance Evaluation of Intrusion Detection Based on Machine Learning in In-Vehicle Controller Area Network Bus. IEEE Access 2021, 9, 99595–99605.
- [16] Wang, W.; Sun, D. The improved AdaBoost algorithms for imbalanced data classification. Inf. Sci. 2021, 563, 358–374.
- [17] Abedzadeh, N.; Jacobs, M. A Reinforcement Learning Framework with Oversampling and Undersampling Algorithms for Intrusion Detection System. Appl. Sci. 2023, 13, 11275.
- [18] Sayegh, H.R.; Dong, W.; Al-Madani, A.M. Enhanced Intrusion Detection with LSTM-Based Model, Feature Selection, and SMOTE for Imbalanced Data. Appl. Sci. 2024, 14, 479.
- [19] Palma, Á.; Antunes, M.; Bernardino, J.; Alves, A. Multi-Class Intrusion Detection in Internet of Vehicles: Optimizing Machine Learning Models on Imbalanced Data. Futur. Internet 2025, 17, 162.
- [20] Zayid, E.I.M.; Humayed, A.A.; Adam, Y.A. Testing a rich sample of cybercrimes dataset by using powerful classifiers’ competences. TechRxiv 2024.
- [21] Faul, A. A Concise Introduction to Machine Learning, 2nd ed.; Chapman and Hall/CRC: New York, NY, USA, 2025; pp. 88–151.
- [22] Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2018, 17, 168–192.
- [23] Wu, Y.; Zou, B.; Cao, Y. Current Status and Challenges and Future Trends of Deep Learning-Based Intrusion Detection Models. J. Imaging 2024, 10, 254.
- [24] Aljuaid, W.H.; Alshamrani, S.S. A Deep Learning Approach for Intrusion Detection Systems in Cloud Computing Environments. Appl. Sci. 2024, 14, 5381.
- [25] Isiaka, F. Performance Metrics of an Intrusion Detection System Through Window-Based Deep Learning Models. J. Data Sci. Intell. Syst. 2023, 2, 174–180.
- [26] Sajid, M.; Malik, K.R.; Almogren, A.; Malik, T.S.; Khan, A.H.; Tanveer, J.; Rehman, A.U. Enhancing Intrusion Detection: A Hybrid Machine and Deep Learning Approach. J. Cloud Comput. 2024, 13, 1–24.
- [27] Neto, E.C.P.; Iqbal, S.; Buffett, S.; Sultana, M.; Taylor, A. Deep Learning for Intrusion Detection in Emerging Technologies: A Comprehensive Survey and New Perspectives. Artif. Intell. Rev. 2025, 58, 1–63.
- [28] Nakıp, M.; Gelenbe, E. Online Self-Supervised Deep Learning for Intrusion Detection Systems (SSID). arXiv 2023, arXiv:2306.13030. Available online: https://arxiv.org/abs/2306.13030 (accessed on 1 September 2025).
- [29] Liu, J.; Simsek, M.; Nogueira, M.; Kantarci, B. Multidomain Transformer-based Deep Learning for Early Detection of Network Intrusion. arXiv 2023, arXiv:2309.01070. Available online: https://arxiv.org/abs/2309.01070 (accessed on 10 September 2025).
- [30] Zeng, Y.; Jin, R.; Zheng, W. CSAGC-IDS: A Dual-Module Deep Learning Network Intrusion Detection Model for Complex and Imbalanced Data. arXiv 2025, arXiv:2505.14027. Available online: https://arxiv.org/abs/2505.14027 (accessed on 10 September 2025).
- [31] Chen, Y.; Su, S.; Yu, D.; He, H.; Wang, X.; Ma, Y.; Guo, H. Cross-domain industrial intrusion detection deep model trained with imbalanced data. IEEE Internet Things J. 2022, 10, 584–596.
- [32] Sangaiah, A.K.; Javadpour, A.; Pinto, P. Towards data security assessments using an IDS security model for cyber-physical smart cities. Inf. Sci. 2023, 648, 119530.
- [33] Disha, R.A.; Waheed, S. Performance analysis of machine learning models for intrusion detection system using Gini Impurity-based Weighted Random Forest (GIWRF) feature selection technique. Cybersecurity 2022, 5, 1–22.
- [34] Sarhan, M.; Layeghy, S.; Portmann, M. Towards a standard feature set for network intrusion detection system datasets. Mob. Networks Appl. 2022, 27, 357–370.
Approach | Implication | Advantages (+) | Limitations (−) | Domain(s) | Criteria | Review | Article(s), Year | Method | Imbalanced | Metrics Used | Contribution |
---|---|---|---|---|---|---|---|---|---|---|---|
Data level [10,11] | Random oversampling, SMOTE, ADASYN, random undersampling, Tomek links | Simple, model-agnostic, balances class distribution | Oversampling may cause overfitting; undersampling may lose useful data | Medical diagnosis, text classification, fraud detection | F1-score, precision–recall (PR) curve, G-mean | Suited for PR curves and G-mean, since they directly affect class distributions | [10], 2023 | GAN-based | GAN synthetic oversampling | Acc, precision, recall, F1 | GANs for minority attack classes |
Data level [10,11] | | | | | | | [11], 2023 | Empirical (under/over/hybrid-sampling) | Sampling metric | Precision, recall, F1, AUC | Comprehensive review |
Algorithm level [12,13,14,15,16] | Cost-sensitive learning, Decision Trees with class weights, SVM penalties, boosting | Directly modifies algorithms, avoids altering data distribution | Requires algorithm-specific tuning, may increase computational cost | Intrusion detection, financial risk assessment, cybersecurity | ROC-AUC, MCC, balanced accuracy | Better assessed with ROC-AUC and MCC, as they optimize decision boundaries | [12], 2024 | Ensemble ML | Limited | Acc, recall, F1 | Collaborative IDS across edge systems |
Algorithm level [12,13,14,15,16] | | | | | | | [13], 2024 | DL | Indirect | Acc, precision, recall, F1 | Feature interaction modeling |
Algorithm level [12,13,14,15,16] | | | | | | | [14], 2023 | Tree-based ensemble | Algorithm-level imbalance robustness | Acc, recall, F1 | Multiclass ensemble IDS |
Algorithm level [12,13,14,15,16] | | | | | | | [15], 2021 | ML-based (SVM, RF, KNN) | Not addressed | Acc, detection rate | Comparative, baseline CAN-bus IDS |
Algorithm level [12,13,14,15,16] | | | | | | | [16], 2021 | AdaBoost | Algorithm-level | Acc, AUC, F1 | Improved AdaBoost for imbalance |
Hybrid [17,18,19] | SMOTE, boosting, ensemble with resampling, GAN-based synthetic data + classifier | Combines strengths of both approaches, often best performance | More complex to implement, risk of high computational cost | Rare-disease detection, image recognition | ROC-AUC, F1-score, MCC | Typically evaluated with a mix (PR-AUC + ROC-AUC + MCC), since they combine both strategies | [17], 2023 | Reinforcement learning + sampling | Adaptive over/under sampling | Acc, precision, recall, F1 | RL-driven dynamic imbalance handling |
Hybrid [17,18,19] | | | | | | | [18], 2024 | LSTM, feature selection, SMOTE | SMOTE oversampling | Acc, precision, recall, F1 | Merges DL plus resampling |
Hybrid [17,18,19] | | | | | | | [19], 2025 | ML (RF, XGB, DL) | Evaluation | Acc, precision, recall, F1 | Top IDS handles multiclass imbalance |
Column Name | Data Type | Unique Values Count | NaN Count |
---|---|---|---|
Domain | int64 | 1 | 0 |
Receive Time | datetime64[ns] | 53,466 | 0 |
Serial # | int64 | 1 | 0 |
Type | object | 1 | 0 |
Threat/Content Type | object | 7 | 0 |
Config Version | int64 | 1 | 0 |
Generate Time | datetime64[ns] | 53,466 | 0 |
Source Address | object | 9744 | 0 |
Destination Address | object | 6624 | 0 |
NAT Source IP | object | 10 | 1,033,000 |
NAT Destination IP | object | 818 | 1,033,000 |
Rule | object | 17 | 549,532 |
Source User | object | 4234 | 391,310 |
Destination User | object | 1421 | 842,969 |
Application | object | 15 | 0 |
Virtual System | object | 1 | 0 |
Source Zone | object | 5 | 0 |
Destination Zone | object | 6 | 1 |
Inbound Interface | object | 5 | 1 |
Outbound Interface | object | 6 | 549,532 |
Log Action | object | 2 | 1,037,148 |
Time-Logged | datetime64[ns] | 53,466 | 0 |
Session ID | int64 | 462,478 | 0 |
Repeat Count | int64 | 102 | 0 |
Source Port | int64 | 42,788 | 0 |
Destination Port | int64 | 969 | 0 |
NAT Source Port | int64 | 13,400 | 0 |
NAT Destination Port | int64 | 7 | 0 |
Flags | object | 11 | 0 |
IP Protocol | object | 3 | 0 |
Action | object | 5 | 0 |
URL/Filename | object | 28 | 1,048,035 |
Threat/Content Name | object | 43 | 0 |
Category | object | 13 | 0 |
Severity | object | 5 | 0 |
Direction | object | 2 | 0 |
Sequence Number | int64 | 1056 | 0 |
Action Flags | object | 2 | 0 |
Source Country | object | 35 | 0 |
Destination Country | object | 79 | 0 |
Unnamed: 40 | float64 | 0 | 1,048,575 |
Contenttype | float64 | 0 | 1,048,575 |
pcap_id | int64 | 1 | 0 |
Filedigest | float64 | 0 | 1,048,575 |
Cloud | float64 | 0 | 1,048,575 |
url_idx | int64 | 9 | 0 |
user_agent | float64 | 0 | 1,048,575 |
Filetype | float64 | 0 | 1,048,575 |
Xff | float64 | 0 | 1,048,575 |
Referrer | float64 | 0 | 1,048,575 |
Sender | float64 | 0 | 1,048,575 |
Subject | float64 | 0 | 1,048,575 |
Recipient | float64 | 0 | 1,048,575 |
Reported | int64 | 1 | 0 |
DG Hierarchy Level 1 | int64 | 1 | 0 |
DG Hierarchy Level 2 | int64 | 1 | 0 |
DG Hierarchy Level 3 | int64 | 1 | 0 |
DG Hierarchy Level 4 | int64 | 1 | 0 |
Virtual System Name | object | 1 | 0 |
Device Name | object | 1 | 0 |
file_url | float64 | 0 | 1,048,575 |
Source VM UUID | float64 | 0 | 1,048,575 |
Destination VM UUID | float64 | 0 | 1,048,575 |
http_method | float64 | 0 | 1,048,575 |
Tunnel ID/IMSI | int64 | 1 | 0 |
Monitor Tag/IMEI | float64 | 0 | 1,048,575 |
Parent Session ID | int64 | 1 | 0 |
Parent Session Start Time | float64 | 0 | 1,048,575 |
Tunnel | float64 | 0 | 1,048,575 |
thr_category | object | 11 | 0 |
Contentver | object | 5 | 0 |
sig_flags | object | 2 | 0 |
SCTP Association ID | int64 | 1 | 0 |
Payload Protocol ID | int64 | 1 | 0 |
http_headers | float64 | 0 | 1,048,575 |
URL Category List | float64 | 0 | 1,048,575 |
UUID for Rule | object | 17 | 549,532 |
HTTP/2 Connection | int64 | 1 | 0 |
dynusergroup_name | float64 | 0 | 1,048,575 |
XFF Address | float64 | 0 | 1,048,575 |
Source Device Category | float64 | 0 | 1,048,575 |
Source Device Profile | float64 | 0 | 1,048,575 |
Source Device Model | float64 | 0 | 1,048,575 |
Source Device Vendor | float64 | 0 | 1,048,575 |
Source Device OS Family | float64 | 0 | 1,048,575 |
Source Device OS Version | float64 | 0 | 1,048,575 |
Source Hostname | float64 | 0 | 1,048,575 |
Source Mac Address | float64 | 0 | 1,048,575 |
Destination Device Category | float64 | 0 | 1,048,575 |
Destination Device Profile | float64 | 0 | 1,048,575 |
Destination Device Model | float64 | 0 | 1,048,575 |
Destination Device Vendor | float64 | 0 | 1,048,575 |
Destination Device OS Family | float64 | 0 | 1,048,575 |
Destination Device OS Version | float64 | 0 | 1,048,575 |
Destination Hostname | float64 | 0 | 1,048,575 |
Destination Mac Address | float64 | 0 | 1,048,575 |
Container ID | float64 | 0 | 1,048,575 |
POD Namespace | float64 | 0 | 1,048,575 |
POD Name | float64 | 0 | 1,048,575 |
Source External Dynamic List | float64 | 0 | 1,048,575 |
Destination External Dynamic List | float64 | 0 | 1,048,575 |
Host ID | float64 | 0 | 1,048,575 |
Serial Number | float64 | 0 | 1,048,575 |
domain_edl | float64 | 0 | 1,048,575 |
Source Dynamic Address Group | float64 | 0 | 1,048,575 |
Destination Dynamic Address Group | float64 | 0 | 1,048,575 |
partial_hash | int64 | 1 | 0 |
High-Res Timestamp | object | 85,581 | 0 |
Reason | float64 | 0 | 1,048,575 |
Justification | float64 | 0 | 1,048,575 |
nssai_sst | float64 | 1 | 499,043 |
Subcategory of App | object | 8 | 0 |
Category of App | object | 6 | 0 |
Technology of App | object | 4 | 0 |
Risk of App | int64 | 4 | 0 |
Characteristics of App | object | 9 | 549,532 |
Container of App | object | 5 | 752,386 |
Tunneled App | object | 9 | 0 |
SaaS of App | object | 2 | 0 |
Sanctioned State of App | object | 1 | 0 |
Actual \ Predicted | Positive | Negative |
---|---|---|
Positive | TP | FN |
Negative | FP | TN |
Metric | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 1.00 | 1.00 | 1.00 | 14 |
1 | 1.00 | 1.00 | 1.00 | 6 |
Accuracy | | | 1.00 | 20 |
Macro avg | 1.00 | 1.00 | 1.00 | 20 |
Weighted avg | 1.00 | 1.00 | 1.00 | 20 |
Article, Year | Algorithm(s) | Processes | Metrics | IDS | Imbalanced | Prediction | Learnability | IDAT | Best for |
---|---|---|---|---|---|---|---|---|---|
Wu, Y.; Zou, B.; Cao, Y., 2024 [23] | CNN, RNN, GANs | Summarize, preprocess, feature engineering, transform | Acc, recall, precision, F1, etc. | ✓ | ✓ | ✓ | 🗶 | 🗶 | Policy, strategy, framework |
Wa’ad H. Aljuaid and Sultan S. Alshamrani, 2024 [24] | CNN, SMOTE | Balancing, feature selection | Acc, precision, recall | ✓ | ✓ | ✓ | ✓ | 🗶 | Cloud-based, example/scenario |
Fatima Isiaka, 2023–2024 [25] | CNN, RNN | Autoencoder-based | Precision, recall, etc. | ✓ | ✓ | ✓ | ✓ | 🗶 | Metrics discussion (scalar vs. visual) |
Sajid, M.; Malik, K.R.; Almogren, A. et al., 2024 [26] | ML, DL, hybrid | Preprocess, explore | Acc, precision, F1-score, etc. | ✓ | ✓ | ✓ | 🗶 | 🗶 | Comparison-based models |
A. Khan, M. S. Hossain, M. K. Hasan, et al., 2025 [27] | Review/survey | Explore emerging techs | FP, explore, F1, precision, etc. | ✓ | ✓ | ✓ | 🗶 | 🗶 | Justify metrics selection |
Mert Nakıp, Erol Gelenbe, 2023 [28] | Self-supervised-based | Acquire online, auto-associate NN, trust evaluation | Acc, precision, recall, F1, etc. | ✓ | ✓ | ✓ | 🗶 | 🗶 | Useful in adaptive, streaming IDS |
Jinxin Liu, Murat Simsek, Michele Nogueira, Burak Kantarci et al., 2023 [29] | Transformer architecture-based | Combine features and compare | F1-score | ✓ | ✓ | ✓ | 🗶 | 🗶 | Modern modeling, useful with metric trade-offs |
Yifan Zeng et al., 2025 [30] | CNN, SC-CGAN | Generate, combine, oversample | Acc, F1, SHAP, LIME, cost-sensitive, attention | ✓ | ✓ | ✓ | 🗶 | 🗶 | Interpretability purposes |
Ours, 2025 | ML, DL, SMOTE, visualization, empirical | Preprocessing, undersampling, oversampling, feature engineering, dimensionality reduction | Acc, F1, precision, recall, ROC-AUC, visualization, confusion matrix, coding | ✓ | ✓ | ✓ | ✓ | ✓ | Modeling assessment, methodology development |
Parameter | Value |
---|---|
sampling_strategy | ‘auto’ |
random_state | 47 |
k_neighbors | 5 |
Parameter | Value |
---|---|
penalty | ‘l2’ |
tol | 0.0001 |
C | 1 |
solver | ‘lbfgs’ |
max_iter | 100 |
intercept_scaling | 1 |
Metric | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 1.00 | 1.00 | 1.00 | 109,917 |
1 | 1.00 | 1.00 | 1.00 | |
accuracy | | | 1.00 | 209,706 |
macro avg | 1.00 | 1.00 | 1.00 | 209,706 |
weighted avg | 1.00 | 1.00 | 1.00 | 209,706 |

Test Size | Accuracy |
---|---|
10,000 | 1.000000 |
20,000 | 1.000000 |
30,000 | 1.000000 |
40,000 | 0.999875 |
50,000 | 1.000000 |
60,000 | 0.999917 |
70,000 | 1.000000 |
80,000 | 1.000000 |
90,000 | 0.999833 |
100,000 | 0.999750 |
110,000 | 0.999909 |
120,000 | 0.999792 |
130,000 | 0.999962 |
140,000 | 0.999929 |
150,000 | 1.000000 |
160,000 | 0.999844 |
170,000 | 0.999853 |
180,000 | 0.999889 |
190,000 | 0.999921 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zayid, E.I.M.; Isah, I.; Humayed, A.A.; Adam, Y.A. Innovating Intrusion Detection Classification Analysis for an Imbalanced Data Sample. Information 2025, 16, 883. https://doi.org/10.3390/info16100883