This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Semi-Supervised Learning for Intrusion Detection in Large Computer Networks

Brandon Williams and Lijun Qian
CREDIT Center, Department of Electrical and Computer Engineering, Prairie View A&M University, Prairie View, TX 77446, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 5930; https://doi.org/10.3390/app15115930 (registering DOI)
Submission received: 20 October 2024 / Revised: 17 May 2025 / Accepted: 21 May 2025 / Published: 24 May 2025
(This article belongs to the Special Issue Data Mining and Machine Learning in Cybersecurity)

Abstract

In an increasingly interconnected world, securing large networks against cyber-threats has become paramount as cyberattacks grow more frequent and more difficult and expensive to remediate. This research explores data-driven security by applying semi-supervised machine learning techniques to intrusion detection in large-scale network environments. Novel methods (including decision tree with entropy-based uncertainty sampling, logistic regression with self-training, and co-training with random forest) are proposed to perform intrusion detection with limited labeled data. These methods leverage both the available labeled data and the abundant unlabeled data. Extensive experiments on the CIC-DDoS2019 dataset show promising results: both the decision tree with entropy-based uncertainty sampling and the co-training with random forest models achieve 99% accuracy. Furthermore, the UNSW-NB15 dataset is used for a comparative analysis between the base models (random forest, decision tree, and logistic regression) trained on labeled data only and the proposed models trained on partially labeled data. The proposed methods demonstrate superior results when using 1%, 10%, and 50% labeled data, highlighting their effectiveness and their potential for improving intrusion detection systems in scenarios with limited labeled data.
Keywords: machine learning; big data; intrusion detection; semi-supervised learning
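
As an illustration of the entropy-based uncertainty sampling named in the abstract, the sketch below ranks unlabeled samples by the Shannon entropy of their predicted class probabilities and picks the most uncertain ones. It is a minimal sketch of the selection step only, not the authors' implementation: the function names, the two-class (benign, attack) setup, and the probability values are all hypothetical, and the abstract does not specify how the selected samples are subsequently used.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy (in bits) of one predicted class distribution."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def most_uncertain(prob_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k rows with the highest predictive entropy."""
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: shannon_entropy(prob_rows[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted probabilities (benign, attack) for four unlabeled flows.
predicted = [
    [0.99, 0.01],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
query_indices = most_uncertain(predicted, k=2)
```

In this made-up example the two flows whose predictions are closest to an even split are selected, since a near-uniform distribution has the highest entropy. In practice the probabilities would come from a trained classifier (for instance a decision tree, as in the abstract) rather than from a hard-coded list.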

Share and Cite

MDPI and ACS Style

Williams, B.; Qian, L. Semi-Supervised Learning for Intrusion Detection in Large Computer Networks. Appl. Sci. 2025, 15, 5930. https://doi.org/10.3390/app15115930

AMA Style

Williams B, Qian L. Semi-Supervised Learning for Intrusion Detection in Large Computer Networks. Applied Sciences. 2025; 15(11):5930. https://doi.org/10.3390/app15115930

Chicago/Turabian Style

Williams, Brandon, and Lijun Qian. 2025. "Semi-Supervised Learning for Intrusion Detection in Large Computer Networks" Applied Sciences 15, no. 11: 5930. https://doi.org/10.3390/app15115930

APA Style

Williams, B., & Qian, L. (2025). Semi-Supervised Learning for Intrusion Detection in Large Computer Networks. Applied Sciences, 15(11), 5930. https://doi.org/10.3390/app15115930
