1. Introduction
In recent years, with the rapid development of the Internet and information technology, network security problems have become increasingly serious [1]. Network intrusion, defined as unauthorized access to a computer system or network, may lead to data leakage, system paralysis, or even serious economic losses. Intrusion detection techniques [2] are widely used to monitor and identify abnormal behaviors in real time to enhance network security. Traditional intrusion detection methods are mainly based on rules [3] and statistical analysis, such as expert-defined feature rules, signature detection [4], and anomaly detection. These traditional methods have clear limitations: rule-based methods are difficult to update in a timely manner, and statistical methods suffer from high false alarm rates and poor generalization.
With the development of deep learning, intrusion detection methods based on deep learning have received increasing attention [5]. In 2015, Wang et al. pioneered the reconstruction of raw network packets from byte streams into grayscale images [6], a groundbreaking work that revealed the potential of convolutional neural networks (CNNs) in handling protocol field spatial correlation. Hindy et al. proposed a dual temporal–spatial convolutional architecture [7], which effectively solves the heterogeneous characterization problem of discrete log data and continuous flow data by transforming the temporal features recorded by NetFlow into a two-dimensional spectrogram. Mazid et al. developed a 3D-CNN framework by fusing three-channel inputs (data packet length, time interval, and protocol type) [8], which achieved a high recall rate for detecting abnormal behavior in IoT devices, verifying the unique advantage of CNNs in high-dimensional spatiotemporal feature extraction.
Currently, existing binary classification detection models lack fine-grained identification of specific attack types. Developing multi-classification schemes is crucial for formulating precise, attack-specific defense strategies, and effective data preprocessing plays a key role in enhancing model generalization and accuracy. Furthermore, existing models struggle with large-scale, heterogeneous traffic data. Specifically, reliance on a single CNN model often fails to meet detection requirements, and insufficient feature generalization impedes the balance between accuracy and robustness across diverse intrusion types.
To address these limitations, this paper designs an interpretable, high-precision network intrusion detection (HMK-ST) model integrating feature dimensionality reduction, semantic mapping, and transfer learning. To address the high dimensionality, nonlinearity, and low signal-to-noise ratio of network data, this paper proposes a hybrid multi-model feature selection and kernel dimensionality reduction algorithm, which maps the high-dimensional features to a low-dimensional space to achieve feature dimensionality reduction and enhance nonlinear separability. A network traffic visualization algorithm is proposed based on semantic feature mapping that encodes structured data into color semantic images. Furthermore, an ensemble convolutional neural network model is constructed to perform multiclass detection on these generated images, enabling highly accurate network intrusion detection.
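To make the idea of encoding structured traffic records as color images more concrete, the following is a minimal sketch of one plausible mapping. It is not the semantic feature mapping of the HMK-ST model, whose details are given in Section 3; the function name `features_to_rgb_image`, the min-max scaling, and the rule that three consecutive features fill the R, G, and B channels of one pixel are all assumptions made purely for illustration.

```python
import math


def features_to_rgb_image(features, lo, hi):
    """Map a 1-D feature vector to a square RGB image (nested lists).

    Hypothetical illustration only: `lo` and `hi` are assumed to hold the
    per-feature minimum and maximum observed on the training data.
    Every three consecutive features become the (R, G, B) values of a pixel.
    """
    # Min-max scale each feature to an integer intensity in [0, 255].
    scaled = []
    for x, a, b in zip(features, lo, hi):
        span = b - a
        norm = 0.0 if span == 0 else (x - a) / span
        norm = min(1.0, max(0.0, norm))  # clip values outside the training range
        scaled.append(round(norm * 255))

    # Pad with zeros so the values fill a side x side grid of 3-channel pixels.
    n_pixels = math.ceil(len(scaled) / 3)
    side = math.ceil(math.sqrt(n_pixels))
    scaled += [0] * (side * side * 3 - len(scaled))

    # Group into pixels, then into rows: image[row][col] == (R, G, B).
    pixels = [tuple(scaled[i:i + 3]) for i in range(0, len(scaled), 3)]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


image = features_to_rgb_image([0.0, 10.0, 4.0, 2.0], lo=[0.0] * 4, hi=[10.0] * 4)
```

In this toy call, four reduced features yield a 2 x 2 image whose unused positions are zero-padded. A real pipeline would typically resize such images to the input resolution expected by the pretrained CNN backbones.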
In this paper, we validate the performance of the HMK-ST model on the UNSW-NB15 and CICIDS 2017 datasets. Performance analysis shows that on the UNSW-NB15 dataset, the HMK-ST model achieved an accuracy of 99.99% and an F1 score of 99.98%. On the CICIDS 2017 dataset, it achieved an accuracy of 99.96% and an F1 score of 99.91%.
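For readers less familiar with multiclass evaluation, the sketch below shows how accuracy and an F1 score can be computed from predicted and true labels. The averaging scheme is an assumption: the text above does not state whether the reported F1 is macro-averaged, weighted, or micro-averaged, so macro averaging is used here only as an example, and the function name `accuracy_and_macro_f1` is hypothetical.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))

    # Accuracy: fraction of samples whose predicted label matches the true one.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    # Per-class F1 = 2*TP / (2*TP + FP + FN), then an unweighted mean over classes.
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(0.0 if denom == 0 else 2 * tp / denom)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy labels (not taken from either dataset).
acc, macro_f1 = accuracy_and_macro_f1(
    ["dos", "dos", "scan", "normal"],
    ["dos", "scan", "scan", "normal"],
)
```

Macro averaging weights every attack class equally, so rare classes influence the score as much as frequent ones. This matters on imbalanced intrusion datasets, where accuracy alone can look high even if a minority class is poorly detected.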
The rest of this document is organized as follows:
Section 2 examines the current literature for network intrusion detection methods and introduces the general framework of the model in this paper.
Section 3 describes the network intrusion detection (HMK-ST) model with feature dimensionality reduction, semantic mapping, and transfer learning.
Section 4 validates the performance of the HMK-ST model proposed in this paper on the UNSW-NB15 and CICIDS 2017 datasets.
Section 5 summarizes the experimental results of the model in this paper and suggests future research directions for the model.
2. Related Work
Due to the increasing complexity of network environments and the diversification of attack methods, building an efficient, accurate, and generalizable network intrusion detection system (IDS) has become a research hotspot. In recent years, researchers have proposed a large number of intrusion detection methods based on traditional machine learning, deep learning, and multi-model fusion, and have made significant progress.
In order to improve the model’s ability to express complex attack patterns, many studies have proposed innovative neural network structures. Geng Zhiqiang et al. [9] proposed an incremental intrusion detection model incorporating a sparse self-attention mechanism, which significantly reduces computational complexity and maintains a good ability to memorize the old classes in a dynamic environment. Lu Haotian et al. [10] designed a continuous detection method based on a two-layer structure and indicator distribution, which effectively improved the model’s ability to recognize unknown classes. In terms of feature extraction, Tikhe et al. [11] constructed the FACS2ANet model by combining feature attention and a convolutional sparse autoencoder, which is suitable for intrusion detection in NFV virtual network environments. Jawad et al. [12], on the other hand, proposed the XGRU-IDS model, which fuses Extra Trees classifiers with GRU networks, and introduced the SHAP interpretation mechanism, thereby improving the interpretability of the model in industrial IoT environments. Deep reinforcement learning techniques are widely used in intrusion detection tasks that require high adaptability and involve dynamically changing samples. Hu Yutao et al. [13] constructed an IoT-ONDDQN model based on the CIC-IoT dataset, introduced the NoisyNet strategy to replace the traditional ε-greedy algorithm, realized a more exploratory training process, and improved the detection effect of malicious traffic in IoT. Jue Bo et al. [14], on the other hand, proposed an adaptive feature fusion mechanism and a metric learning strategy, which realized the effective identification of multiple classes of attacks under small samples.
The automation of model architecture design has also attracted considerable research attention. Yang Yeming et al. [15] proposed the EMR-NID method to improve the model’s generalization ability and adversarial robustness through multi-task robust architecture search, while Prabu et al. [16] combined the greedy sand cat swarm optimization algorithm with a dual attention graph convolutional network to construct an IDS architecture with strong expressive ability for complex network traffic features. Graph neural networks (GNNs) and sequence modeling techniques are widely used to capture structured and temporal information in network traffic. Specifically, GNNs excel at modeling the natural graph structure of network data, such as the connections between hosts, packets, or users, to detect sophisticated relational patterns that traditional methods might miss. Francis et al. [17] designed a feature delineation mechanism based on the firefly algorithm to enhance the robustness of attack detection in software-defined networks (SDNs). In the domain of GNNs, Sun et al. [18] proposed GNN-IDS, which incorporates both an attack graph and real-time measurements to represent complex computer networks. By learning network connectivity, their model can quantify the importance of neighboring nodes and features to make more reliable predictions and identify malicious actions causing anomalies. Lo et al. [19] presented E-GraphSAGE, a practical GNN approach that captures both edge features and topological information for intrusion detection in IoT networks using flow-based data. Their extensive evaluation showed that it outperforms state-of-the-art methods, demonstrating the strong potential of GNNs for NIDS. Additionally, Tran [20] addressed the challenge of preprocessing network data for GNN models by proposing an innovative method to extract relevant features from flow data to create nodes and edges, significantly enhancing IDS performance in detecting network attacks. Mohammed et al. [21] evaluated the performance of various recurrent neural network architectures and proposed introducing a DNN with an attention mechanism to enhance the model’s recognition ability. Derui Guo et al. [22] constructed a TRBMA model that incorporates 1D-TCN, ResNet18, BiGRU, and a multi-head attention mechanism, achieving excellent detection results on the CIC-IDS2017 dataset. Congcong Li et al. [23], on the other hand, proposed the DSC-Inception-BiLSTM network, which uses a parallel structure to extract the spatial and temporal information of image and textual traffic features, respectively, to further improve the classification accuracy.
In resource-constrained IoT and industrial control systems, research focuses mainly on model lightweighting, accuracy, and multi-protocol compatibility. Mohamed et al. [24] constructed a CNN-GRU-LSTM three-layer model, which takes into account both spatial and temporal feature extraction in industrial CPS environments. Kalaivani et al. [25] combined CNN with a restricted Boltzmann machine (RBM) to effectively improve the modeling capability of complex features, while Alabbadi et al. [26] proposed a cross-domain residual LSTM structure, incorporating an attention mechanism to adapt to the task of intrusion detection in heterogeneous IoT environments. Liu Liwei et al. [27] proposed an MRID model using a multi-scale residual temporal convolutional network with a traffic attention mechanism to achieve efficient real-time intrusion detection in fog computing architecture. Song et al. [28] constructed a TGA model that fuses TCN, BiGRU, and an attention mechanism to effectively capture temporal feature variations. Ling et al. [29] proposed a two-phase EGAT structure that introduces a graph attention mechanism to further enhance the edge information modeling capability. Mao et al. [30] constructed the FCN-Transformer architecture, which realizes the extraction of multi-level features from raw traffic.
In feature engineering, Qingyuan Peng et al. [31] used the gray wolf algorithm to optimize DBN-SVM, which significantly improves the classification performance on NSL-KDD and UNSW-NB15 datasets. Bouzaachane et al. [32] fused integrated learning and deep neural networks to construct a multilevel IDS model adapted to the modern complex network environment. Govindarajan et al. [33] proposed a cloud-based intrusion detection framework fusing graph features and contrastive learning, which effectively improves the accuracy of the model in heterogeneous environments. Jessie Rani et al. [34] combined fuzzy integration and genetic optimization to maintain a high accuracy rate under small sample conditions. Farouqui et al. [35] proposed the SafetyMed system for IoMT, which combines CNN and LSTM to simultaneously cope with image and sequence intrusion data. Du et al. [36] constructed an MBConv-ViT model based on the fusion of Vision Transformer and MobileNet to effectively improve the IoT attack detection capability. Sun et al. [37] proposed an adversarial sample generation method based on the analysis of model simulation to improve the model robustness through adversarial training. Du Xinying et al. [38] proposed a multi-encoder feature fusion mechanism for the structural characteristics of the CAN bus to achieve accurate identification of injection attacks. Luo et al. [39] designed the MCLDM, a multi-channel contrastive learning network that combines supervised and unsupervised feature reconstruction strategies, which is suitable for feature modeling and classification of high-dimensional nonlinear traffic data. Neeraj Kumar et al. [40] proposed a multi-channel deep learning network for feature modeling and classification of high-dimensional nonlinear traffic data by introducing an improved deep learning structure and feature selection strategy, which demonstrated superior performance on multiple publicly available datasets.
Notably, with the development of distributed learning paradigms such as federated learning [41,42,43], intrusion detection research is expanding into privacy-preserving directions. Federated learning enables multiple devices to jointly train shared models without sharing raw data, thereby protecting data privacy [41]. However, this distributed learning paradigm is vulnerable to adversarial attacks, particularly data poisoning [41] and model poisoning attacks [42]. Research indicates that among data poisoning attacks like label flipping, feature poisoning, and VagueGAN, feature poisoning subtly degrades model performance by modifying high-impact features identified by random forest techniques [41]. Regarding model poisoning attacks, studies have found that federated networks exhibit strong robustness even when servers randomly adopt aggregation rules like Krum and Trimmed Mean in each federated learning round [42]. To enhance the security of federated learning systems, researchers have proposed various defense mechanisms. Randomized deep feature selection effectively mitigates the impact of feature poisoning attacks by randomizing server features of varying sizes [41]. Additionally, robust federated aggregation (RFA) methods, which aggregate updates using the geometric median, demonstrate greater resilience against potential poisoning of local data or model parameters. Variants include one-step robust aggregation and device-side personalized schemes [43]. Meanwhile, although large language models show promise in threat intelligence analysis, their computational overhead in real-time detection remains a consideration.
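The robustness of geometric-median aggregation mentioned above can be illustrated with a short sketch. It uses a smoothed Weiszfeld iteration, a standard way to approximate the geometric median; the function name `geometric_median`, the iteration count, and the smoothing constant `eps` are assumptions chosen for illustration and are not taken from [43].

```python
import math


def geometric_median(updates, weights=None, iters=100, eps=1e-8):
    """Approximate the weighted geometric median of client update vectors.

    Smoothed Weiszfeld iteration: each step re-weights the clients by the
    inverse of their distance to the current estimate, so far-away
    (possibly poisoned) updates receive little influence.
    """
    n, dim = len(updates), len(updates[0])
    if weights is None:
        weights = [1.0] * n

    # Start from the ordinary weighted mean (what plain averaging would return).
    total = sum(weights)
    z = [sum(w * u[j] for w, u in zip(weights, updates)) / total for j in range(dim)]

    for _ in range(iters):
        # eps keeps the division finite when the estimate sits on a data point.
        betas = [w / max(eps, math.dist(z, u)) for w, u in zip(weights, updates)]
        s = sum(betas)
        z = [sum(b * u[j] for b, u in zip(betas, updates)) / s for j in range(dim)]
    return z


# Three similar client updates and one extreme (hypothetically poisoned) update.
client_updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [100.0, 100.0]]
mean = [sum(u[j] for u in client_updates) / len(client_updates) for j in range(2)]
median = geometric_median(client_updates)
```

In this toy example the plain mean is dragged to (25.75, 25.75) by the single outlier, whereas the geometric-median estimate stays close to the cluster of similar updates around (1, 1).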
In summary, current intrusion detection technology exhibits trends of diversified model structures, fused feature modeling, and lightweight deployment methods. The research focus is gradually shifting from traditional static identification toward dynamic adaptation, multi-source data fusion, and enhanced cross-domain generalization capabilities. In this paper, we use hybrid multi-model feature selection and kernel-based dimension reduction (HMK) to map the high-dimensional features of network traffic data into a low-dimensional space, achieving feature dimensionality reduction. Subsequently, the structured traffic data resulting from dimensionality reduction is transformed into visualized images using semantic features. Then, we construct a detection model based on transfer learning and an ensemble convolutional neural network to perform network intrusion detection. The overall framework of the HMK-ST model is shown in Figure 1. It is mainly composed of four modules: data preprocessing, feature selection and dimensionality reduction, traffic data visualization, and intrusion type detection. The data preprocessing module cleans, encodes, and normalizes the data; the feature selection and dimensionality reduction module maps the high-dimensional features into a low-dimensional space, enhances nonlinear separability, and balances accuracy and complexity; the traffic data visualization module transforms the structured traffic data into visual semantic images; the intrusion type detection module constructs an ensemble network model composed of ResNet50, VGG16, and Inception_v3. The sub-models are fine-tuned through transfer learning, and a static fusion strategy assigns the sub-model weights to perform multi-classification detection of network intrusions.
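As a rough illustration of a static fusion strategy, the sketch below combines the class-probability outputs of three sub-models with fixed weights and picks the class with the highest fused score. The function name `fuse_predictions`, the probability values, and the weights are hypothetical; the actual weights and fusion rule of the HMK-ST model are specified in Section 3 and may differ.

```python
def fuse_predictions(prob_vectors, weights):
    """Weighted average of per-model class probabilities (static soft voting).

    prob_vectors: one probability list per sub-model, all of equal length.
    weights: one fixed, non-negative weight per sub-model.
    Returns (fused probabilities, index of the predicted class).
    """
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical softmax outputs for one traffic image over three classes.
resnet50_probs = [0.6, 0.3, 0.1]
vgg16_probs = [0.2, 0.7, 0.1]
inception_v3_probs = [0.5, 0.4, 0.1]

fused, predicted = fuse_predictions(
    [resnet50_probs, vgg16_probs, inception_v3_probs],
    weights=[0.5, 0.2, 0.3],  # assumed values, fixed before inference
)
```

Because the weights are fixed in advance rather than learned per sample, a disagreement between sub-models is resolved in favor of the models that were assigned more weight, for example on the basis of validation accuracy.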
5. Conclusions
Driven by 5G and other ultra-high-speed network technologies, the complex environment of the Internet of Everything makes network attack methods increasingly diverse, and traditional detection methods face the severe challenges of high-dimensional data and nonlinear features. The HMK-ST model proposed in this paper not only achieves anomaly detection of network traffic, but also accurately classifies multiple types of attacks (e.g., DoS, port scanning) by integrating transfer learning and semantic visualization techniques. For high-dimensional noisy data, the model combines hybrid multi-evaluation feature selection with a supervised kernel dimensionality reduction algorithm (HMFS-SKDR) to effectively extract key features and enhance nonlinear separability; meanwhile, it converts structured traffic into discriminative images through semantic feature mapping, which significantly improves the pattern recognition capability of convolutional neural networks. On the UNSW-NB15 and CICIDS 2017 datasets, which are close to real scenarios, the accuracy of HMK-ST reaches 99.99% and 99.96%, and the F1 scores are 99.98% and 99.91%, respectively, which verifies its effectiveness and robustness in complex attack detection. However, the computational complexity of the model is high, which may affect real-time performance, and the generalization ability to unknown attacks still needs to be strengthened. Future work will focus on lightweight architectural design and incremental-learning strategies to further enhance the model’s adaptive defense capability in a dynamic network environment.
Although the HMK-ST model demonstrates outstanding performance on two benchmark datasets, it still has certain limitations.
First, the computational complexity of the model poses a potential bottleneck for deployment on resource-constrained edge devices or in network environments with stringent real-time requirements. Future research will focus on model lightweighting and optimization. By adopting advanced techniques such as pruning, quantization, and knowledge distillation, the model’s parameter count and computational overhead can be significantly reduced. This will enhance detection speed, lower energy consumption, and improve practicality.
Second, the current model requires stronger defense capabilities against unknown zero-day attacks. While its core strength lies in fine-grained classification of known attack types, its potential to detect unknown attacks as anomalies is constrained by predefined visualization mapping rules. Therefore, a key future direction is developing adaptive or learning-based visualization mechanisms that can automatically discover optimal visual representations for novel attack patterns without human intervention. Advanced methods such as generative models or attention mechanisms may play a pivotal role in this endeavor. Simultaneously, we will incorporate incremental learning algorithms, enabling the model to continuously learn from emerging attack data and update itself without full retraining. This approach would facilitate long-term, effective defense in dynamic network threat environments.
Furthermore, novel datasets like UWF-ZeekData24, featuring updated attack vectors and rich Zeek log context, provide an ideal testbed for evaluating model generalization in real-world network environments. We plan to immediately validate the HMK-ST model’s performance on this dataset and further refine its feature selection and visualization strategies to ensure resilience against evolving cyber threats.
Finally, we will explore applying the core techniques of this research to a privacy-preserving federated learning framework. Specifically, we plan to deploy the HMK feature selection and dimensionality reduction modules across client nodes in a federated learning setup. This enables each client to extract standardized, low-dimensional, and highly discriminative features locally without sharing raw data, before uploading them to the server for aggregation and global model training. This approach holds promise for safeguarding data privacy while pooling collective efforts to enhance the robustness and accuracy of intrusion detection models. It may also help mitigate potential issues such as poisoning attacks in federated learning environments.