1. Introduction
Network Intrusion Detection Systems (NIDS) have been actively studied as a core technology for securing modern infrastructure by identifying anomalous behaviors from network traffic [1,2,3]. However, modern network attacks are becoming increasingly difficult to distinguish based solely on the statistical features of individual packets or flows. In particular, many threats manifest incrementally through temporal traffic variations and behavioral patterns that accumulate over time. This creates security blind spots for conventional static analysis frameworks that treat traffic as isolated observations.
Consequently, modern NIDS are transitioning toward time-series-based approaches that model network traffic as continuous sequences. By modeling temporal dependencies between current observations and their past context, time-series-based NIDS can analyze the broader flow and historical context of malicious activities. This allows them to capture not only short-term statistical anomalies but also the continuity of actions and evolutionary patterns in traffic. As a result, time-series-based NIDS provide a more dynamic and sophisticated detection capability than static packet/flow-based and rule-based NIDS. To effectively learn such temporal dependencies, various neural network models have been adopted in time-series-based NIDS, including RNN, LSTM, and GRU [4,5,6]. These recurrent architectures retain information from previous time steps and model sequence patterns over time, enabling the temporal representation of network traffic sequences.
However, these characteristics can also pose structural limitations for time-series-based NIDS. Since such systems often learn dataset-specific temporal patterns, their performance depends heavily on the traffic structure and protocol interactions observed during training. Consequently, their performance can degrade sharply when deployed in heterogeneous network environments that differ from the training domain. For example, a system optimized for enterprise networks often performs poorly in wireless or IoT environments, making it necessary to adapt the model to each new domain rather than relying on direct transfer alone. To address this issue, recent studies have explored various cross-domain adaptation approaches, including feature standardization across datasets, federated learning frameworks, and ensemble-based architectures. However, these approaches often require complex training procedures or large model architectures, resulting in inefficient deployment in resource-constrained environments.
In this paper, to demonstrate the performance degradation of time-series-based NIDS under heterogeneous network conditions, we first conduct a cross-domain performance analysis using multiple benchmark datasets. Specifically, we train RNN- and LSTM-based models on four benchmark datasets with distinct attack scenarios and feature sets [7]. By evaluating each pre-trained model on the remaining unseen datasets, we show that detection performance drops significantly when the model is exposed to unfamiliar traffic patterns and protocol interactions. To address this degradation, we then propose a LoRA-based cross-domain adaptation method that selectively adjusts the fully connected layers of time-series-based NIDS models. Instead of retraining the entire model, we freeze the backbone responsible for temporal feature extraction and apply LoRA modules only to the fully connected layers that determine classification decisions. This design enables efficient adaptation to new network environments by preserving temporal representations learned by the backbone while adjusting the classification boundaries to the target domain.
The main contributions of this paper are summarized as follows:
Analysis of cross-domain performance degradation: We demonstrate that time-series-based NIDS suffer from substantial performance degradation when models trained on a source domain are evaluated on unseen target datasets with different traffic distributions.
LoRA-based cross-domain adaptation for time-series-based NIDS: We propose a LoRA-based cross-domain adaptation method for time-series-based NIDS to mitigate performance degradation under domain shift through parameter-efficient adaptation without full model retraining. To the best of our knowledge, this is the first work that applies LoRA to time-series-based NIDS in a cross-domain adaptation setting.
Experimental validation across multiple network datasets: From the experimental results using multiple network datasets, we show that the proposed LoRA-based adaptation method consistently improves cross-domain detection performance while maintaining parameter efficiency.
The rest of the paper is organized as follows: In Section 2, we provide preliminaries on time-series-based NIDS and LoRA, followed by a review of related works on domain adaptation. In Section 3, we describe the data preprocessing procedure and alignment process, and then we analyze cross-domain performance degradation across different datasets. In Section 4, we present a LoRA-based domain adaptation method. In Section 5, we provide the implementation details and evaluate the effectiveness of the proposed method through cross-domain experiments. In Section 6, we discuss the limitations and implications of the proposed approach in terms of data selection, domain variability, and scalability. Finally, we conclude the paper in Section 7.
2. Preliminaries and Related Work
2.1. Time-Series-Based NIDS
Time-series-based NIDS models network traffic as time-ordered input sequences to capture temporal correlations and dynamic variability beyond individual packet analysis. Specifically, traffic statistics are segmented by a sliding window into fixed-length sequences of length $T$, denoted as $X = (x_1, x_2, \dots, x_T)$, where $x_t \in \mathbb{R}^{d}$. Here, $x_t$ represents the feature vector at time $t$, and $d$ denotes the feature dimension. These sequences are then processed by recurrent architectures, such as RNN and LSTM, that model sequential dependencies [8]. Typically, such architectures maintain temporal context by updating the hidden state using the current input and the previous hidden state, i.e., $h_t = f(x_t, h_{t-1})$, where $f(\cdot)$ denotes the recurrent operation. A classifier is then applied to the last hidden state $h_T$ or to an aggregated output over $\{h_1, \dots, h_T\}$. Through this process, intrusion detection is formulated as a binary or multi-class classification problem that determines whether the input sequence is normal or malicious.
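The recurrent update and the last-state classification described above can be sketched as follows. This is a minimal illustration that assumes a plain tanh (Elman) cell as the recurrent operation and a single sigmoid read-out; the names `rnn_last_hidden`, `W_x`, `W_h`, and `w_out`, as well as all sizes, are hypothetical and do not reflect the configuration used in this paper.

```python
import numpy as np

def rnn_last_hidden(X, W_x, W_h, b):
    """Apply h_t = tanh(W_x x_t + W_h h_{t-1} + b) over one sequence X of shape (T, d)."""
    h = np.zeros(W_h.shape[0])            # initial hidden state h_0 = 0
    for x_t in X:                         # iterate over time steps t = 1..T
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h                              # last hidden state h_T

rng = np.random.default_rng(0)
T, d, hidden = 10, 5, 8                   # illustrative sizes only
X = rng.normal(size=(T, d))               # one input sequence
W_x = rng.normal(scale=0.3, size=(hidden, d))
W_h = rng.normal(scale=0.3, size=(hidden, hidden))
b = np.zeros(hidden)
w_out = rng.normal(size=hidden)           # hypothetical read-out weights

h_T = rnn_last_hidden(X, W_x, W_h, b)
p_attack = 1.0 / (1.0 + np.exp(-(w_out @ h_T)))   # classifier applied to h_T
```

An LSTM or GRU would replace only the body of the loop; the surrounding structure (iterate over the window, classify from the final state) stays the same.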
Previous studies have shown that recurrent architectures are effective for modeling temporal patterns in network traffic [8]. For example, Yin et al. proposed a deep learning-based intrusion detection system (RNN-IDS) that utilizes recurrent neural networks to model network traffic as sequential data [4]. Unlike traditional machine learning approaches, the RNN architecture introduces recurrent connections that enable the model to retain information from previous time steps. This allows the model to capture temporal dependencies in network traffic and improves intrusion detection performance in both binary and multi-class classification tasks. Kim et al. proposed an LSTM-based intrusion detection system to address the limitations of conventional RNNs in learning long-term dependencies in network traffic [5]. To mitigate the gradient-related issues in standard RNNs, the model incorporates LSTM cells with input, forget, and output gates, allowing it to selectively retain and propagate relevant temporal information. As a result, the model can effectively capture sequential patterns in network traffic and enhance detection performance across various attack types.
2.2. LoRA
LoRA (Low-Rank Adaptation) is a representative parameter-efficient fine-tuning (PEFT) technique designed to mitigate the inefficiency of full-parameter fine-tuning [9,10]. The core idea is to freeze a pre-trained weight matrix $W_0$ and approximate task-specific updates using the product of two small low-rank matrices, $B$ and $A$. Specifically, the adapted weight is expressed as
$$W = W_0 + \Delta W = W_0 + BA,$$
where only the low-rank matrices $A$ and $B$ are updated during training. This design is motivated by the observation that task-specific adaptation can be achieved with a limited number of parameter adjustments, without updating the entire parameter space. As a result, LoRA reduces the number of trainable parameters and the associated training overhead, leading to lower memory consumption and faster training. Moreover, since the backbone parameters remain shared and only the LoRA modules need to be adapted or replaced for each target domain, LoRA enables efficient domain-specific adaptation. During inference, LoRA operates by merging the low-rank correction term into the pre-trained weights, thereby introducing negligible overhead. These properties make LoRA well-suited to cross-domain adaptation in NIDS, where models must be efficiently adapted to diverse and evolving network environments.
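The merging property and the parameter saving mentioned above can be checked numerically. The sketch below is illustrative only: the layer sizes and rank are arbitrary assumptions, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4                # illustrative sizes only

W0 = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in))            # trainable low-rank factor
B = rng.normal(size=(d_out, r))           # trainable low-rank factor
x = rng.normal(size=d_in)

# Unmerged form: the adapter runs as a separate low-rank branch.
y_unmerged = W0 @ x + B @ (A @ x)

# Merged form: fold the low-rank correction into the weight once.
W_merged = W0 + B @ A
y_merged = W_merged @ x

full_params = d_out * d_in                # weights updated by full fine-tuning
lora_params = r * (d_in + d_out)          # weights contained in A and B
```

With these sizes, the adapter holds 384 trainable values against 2048 for the full matrix, and both forms produce the same output, which is why a merged adapter adds no extra cost at inference time.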
2.3. Related Work
Most studies on NIDS evaluate models by splitting a single dataset into training and testing sets. In such settings, the training and testing data belong to the same domain. However, this approach has limitations in assessing whether a model can maintain reliable detection performance in new network environments that are not represented during training. Even within the same domain, temporal changes in traffic distribution can degrade the performance of NIDS trained on historical data. These limitations motivate the need for effective cross-domain adaptation methods that can transfer intrusion detection models to new network environments. According to prior research, AI-based NIDS tend to show a significant decline in detection performance when evaluated on data outside the training domain. Cantone et al. demonstrated that existing machine learning-based NIDS suffer from limited cross-dataset generalization due to dataset heterogeneity [11]. This result highlights the need for methods that can adapt intrusion detection models to new environments more effectively. Existing efforts to address this problem have mainly focused on feature standardization, federated learning, and ensemble-based modeling. For example, Sarhan et al. applied a standardized feature set based on NetFlow and CICFlowMeter across multiple NIDS datasets [12]. They showed that a common feature set can improve both cross-dataset detection performance and explainability in machine learning-based NIDS. In addition, de Carvalho Bertoli et al. proposed a stacked structure combining unsupervised Autoencoders (AE) and Energy Flow Classifiers, trained via Federated Learning (FL) [13]. Their approach achieved improved cross-domain intrusion detection performance without requiring direct data sharing.
More recently, several studies have explored advanced domain adaptation techniques to improve cross-domain NIDS performance. For instance, Amin et al. proposed a stacked ensemble structure combining heterogeneous deep learning models to enhance detection performance across different network domains [14]. Their results showed that ensemble-based architectures can improve cross-domain intrusion detection performance compared to single-model methods. Furthermore, Guerziz et al. proposed an adversarial domain adaptation framework (E-ADDA) that aligns feature representations between source and target domains to improve generalization across heterogeneous network environments [15]. Their framework incorporates adversarial training along with domain-aware design components, such as data augmentation and uncertainty-aware ensemble mechanisms, to enhance robustness under domain shift. While recent domain adaptation methods have shown promising improvements in cross-domain NIDS performance, they often rely on complex feature alignment mechanisms or require substantial architectural modifications, which may limit their applicability in resource-constrained or real-time environments.
To address these limitations, we adopt LoRA as a practical and efficient adaptation strategy for cross-domain NIDS. Unlike structured adaptation approaches, LoRA enables parameter-efficient fine-tuning without modifying the original model architecture, allowing seamless integration into pre-trained models. This lightweight and modular design facilitates plug-in adaptation and efficient management of domain-specific adapters, while introducing negligible inference overhead. These properties make LoRA particularly suitable for real-world time-series NIDS, where flexibility, scalability, and low deployment cost are essential. Furthermore, as demonstrated in our ablation study, LoRA achieves a favorable trade-off between performance and efficiency, providing substantial improvements in detection performance with minimal additional computational cost.
3. Cross-Domain Performance Analysis
In this section, we experimentally analyze the cross-domain performance degradation of time-series-based NIDS.
3.1. Data Alignment and Preprocessing
3.1.1. Datasets
In this paper, we selected four datasets with diverse attack scenarios and feature configurations: KDD99 [16,17], UNSW_NB15 [18,19], CICIDS2017 [20], and AWID3 [21,22]. These datasets were chosen because they span heterogeneous traffic environments, attack scenarios, and feature spaces, thereby providing an appropriate benchmark for analyzing cross-domain performance degradation in time-series-based NIDS. The characteristics of each dataset are summarized as follows:
KDD99 [16]: A classic benchmark based on the 1998 DARPA evaluation data. It consists of 41 features and a label, and its attacks are categorized into four major groups comprising 22 specific attack types. A key characteristic of KDD99 is the distributional discrepancy between its training and test sets. Specifically, the test set includes 14 novel attack types that do not appear in the training set.
UNSW_NB15 [18]: It was developed at the UNSW Canberra Cyber Range Lab to address the limitations of legacy datasets. It includes hybrid traffic that combines real-world normal activities with synthetically generated attack behaviors. The dataset consists of 49 features and nine attack classes, and it reflects diverse and sophisticated modern threats compared to traditional datasets.
CICIDS2017 [20]: It was created by the Canadian Institute for Cybersecurity and includes recent attack scenarios collected in a realistic environment. It contains 79 features and 14 attack classes, and the traffic was collected under diverse operating systems and network settings.
AWID3 [22]: It is a state-of-the-art wireless NIDS dataset that extends the previous AWID2 corpus. It captures diverse attacks in an IEEE 802.1X EAP environment across more than 10 different client devices and cloud-based applications. The dataset contains 254 features manually extracted from both MAC and application layers, which enables detailed analysis of wireless network intrusions.
3.1.2. Data Preprocessing
In this section, we describe a unified preprocessing pipeline that is applied to all datasets before training to ensure consistent feature-space alignment and reproducibility in cross-domain experiments. This step is particularly important because cross-domain evaluation is highly sensitive to inconsistencies in feature definitions and data formats across datasets.
First, we standardized the feature schema by removing unnecessary whitespace from column names and labels. This step prevents semantically identical features from being treated as distinct variables and ensures uniform feature representation across datasets. Next, we corrected invalid numerical values in the raw traffic features. Since negative values were treated as invalid observations in our preprocessing setting, we replaced all negative entries with 0. In addition, after converting infinity (Inf) values to NaN, we removed columns containing NaN values and duplicate records to avoid numerical instability during training. Then, we encoded categorical features into integer representations using LabelEncoder. This avoids the dimensionality increase caused by one-hot encoding and helps maintain a consistent feature representation across heterogeneous datasets. Finally, we normalized numerical features to the range of [0, 1] for stable optimization and balanced feature scaling. Here, we excluded integer-encoded categorical attributes and labels from normalization. For labeling, we adopted a binary labeling scheme by mapping normal traffic to 0 and all attack traffic to 1. We also stored the feature schema and preprocessing parameters as metadata so that the same preprocessing rules could be applied consistently across all source and target domains.
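The preprocessing steps above can be sketched in plain Python as follows. This is a simplified illustration, not the pipeline used in the paper: the function name `preprocess`, the record format (a list of dictionaries), the column names, and the convention of mapping constant columns to 0 are all assumptions made for the example.

```python
import math

def preprocess(rows, categorical, label_col, normal_label):
    """rows: list of dicts, one per record. Returns (cleaned rows, metadata)."""
    # 1) Strip whitespace from column names and string values (including labels).
    rows = [{k.strip(): (v.strip() if isinstance(v, str) else v)
             for k, v in row.items()} for row in rows]
    numeric = [c for c in rows[0] if c not in categorical and c != label_col]

    # 2) Negative values -> 0, then remaining infinities -> NaN.
    for row in rows:
        for c in numeric:
            v = float(row[c])
            if v < 0:
                v = 0.0
            elif math.isinf(v):
                v = math.nan
            row[c] = v

    # 3) Drop columns that contain NaN, then drop duplicate records.
    bad = {c for c in numeric if any(math.isnan(row[c]) for row in rows)}
    numeric = [c for c in numeric if c not in bad]
    seen, unique = set(), []
    for row in rows:
        row = {k: v for k, v in row.items() if k not in bad}
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 4) Integer-encode categorical columns (the idea behind a label encoder).
    encoders = {}
    for c in categorical:
        classes = sorted({row[c] for row in unique})
        encoders[c] = {v: i for i, v in enumerate(classes)}
        for row in unique:
            row[c] = encoders[c][row[c]]

    # 5) Min-max scale numeric columns to [0, 1]; constant columns map to 0.
    ranges = {}
    for c in numeric:
        lo, hi = min(r[c] for r in unique), max(r[c] for r in unique)
        ranges[c] = (lo, hi)
        for row in unique:
            row[c] = (row[c] - lo) / (hi - lo) if hi > lo else 0.0

    # 6) Binary labels: normal -> 0, any attack -> 1.
    for row in unique:
        row[label_col] = 0 if row[label_col] == normal_label else 1

    # Metadata that lets the same rules be reapplied to other domains.
    meta = {"numeric": numeric, "encoders": encoders, "ranges": ranges}
    return unique, meta

# Hypothetical toy records with untidy column names and invalid values.
raw = [
    {" duration": 2.0, "bytes ": -5.0, "rate": 1.0, "proto": "tcp", "label": " normal"},
    {" duration": 4.0, "bytes ": 10.0, "rate": float("inf"), "proto": "udp", "label": "dos"},
    {" duration": 4.0, "bytes ": 10.0, "rate": 3.0, "proto": "udp", "label": "dos"},
]
clean, meta = preprocess(raw, categorical=["proto"], label_col="label", normal_label="normal")
```

In this toy input, the `rate` column is dropped because it contains an infinite value, and the third record then becomes a duplicate of the second and is removed, leaving two records.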
3.1.3. Synchronization Strategies
Since the datasets used in this study were collected from heterogeneous environments and protocols, their features and formats are not fully aligned. This heterogeneity makes it difficult to train and evaluate time-series-based NIDS using a consistent input structure. To ensure consistent cross-domain analysis, we define an explicit feature synchronization protocol based on a reference schema.
First, the dataset used to train the backbone model is designated as the reference dataset, and its feature schema is used as the standard input structure. All other target datasets are then aligned to this reference schema. To ensure reproducibility, each feature is assigned to exactly one primary semantic family. The semantic families include temporal features, traffic volume/size features, count/rate features, protocol/service/connection-state features, error/retransmission/anomaly indicators, header/window/control features, host/destination/traffic-distribution features, content/application/authentication features, identifier/device-specific metadata, and wireless PHY/MAC security metadata. To avoid ambiguity, feature assignment follows a fixed priority rule. Identifier/device-specific fields and wireless PHY/MAC security metadata are first assigned to their respective semantic families, even when they overlap with other categories. Host-level statistical features are then prioritized, followed by discrete protocol and state-related fields. The remaining continuous features are assigned based on their closest semantic role in temporal, size, count/rate, error, header/window/control, or content/application/authentication categories.
Feature mapping is performed only within the same semantic family according to a strict priority rule. Exact semantic matching is first applied when features share the same definition, directionality, and statistical meaning. If an exact match is not available, aggregated-equivalent matching is used when the same physical quantity is represented at a different granularity, such as mapping packet-level statistics to flow-level aggregates. When neither exact nor aggregated matches exist, behavioral proxy matching is applied to align features that serve a similar functional role in traffic representation. Finally, features without reliable semantic correspondence are treated as unsupported and are consistently zero-filled to preserve the input structure.
The feature mapping rules are manually defined, and the transformation is applied deterministically to ensure consistent and reproducible alignment across datasets. As a result, all experiments are conducted under a unified input structure, enabling consistent cross-domain evaluation while preserving as much semantic correspondence as possible.
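The priority order of the mapping rules (exact, then aggregated-equivalent, then behavioral proxy, then zero-fill) can be sketched as a small lookup procedure. The reference schema, the feature names, and the rules below are hypothetical examples, not the mappings defined in this paper; the semantic-family restriction is assumed to be enforced when the rules are written by hand.

```python
PRIORITY = ("exact", "aggregated", "proxy")   # match types, strongest first

def align_to_reference(record, reference_schema, rules):
    """Map one target-domain record onto the reference schema.

    rules: {reference_feature: {match_type: function(record) -> value}}
    Reference features without any rule are unsupported and zero-filled.
    """
    aligned = []
    for feature in reference_schema:
        candidates = rules.get(feature, {})
        for match_type in PRIORITY:
            if match_type in candidates:
                aligned.append(float(candidates[match_type](record)))
                break
        else:
            aligned.append(0.0)               # unsupported feature: zero-fill
    return aligned

# Hypothetical reference schema and hand-written rules, for illustration only.
reference_schema = ["duration", "src_bytes", "count", "urgent"]
rules = {
    "duration": {"exact": lambda r: r["flow_duration"]},
    "src_bytes": {
        # Same physical quantity rebuilt from packet-level statistics.
        "aggregated": lambda r: r["fwd_pkts"] * r["fwd_pkt_len_mean"],
        "proxy": lambda r: r["total_len"],    # ignored: a stronger match exists
    },
    "count": {"proxy": lambda r: r["flows_per_sec"]},
}
record = {"flow_duration": 1.5, "fwd_pkts": 4, "fwd_pkt_len_mean": 100.0,
          "total_len": 999.0, "flows_per_sec": 7.0}
vector = align_to_reference(record, reference_schema, rules)
```

Because the rules are fixed functions evaluated in a fixed order, the same record always yields the same aligned vector, which is what makes the alignment deterministic.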
3.2. Model Training Strategy
In this paper, we used RNN- and LSTM-based models as representative backbone architectures for time-series-based NIDS. In total, we trained eight backbone models using RNN- and LSTM-based architectures that take network flow sequences as input. RNN-based models capture sequential temporal dependencies, whereas LSTM-based models further capture long-term dependencies through their internal gating mechanisms. The input was represented as a three-dimensional time-series tensor with shape (batch_size, sequence_length, feature_dim). To prevent data leakage, each dataset is split into training, validation, and test sets prior to sequence construction. Sequences are generated within each split using a sliding window (stride = 1), and each sequence is labeled based on its last timestep to reflect the current traffic state.
Each model was trained on a single source-domain dataset and evaluated on the remaining target datasets to analyze cross-domain performance degradation. In addition, we performed random undersampling at the sequence level on the majority (benign) class (0) so that its sample size matches that of the attack class (1), thereby mitigating the severe class imbalance during training. This strategy reduces the tendency of the models to be biased toward the majority class and enables a more consistent evaluation of detection performance. Based on these settings, we trained backbone models using different source datasets. Specifically, we constructed eight backbone models, consisting of RNN and LSTM architectures trained on four datasets (KDD99, UNSW_NB15, CICIDS2017, and AWID3). Each model was evaluated across three target datasets, resulting in a total of 24 cross-domain adaptation settings. These pre-trained models are subsequently used as backbone models for the cross-domain evaluation and LoRA-based adaptation described in the following sections.
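The sequence construction and the sequence-level undersampling described above can be sketched as follows. The helper names `make_windows` and `undersample_benign` and the toy data are hypothetical; the sketch assumes the dataset has already been split, so that the windowing function is called once per split.

```python
import random

def make_windows(features, labels, seq_len, stride=1):
    """Slide a window over one split; label each window by its last timestep."""
    windows, window_labels = [], []
    for start in range(0, len(features) - seq_len + 1, stride):
        end = start + seq_len
        windows.append(features[start:end])
        window_labels.append(labels[end - 1])
    return windows, window_labels

def undersample_benign(windows, window_labels, seed=0):
    """Randomly drop benign (0) windows until they match the attack (1) count."""
    benign = [i for i, y in enumerate(window_labels) if y == 0]
    attack = [i for i, y in enumerate(window_labels) if y == 1]
    if len(benign) > len(attack):
        benign = random.Random(seed).sample(benign, len(attack))
    keep = sorted(benign + attack)            # preserve the original ordering
    return [windows[i] for i in keep], [window_labels[i] for i in keep]

# Toy training split: 12 time steps, 1 feature each, attacks in the last 3 steps.
features = [[float(t)] for t in range(12)]
labels = [0] * 9 + [1] * 3
X, y = make_windows(features, labels, seq_len=4)
X_bal, y_bal = undersample_benign(X, y)
```

Here the 12 time steps yield 9 windows (6 benign, 3 attack), and undersampling keeps 3 of each. Windowing after the split means no window can straddle the boundary between training and test data.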
3.3. Evaluation Metrics
To quantify the degradation of detection performance under cross-domain settings, we assess the models using standard classification metrics, including accuracy, precision, recall, and F1-score. These metrics are defined for the binary classification setting adopted in this study, where the positive class corresponds to attack traffic and the negative class corresponds to normal traffic. Specifically, True Positive (TP) denotes attack traffic correctly predicted as an attack, and True Negative (TN) denotes normal traffic correctly predicted as normal. False Positive (FP) denotes normal traffic incorrectly predicted as an attack, and False Negative (FN) denotes attack traffic incorrectly predicted as normal.
Each evaluation metric is defined as follows. Accuracy is defined as the proportion of correctly classified samples among all predictions:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
Accuracy provides useful information when the class distribution is relatively balanced. However, NIDS datasets are typically highly imbalanced, with normal traffic significantly outnumbering attack traffic. In this setting, a model can achieve high accuracy by predicting the majority (normal) class while failing to detect attacks. This limitation becomes more critical in cross-domain settings, where class distributions and decision boundaries may vary across datasets. Thus, accuracy alone is insufficient to evaluate detection performance in NIDS.
Precision is defined as the proportion of correctly detected attack traffic among all samples predicted as attacks:
$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$
Precision quantifies the reliability of attack predictions by measuring how often predicted attacks are truly attacks. A higher precision indicates fewer false alarms caused by normal traffic being misclassified as attacks. Thus, precision is important in NIDS for evaluating the false alarm behavior of the detection system.
Recall is defined as the proportion of actual attack traffic that is correctly detected:
$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$
In this study, the recall values reported in the main experimental tables correspond to weighted recall, which aggregates per-class recall values defined above using class-support-based weighting under the original data distribution. Recall measures attack detection capability by quantifying the proportion of attack traffic that is successfully detected. A higher recall indicates fewer false negatives, meaning that fewer attacks are missed by the detection system. Thus, recall is critical in NIDS, where missed attacks can lead to significant security risks.
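The F1-score is defined as the harmonic mean of precision and recall:
$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$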
By jointly considering precision and recall, the F1-score provides a balanced measure of detection performance. Thus, it is suitable for imbalanced settings such as NIDS datasets and is useful for assessing overall detection performance under cross-domain shifts.
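The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses the attack-class (binary) definitions given above; the function name `binary_metrics`, the toy label vectors, and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Attack = 1 (positive class), normal = 0 (negative class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed value for 0/0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Imbalanced toy set: 90 normal samples followed by 10 attacks.
y_true = [0] * 90 + [1] * 10
all_normal = binary_metrics(y_true, [0] * 100)   # detector that never alerts
mixed = binary_metrics(y_true, [0] * 85 + [1] * 5 + [1] * 8 + [0] * 2)
```

The `all_normal` case reaches an accuracy of 0.9 with zero precision, recall, and F1-score, which illustrates why accuracy alone can hide a complete failure to detect attacks on imbalanced traffic.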
3.4. Cross-Domain Performance Degradation
In this section, we experimentally evaluate the performance of RNN- and LSTM-based NIDS models on unseen target datasets and analyze performance degradation under cross-domain settings.
Table 1 and Table 2 present the detection performance of RNN- and LSTM-based models when trained on different source datasets and evaluated on both same-domain and cross-domain datasets. When training and evaluation are performed on the same domain, both models achieve consistently high performance across all datasets. For example, on the KDD99 dataset, both RNN and LSTM models achieve accuracy, precision, recall, and F1-score values exceeding 0.99. Similarly, for the other datasets, the models maintain high performance with average scores above 0.97.
However, when evaluated on unseen target datasets, a substantial degradation in detection performance is observed compared to the same-domain results. For example, when an RNN model trained on the UNSW_NB15 dataset is evaluated on the KDD99 dataset, the accuracy drops to 0.7615, indicating a moderate absolute performance level but a substantial decline relative to the same-domain setting. Precision and recall also decrease to 0.6743 and 0.6684, respectively. Overall, cross-domain detection performance often declines to below 0.6, and in some cases approaches zero, indicating a severe degradation in attack detection capability. For example, when a model trained on the KDD99 dataset is evaluated on the CICIDS2017 dataset, both accuracy and recall decrease sharply to around 0.2411, indicating that the model fails to maintain effective detection performance.
In particular, recall shows the most critical degradation, indicating that the models frequently fail to detect actual attacks under cross-domain settings. This pattern is especially evident for the AWID3 dataset, which reflects wireless network environments. In this case, accuracy remains relatively high (above 0.7), whereas precision, recall, and F1-score drop significantly to below 0.01. This indicates that the model tends to classify most traffic as normal, thereby failing to detect actual attack traffic. This pattern further confirms that accuracy alone is insufficient for evaluating NIDS under cross-domain settings.
These results demonstrate that deep learning-based NIDS models are highly dependent on the characteristics of the source-domain training data, and their cross-domain detection performance degrades substantially when the network environment and traffic distribution change. These findings motivate the need for cross-domain adaptation methods that can recover detection performance under domain shift without requiring full model retraining.
4. LoRA-Based Domain Adaptation
In this paper, we formulate cross-domain adaptation as a low-rank adjustment of a source-trained model, rather than full retraining for each target domain. Thus, we apply a LoRA-based adaptation strategy to improve cross-domain detection performance under domain shift.
Specifically, the backbone layers of the source-trained RNN- and LSTM-based NIDS models are kept entirely frozen. LoRA modules are then applied only to the fully connected layers (fc1 and fc2) responsible for final classification decisions. These layers directly map the extracted temporal representations to the final intrusion labels and are therefore the most likely to require domain-specific adjustment. The overall structure of the proposed LoRA-based adaptation framework is illustrated in Figure 1. This design is motivated by the following considerations:
Hierarchical Role Separation and Partial Cross-Domain Transferability: The RNN and LSTM backbones extract time-series patterns and temporal dependencies from network flows. These temporal feature representations may capture partially reusable temporal patterns across domains, even though they are not fully invariant to domain shift.
Domain Sensitivity of Classification Boundaries: Conversely, distributional shifts between domains are often reflected in changes to the classification decision boundaries that map extracted features to benign or malicious labels. Therefore, rather than modifying the entire backbone, we consider it more effective to calibrate only the decision boundaries within a low-dimensional space at the classifier level.
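When applying LoRA, the original weight matrix $W_0 \in \mathbb{R}^{d_{out} \times d_{in}}$ of a selected fully connected layer remains frozen. Here, $d_{in}$ and $d_{out}$ denote the input and output dimensions of the selected fully connected layer, respectively, and $r$ denotes the LoRA rank. Instead of updating $W_0$ directly, we introduce low-rank matrices $A \in \mathbb{R}^{r \times d_{in}}$ and $B \in \mathbb{R}^{d_{out} \times r}$ to define the weight update as follows:
$$W = W_0 + \Delta W, \qquad \Delta W = BA.$$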
In this configuration, the trainable parameters are restricted to the low-rank matrices A and B. This approach shifts the adaptation process from optimization in the full parameter space to optimization in a lower-dimensional space, which improves parameter efficiency and can help reduce the risk of overfitting under domain shift. Under this setting, the source-trained backbone remains fixed, and only the low-rank adaptation parameters are optimized using target-domain data. In our experiments, the source-trained RNN- and LSTM-based NIDS models serve as fixed backbone models. We then apply feature mapping and LoRA-based low-rank adaptation to unseen target datasets. For adaptation, each target dataset is divided into two disjoint subsets, where 80% is used for adaptation and the remaining 20% is reserved for evaluation, ensuring that no overlap exists between adaptation and evaluation data. This pipeline enables efficient cross-domain adaptation without requiring full model retraining. The detailed training procedure of the proposed LoRA-based adaptation method is summarized in Algorithm 1.
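Algorithm 1: Cross-domain NIDS training with LoRA adaptation
Input: Source dataset $D_s$, target dataset $D_t$, sequence length $L$, LoRA rank $r$, scaling factor $\alpha$, epochs $E$
Output: Adapted model $M^{*}$
Initialization:
1: Load source-trained base model $M$ trained on $D_s$
2: Construct $D_{adapt}$ and $D_{eval}$ from $D_t$
3: Attach LoRA modules to selected layers
4: Initialize LoRA parameters $\theta_{LoRA} = \{A, B\}$
5: Freeze all source-trained parameters in $M$
6: Initialize optimizer for $\theta_{LoRA}$
7: Initialize the best evaluation score
Procedure:
1: for epoch $= 1$ to $E$ do
2:   Set model to training mode
3:   for each $(x, y) \in D_{adapt}$ do
4:     Preprocess input $x$
5:     $\hat{y} \leftarrow M(x)$
6:     $\ell \leftarrow \mathcal{L}(\hat{y}, y)$
7:     Backpropagate $\ell$ and update $\theta_{LoRA}$
8:   end for
9:   Evaluate on $D_{eval}$
10:  if the evaluation score improves on the best score then
11:    Update the best score
12:    Store the current $\theta_{LoRA}$
13:    $M^{*} \leftarrow M$ with the stored $\theta_{LoRA}$
14:  end if
15: end for
16: return $M^{*}$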
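The adaptation loop can be sketched in NumPy as follows. This is a simplified stand-in rather than the implementation used in the paper: it treats the backbone outputs as given feature vectors, attaches an adapter to fc1 only, uses full-batch gradient descent with hand-written gradients instead of mini-batches and an optimizer, and generates synthetic data. The $\alpha / r$ scaling and the zero initialization of $B$ follow the common LoRA convention and are assumptions here, as are all sizes and hyperparameter values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, r, alpha, epochs, lr = 32, 8, 4, 8.0, 20, 0.5   # assumed values
scale = alpha / r                                 # assumed LoRA scaling

# Stand-ins for the source-trained classifier head (frozen throughout).
W1 = rng.normal(scale=0.3, size=(d_hid, d_in))    # fc1 weight
W2 = rng.normal(scale=0.3, size=(1, d_hid))       # fc2 weight (no adapter here)
W1_before = W1.copy()

# Synthetic target-domain features with a disjoint 80/20 split.
H = rng.normal(size=(500, d_in))
y = (H[:, 0] + 0.5 * H[:, 1] > 0).astype(float)
H_adapt, y_adapt, H_eval, y_eval = H[:400], y[:400], H[400:], y[400:]

# LoRA parameters: A random, B zero, so adaptation starts at the base model.
A = rng.normal(scale=0.1, size=(r, d_in))
B = np.zeros((d_hid, r))

def forward(Hb, A, B):
    pre = Hb @ (W1 + scale * B @ A).T             # adapted fc1
    z = np.maximum(pre, 0.0)                      # ReLU
    p = 1.0 / (1.0 + np.exp(-(z @ W2.T)[:, 0]))   # attack probability
    return pre, z, p

best_score, best_A, best_B = -1.0, A.copy(), B.copy()
for epoch in range(epochs):
    pre, z, p = forward(H_adapt, A, B)
    d_logit = (p - y_adapt)[:, None] / len(y_adapt)   # grad of mean BCE w.r.t. logit
    d_pre = (d_logit @ W2) * (pre > 0)            # back through fc2 and ReLU
    G = d_pre.T @ H_adapt                         # grad w.r.t. effective fc1 weight
    grad_B, grad_A = scale * G @ A.T, scale * B.T @ G
    A -= lr * grad_A                              # only A and B are updated
    B -= lr * grad_B
    score = np.mean((forward(H_eval, A, B)[2] > 0.5) == (y_eval == 1))
    if score > best_score:                        # keep the best adapter so far
        best_score, best_A, best_B = score, A.copy(), B.copy()
```

The frozen weights `W1` and `W2` are never written to, so a single backbone can be shared while one small pair (`best_A`, `best_B`) is stored per target domain.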
5. Experiments
5.1. Implementation Details
In this section, we describe the implementation details and training settings of the LoRA-based adaptation method introduced in Section 4. Consistent with that adaptation strategy, only the LoRA parameters A and B are updated during adaptation, while all backbone parameters remain frozen to maintain parameter efficiency. We applied LoRA modules to the fully connected layers of the model, for which the input–output dimensions of the fc1 and fc2 layers are
and
, respectively. Here,
denotes the hidden size and
denotes the number of classes. We also configured LoRA with a rank of
and a scaling factor of
.
The backbone models are two-layer bidirectional recurrent networks (RNN and LSTM) with a hidden size of 256 and a dropout rate of 0.2. We set the batch size to 512 and trained the models using the AdamW optimizer [23]. The input sequence length was set to 10 as a trade-off between capturing short-term temporal dependencies in network traffic and maintaining computational efficiency during training. To improve numerical stability during backpropagation, we applied gradient norm clipping with a threshold of 1.0 [24]. We also applied NaN-to-zero replacement and value clipping to all input tensors to reduce instability caused by missing values or extreme outliers introduced during the feature mapping process. These preprocessing steps help prevent numerical divergence within the network.
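The two stabilization steps described above can be sketched in plain Python as follows. This is an illustrative simplification, not the experimental code: the function names and the clipping bound `clip_value` are assumptions, since the value-clipping range is not stated here, and the gradient clipping shown is a simplified version of what deep learning frameworks provide.

```python
import math

def sanitize(features, clip_value=1e6):
    """Replace NaN with 0.0, then clip every value to [-clip_value, clip_value]."""
    cleaned = []
    for v in features:
        if math.isnan(v):
            v = 0.0
        cleaned.append(max(-clip_value, min(clip_value, v)))
    return cleaned

def clip_grad_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradients so that its L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    factor = max_norm / total
    return [g * factor for g in grads]

# Hypothetical feature vector with a missing value and two extreme outliers.
print(sanitize([1.0, float("nan"), float("inf"), float("-inf")], clip_value=10.0))
# prints [1.0, 0.0, 10.0, -10.0]
```

Clipping the norm rather than each gradient component preserves the direction of the update and only limits its magnitude.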
All experiments were implemented using PyTorch and conducted on a workstation equipped with an NVIDIA GeForce RTX 5070 GPU with CUDA 12.8 support. Because the backbone stays frozen and only the LoRA modules are trained, lightweight LoRA adapters can be managed in a plug-in manner for each domain, enabling rapid and flexible domain adaptation across diverse network environments while reusing a single backbone model. The detailed training configuration and experimental environment are summarized in Table 3.
5.2. Results and Analysis
In this section, we evaluate attack detection performance under cross-domain settings by applying LoRA-based adaptation across multiple source-target dataset pairs. We analyze performance in terms of accuracy, precision, recall, and F1-score, with particular emphasis on recall because it is a critical metric in NIDS. Since the primary objective of NIDS is to accurately detect malicious traffic, improvements in recall directly indicate enhanced detection capability under cross-domain settings.
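As a reminder of why recall is emphasized, the sketch below computes the four metrics from binary predictions. It assumes the label convention 1 = attack and 0 = benign, and it returns 0.0 for an undefined ratio; both choices are assumptions made for illustration, and the data are synthetic.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1, treating label 1 as the attack class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Synthetic example: 90 benign and 10 attack samples, all predicted as benign.
y_true = [0] * 90 + [1] * 10
all_benign = [0] * 100
print(binary_metrics(y_true, all_benign))
# accuracy is 0.9 while recall, precision, and F1 are all 0.0
```

A detector that labels everything as benign still reaches high accuracy on imbalanced traffic, which is why accuracy alone can hide a complete failure to detect attacks.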
Table 4 and Table 5 present the cross-domain detection performance after LoRA-based adaptation. Adaptation improves cross-domain detection performance across most dataset combinations, with the most pronounced gains in recall, indicating improved attack detection under domain shift. When the backbone model suffers from severe performance degradation, adaptation restores effective attack detection capability rather than merely improving it. For example, as shown in Table 1, when a model trained on the KDD99 dataset is applied to the CICIDS2017 dataset, the backbone recall drops to 0.2411, indicating a significant degradation in attack detection capability. After adaptation (Table 4), recall increases to 0.8916, a level at which most malicious traffic is detected. This shows that, in certain domain combinations, the effect of adaptation goes beyond incremental improvement and amounts to a substantial recovery of detection performance.
A more extreme pattern is observed for the AWID3 dataset, which represents a wireless network environment. In this case, the backbone model tends to classify most traffic as normal, resulting in near-zero recall, that is, a near-complete failure in attack detection. Figure 2 illustrates this behavior with confusion matrices for a representative cross-domain setting: the backbone model fails to correctly identify attack samples, whereas LoRA-based adaptation significantly improves attack detection. After adaptation, recall increases to above 0.7, showing that the model can identify a substantial portion of previously undetected attack traffic. This indicates that the degradation caused by domain shift is not an inherent limitation of the model, but rather a problem that can be effectively mitigated through appropriate adaptation strategies.
In addition, the following observations can be drawn from Table 4 and Table 5. First, from a model architecture perspective, both RNN and LSTM models exhibit similar performance improvement patterns after adaptation, and recall consistently shows the largest improvement for both. However, the LSTM model is generally more stable than the RNN model and achieves higher post-adaptation performance in most cases. This suggests that the stronger temporal modeling capability of LSTM may help capture complex sequential patterns more effectively, and that this property may be better leveraged under domain shift when combined with adaptation.
Second, in terms of domain discrepancy, the experimental results indicate that larger domain differences lead to greater performance gains after adaptation. When the datasets share similar network environments, such as wired network datasets, both models already exhibit moderate source-trained backbone performance, and the additional benefit of adaptation remains relatively limited. In contrast, when the domain difference is substantial, such as between wired and wireless environments, backbone performance drops significantly, and adaptation leads to substantial recovery. These observations suggest that domain discrepancy between datasets may have a greater impact on adaptation effectiveness than the choice of model architecture.
Overall, these results demonstrate that LoRA-based adaptation effectively restores degraded detection performance in cross-domain settings, particularly when the domain discrepancy is large. This suggests that, in our setting, addressing distribution mismatch between datasets may play a more critical role than modifying the model architecture itself. Furthermore, these findings highlight that domain adaptation techniques play an important role in achieving robust detection performance in real-world network environments, where diverse traffic patterns coexist.
5.3. Ablation Study
5.3.1. LoRA Hyperparameters
In this section, we conduct an ablation study to analyze how performance changes with the LoRA hyperparameters, namely the scaling factor α and the rank r. Specifically, we consider several values of each and evaluate all combinations across multiple source-target dataset pairs. As shown in Figure 3, the effect of LoRA hyperparameters is dataset-dependent. Some source-target pairs show clear performance changes depending on r and α, whereas others exhibit only minor variations. This indicates that the optimal LoRA configuration for one source-target dataset pair does not necessarily generalize to other domain shifts. In particular, higher-rank configurations can achieve the highest score in some cases, but they do not consistently dominate across all settings.
Therefore, rather than selecting the best-performing configuration for a single dataset pair, we seek a stable hyperparameter setting that performs consistently across diverse cross-domain conditions. Based on these observations, the selected values of r and α provide a practical trade-off between adaptation capacity, stability, and parameter efficiency.
Since the number of trainable LoRA parameters increases linearly with r, the selected rank requires only half the trainable parameters of a rank twice as large, while still achieving competitive performance across the evaluated source-target pairs. Moreover, the selected α corresponds to a moderate effective scaling of the low-rank update, thereby avoiding excessively small or overly aggressive updates. Therefore, although the selected (r, α) configuration is not the absolute best for every dataset pair, the ablation results support it as a stable and parameter-efficient default for general cross-domain adaptation.
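The linear dependence of the trainable parameter count on r can be checked with a short calculation. The sketch below assumes a LoRA module with A of shape r × d_in and B of shape d_out × r, and the usual α / r scaling of the update. The layer dimensions and rank values are hypothetical and chosen only for illustration; they are not the settings used in the experiments.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters of one LoRA module: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

def effective_scale(alpha, r):
    """Multiplier applied to the low-rank update under the assumed alpha / r convention."""
    return alpha / r

# Hypothetical layer size and rank values, for illustration only.
D_IN, D_OUT = 512, 256
for r in (4, 8, 16):
    print(r, lora_param_count(D_IN, D_OUT, r))
# prints: 4 3072, then 8 6144, then 16 12288
```

Doubling the rank doubles the adapter size, so a lower rank is preferable whenever it reaches comparable performance.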
5.3.2. Low-Label Adaptation
In this section, we analyze the effectiveness of LoRA-based domain adaptation under low-label settings. As shown in Table 6 and Table 7, target-domain adaptation improves both F1-score and recall across multiple dataset pairs, even when only a small fraction of labeled target data is available. Importantly, adaptation maintains competitive detection performance in many cases with only 5% or 10% of labeled data, demonstrating robustness under limited supervision. This indicates that a significant portion of cross-domain performance degradation can be mitigated using a small amount of labeled target data.
Although performance further improves as more labeled data becomes available, the overall trend suggests that adaptation does not rely heavily on large-scale labeled data. Instead, it achieves meaningful performance gains even in low-label scenarios. The degree of improvement varies across dataset pairs, particularly in more challenging domain shifts. Nevertheless, recall remains relatively high in many cases, indicating that attack detection capability is largely preserved under limited supervision.
To ensure the reliability of these results, each low-label experiment is repeated five times using different random samplings, and the results are reported as mean ± standard deviation. For each run, the target-domain training subset is constructed using stratified random sampling to preserve the original class distribution. Although the sampled subset preserves the original class distribution, random undersampling is then applied at the sequence level during training, resulting in an approximately 1:1 ratio between benign and attack samples. The dataset splitting and adaptation protocol follow the procedures described in Section 3.2 and Section 4. Specifically, each dataset is divided into training, validation, and test subsets prior to sequence construction to prevent data leakage, and the same protocol is consistently applied across all low-label experiments. Overall, these results indicate that domain adaptation is the primary factor in recovering cross-domain performance, while the proposed LoRA-based approach enables efficient adaptation with minimal additional parameters.
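The sampling protocol described above can be sketched as follows. This is a simplified illustration in which each list element stands for one already-constructed training sequence with a binary label. The function names, the synthetic label pool, and the use of the sample standard deviation are assumptions rather than details confirmed by the experiments.

```python
import random
import statistics

def stratified_subset(labels, fraction, rng):
    """Return indices of a random subset that keeps each class at the given fraction."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    chosen = []
    for idx in by_class.values():
        k = max(1, round(len(idx) * fraction))   # keep at least one item per class
        chosen.extend(rng.sample(idx, k))
    return sorted(chosen)

def undersample_balanced(indices, labels, rng):
    """Randomly drop majority-class items so that all classes have equal counts."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n = min(len(v) for v in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(rng.sample(idx, n))
    return sorted(balanced)

def mean_std(scores):
    """Summarize repeated runs as (mean, sample standard deviation)."""
    return statistics.mean(scores), statistics.stdev(scores)

# Synthetic pool of training sequences: 0 = benign, 1 = attack.
labels = [0] * 900 + [1] * 100
for seed in range(5):                            # five repetitions with different samplings
    rng = random.Random(seed)
    subset = stratified_subset(labels, 0.05, rng)            # 5% labeled data, class ratio kept
    train_idx = undersample_balanced(subset, labels, rng)    # roughly 1:1 for training
    print(seed, len(subset), len(train_idx))
```

In this sketch the stratified draw fixes how much labeled data is available, while the undersampling step only changes what the model sees during training.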
5.4. Baseline Comparison
In this section, we compare LoRA-based adaptation with full fine-tuning and head-only tuning in terms of both performance and computational efficiency. These baselines were selected to represent two complementary adaptation strategies: full parameter adaptation and classifier-level adaptation. This allows us to more clearly isolate the contribution of LoRA as a parameter-efficient alternative within a controlled comparison setting. As shown in Table 8, full fine-tuning consistently achieves the highest performance across all dataset pairs, particularly in recall, reflecting its strong capacity to fully adapt model parameters to the target domain. However, full fine-tuning requires updating all model parameters and retraining the entire network for each target domain, which leads to substantial computational cost, training time, and memory usage. This makes it impractical in scenarios involving multiple domains or frequent domain shifts, where efficient and repeated adaptation is required.
In contrast, head-only tuning and LoRA-based adaptation achieve performance comparable to each other in most cases. This suggests that a significant portion of cross-domain performance recovery can be attributed to classifier-level adaptation, without modifying the entire model. Notably, the proposed LoRA-based adaptation shares a similar adaptation objective with head-only tuning, as both focus on modifying the classifier-level decision boundary. While head-only tuning directly updates the classifier parameters, the proposed approach applies low-rank updates to the same layer in a parameter-efficient manner.
From an efficiency perspective, the proposed LoRA-based approach provides clear advantages over full fine-tuning by significantly reducing the number of trainable parameters and overall computational cost. Compared to head-only tuning, the proposed approach achieves similar performance while requiring fewer trainable parameters, indicating improved parameter efficiency. While the gains in training time and memory usage are comparable, the proposed approach offers an additional advantage through its structured and modular design. First, from a parameter allocation perspective, the proposed LoRA-based approach enables more efficient utilization of a limited parameter budget through low-rank decomposition, compared to directly updating dense classifier weights. Second, the proposed approach naturally supports adaptation across multiple target domains by maintaining separate lightweight adaptation modules for each domain. These modules can be independently stored, reused, and selectively applied without modifying the backbone model. In contrast, head-only tuning requires maintaining separate classifier parameters for each domain, which becomes less flexible as the number of target domains increases. Third, the modular design of the proposed approach allows efficient storage and management of domain-specific adapters. Since each adaptation module consists of only a small number of parameters, it is more suitable for scalable deployment across multiple environments compared to storing multiple independently fine-tuned classifier heads. Finally, the low-rank constraint in the proposed approach can act as an implicit regularization mechanism, which may help reduce overfitting under limited target-domain data. This property is particularly beneficial in low-label or domain-shift scenarios, where overfitting is more likely to occur.
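The multi-domain use described above can be pictured as one shared set of backbone weights plus a small dictionary of per-domain adapters. The following pure-Python sketch is only an illustration of that idea: the class `AdapterLibrary`, the domain names, and the toy matrices are hypothetical, and the α / r scaling is the usual LoRA convention assumed here.

```python
def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class AdapterLibrary:
    """One frozen backbone matrix shared by all domains, plus an (A, B) pair per domain."""

    def __init__(self, backbone_W, alpha, r):
        self.W = backbone_W
        self.scale = alpha / r
        self.adapters = {}

    def register(self, domain, A, B):
        """Store a domain-specific adapter without touching the backbone."""
        self.adapters[domain] = (A, B)

    def forward(self, x, domain=None):
        out = matvec(self.W, x)
        if domain is None:               # no adapter selected: plain backbone output
            return out
        A, B = self.adapters[domain]
        delta = matvec(B, matvec(A, x))
        return [o + self.scale * d for o, d in zip(out, delta)]

# Toy rank-1 adapters for two hypothetical target domains.
lib = AdapterLibrary([[1.0, 0.0], [0.0, 1.0]], alpha=2, r=1)
lib.register("wireless", A=[[1.0, 1.0]], B=[[0.5], [0.0]])
lib.register("iot", A=[[1.0, -1.0]], B=[[0.0], [1.0]])

x = [2.0, 3.0]
print(lib.forward(x))                 # [2.0, 3.0]
print(lib.forward(x, "wireless"))     # [7.0, 3.0]
print(lib.forward(x, "iot"))          # [2.0, 1.0]
```

Switching domains only changes which (A, B) pair is looked up, so adding a new target domain leaves the backbone and the existing adapters untouched.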
Overall, these results demonstrate that while full fine-tuning yields the best performance, the proposed LoRA-based approach achieves a favorable trade-off between performance and efficiency. In addition, it offers further benefits in terms of scalability, modularity, and robustness in multi-domain deployment scenarios.
6. Discussion
In this section, we discuss the implications and limitations of the experimental design in terms of data selection, domain variability, and scalability, with a focus on cross-domain adaptation.
First, regarding data selection, the datasets were intentionally selected from different collection periods and network environments, including both wired and wireless domains, to increase heterogeneity and reduce direct overlap in traffic characteristics. This setting provides a meaningful benchmark for analyzing cross-domain performance degradation and the effectiveness of adaptation. However, the study still relies on a limited number of public benchmark datasets, which may not fully represent all real-world deployment scenarios. In addition, differences in collection conditions do not completely eliminate potential inconsistencies in labeling or partial overlap in attack characteristics.
Second, with respect to domain variability, the inclusion of both wired and wireless datasets increases the diversity of domain conditions considered in this work. The experimental results show that performance degradation and recovery are strongly dependent on the degree of domain discrepancy. However, the observed performance degradation cannot be attributed solely to intrinsic traffic differences. Since the datasets are not fully aligned in the feature schema, part of the degradation may reflect abstraction introduced during feature synchronization. In particular, while coarse temporal, size, and protocol-related features can be aligned, some dataset-specific semantics, especially wireless-specific attributes, cannot be fully preserved and must be simplified during mapping. Therefore, the observed performance reflects the combined effect of domain discrepancy and unavoidable information loss during alignment, which also influences the effectiveness of adaptation.
Finally, regarding scalability, the proposed pipeline enables efficient repeated experimentation by reusing the same preprocessed and feature-aligned input data across multiple experimental settings. This design is particularly beneficial in evaluating adaptation strategies across multiple domain pairs. However, extending the framework to new datasets still requires manual schema inspection, feature mapping design, and validation of unsupported fields. As the number of datasets increases, this manual effort may become a bottleneck. This suggests that future work should focus on automated feature alignment and scalable adaptation mechanisms to further improve applicability in diverse real-world environments. Furthermore, this study focuses on aggregate performance under a binary classification setting to ensure consistent cross-domain evaluation across heterogeneous datasets. While this design enables a unified evaluation framework, further analysis at the attack-type or category level could provide deeper insights into the detection of rare or more challenging attacks. We consider this an important direction for future work. In addition, considering the increasing prevalence of wireless network environments, such as IoT and 5G networks, future work may further extend the proposed framework to domain-specific scenarios.
7. Conclusions
In this paper, we demonstrated that time-series-based NIDS suffer from substantial performance degradation in cross-domain environments, particularly when the source and target domains differ significantly. We showed that applying target-domain adaptation is essential for recovering degraded detection performance. In particular, adaptation consistently improves detection capability across diverse source-target pairs, with the most notable gains observed in recall, indicating effective recovery of attack detection performance under domain shift. In this framework, LoRA serves as a parameter-efficient mechanism to implement adaptation without requiring full model retraining.
However, several limitations remain. First, the degree of recovery depends strongly on the dataset pair and the magnitude of domain discrepancy. In particular, wireless-domain cases such as AWID3 exhibit severe backbone failure before adaptation, and although adaptation consistently improves performance, the extent of recovery varies across domain combinations. Second, performance is influenced not only by traffic-domain differences but also by feature synchronization constraints, as some dataset-specific semantics cannot be fully preserved due to manual mapping and zero-filling. Third, the ablation study showed that the selected LoRA configuration (r, α) provides a stable trade-off but is not universally optimal, suggesting that adaptation behavior remains dependent on the characteristics of each domain pair.
Overall, these results suggest that domain adaptation is the primary factor in mitigating cross-domain performance degradation, while the proposed LoRA-based approach provides an efficient and scalable way to implement such adaptation. Although adaptation significantly alleviates performance degradation, it does not fully resolve the broader generalization problem.
Future work will focus on improving both the robustness and scalability of cross-domain adaptation. In particular, we will investigate automated semantic feature alignment methods to reduce the reliance on manual mapping and improve scalability across heterogeneous datasets. We will also explore adaptive LoRA deployment strategies, such as maintaining a library of domain-specific LoRA adapters and dynamically selecting or combining them based on the target domain. Furthermore, we plan to extend our experiments to more realistic low-label and unlabeled target-domain settings to better reflect practical deployment scenarios.