Future Internet
  • Article
  • Open Access

4 December 2025

Evaluating Synthetic Malicious Network Traffic Generated by GAN and VAE Models: A Data Quality Perspective

Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Adversarial Attacks and Cyber Security

Abstract

The limited availability and imbalance of labeled malicious network traffic data remain major obstacles in developing effective AI-driven cybersecurity solutions. To mitigate these challenges, this study investigates the use of deep generative models, specifically Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), for producing realistic synthetic attack data. A comprehensive data quality assessment (DQA) framework is proposed to thoroughly evaluate the fidelity, diversity, and practical utility of the generated data samples. The findings support the adoption of data synthesis as a viable strategy to address data scarcity, improving robustness and reliability in modern cybersecurity applications and sectors.

1. Introduction

The continuous advancement of communication technologies has facilitated the transmission of heterogeneous and multimodal data across different network environments [1]. In addition to this, the proliferation of Internet of Things (IoT) devices has brought about unparalleled levels of connectivity. From laptops, tablets, and smartphones to smart appliances and industrial equipment, an ever-expanding range of devices is continuously producing enormous volumes of data. While this interconnectivity offers important benefits and drives innovation, it also increases the malicious attack surface [2]. Malicious actors swiftly exploit this expanded attack surface, targeting vulnerabilities in traditional and IoT devices [3]. Some examples of attacks launched by malicious actors include malware dissemination [4], botnet attacks [5], zero-day exploits [6], and man-in-the-middle (MitM) attacks [7]. The cost of cybercrime at a global level is expected to increase from 9.22 trillion dollars in 2024 to 13.82 trillion dollars in 2028 [8]. The cost of a single data breach is also growing. During 2024, a data breach cost 4.88 million dollars on average, 10% higher than the average cost in 2023 [9].
Based on the above, the effective tackling of cyberattacks and, more specifically, the detection of network intrusion attempts is crucial. Modern technologies like AI network intrusion models are highly dependent on quality training data. In this light, the augmentation of existing or newly collected data is an active and constantly evolving research topic. Generative Adversarial Networks (GANs), introduced by Ian Goodfellow et al. [10] in 2014, can generate synthetic records that closely resemble the data provided as input. More specifically, GANs use two competing neural networks: a generator creates synthetic data samples, while a discriminator tries to distinguish them from real data. Through this adversarial training, the generator learns to produce highly realistic synthetic data that mimics the original dataset’s distribution, thereby expanding and diversifying the training set for improved model performance. Thus, GANs are well suited to applications that require data augmentation.
Variational Autoencoders (VAEs) are another emerging generative technique for data augmentation, especially in data-intensive applications. VAEs were first introduced by Kingma and Welling in 2013 [11] and further explained by the same authors in 2019 [12]. Like the GANs described earlier, VAEs consist of two linked models: an encoder (recognition model) and a decoder (generative model). The encoder approximates the latent posterior distribution for the decoder, enabling Expectation Maximization-style learning. Following this procedure, a VAE can sample from this distribution and generate new, synthetic data points that are similar to the original training data.
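In standard notation, the training objective mentioned above is the evidence lower bound (ELBO), which balances reconstruction quality against regularization of the latent space:

```latex
\mathcal{L}(\theta,\phi;x) =
  \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction term}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)}_{\text{regularization term}}
```

Here, $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ is the decoder, and $p(z)$ is a standard normal prior; maximizing the ELBO trains both models jointly.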
However, while the use of generative models for network intrusion detection has gained significant momentum, the quality of the produced synthetic data remains a critical concern. Most existing works evaluate synthetic malicious traffic only in terms of downstream detection accuracy, i.e., whether a model trained on synthetic data can classify real malicious samples. Although useful, this approach does not guarantee that the generated samples preserve the statistical properties, diversity, or behavioral structure of real attack traffic. Poor-quality synthetic data may, therefore, make the detection results look better than they really are, while it may fail to accurately represent real-world threats. This gap highlights the need for a systematic and multi-dimensional evaluation methodology that considers not only classification utility but also statistical similarity, feature-level fidelity, structural coherence, and distinguishability of synthetic data samples from real data samples. This perspective is essential to ensure that synthetic malicious traffic is truly representative and reliable for further intrusion detection research and deployment.
In this study, the focal point is the data augmentation of network intrusion detection systems (NIDS). The promising capabilities of GAN and VAE technologies enable the research community to lift the barriers that data scarcity and inadequate data quality pose to the field of cybersecurity, and specifically to network intrusion detection. By examining both GAN and VAE technologies and their performance in the task of data generation, useful insights are provided that can help future research better align efforts for synthetic data generation and augmentation, which can be used as training data in NIDS. In this light, four different synthesizers were employed, and the data generated from them was evaluated based on a set of different metrics. Those data synthesizers are as follows:
  • Gaussian Copula.
  • CTGAN.
  • Tabular Variational Autoencoder (TVAE).
  • CopulaGAN.
This study evaluates the synthesizer efficiency and synthetic data quality based on comprehensive fidelity and similarity metrics, using the CICIDS2017 dataset [13] as original data. The goal is to determine the optimal approach for this specific task by assessing both data similarity and generative performance.
The remainder of the paper is organized as follows: Section 2 presents related works, focusing on the domain of cybersecurity and mainly revolving around GANs, (T)VAEs, and their combination. Section 3 describes in detail the proposed methodology that was designed and developed, whilst Section 4 elaborates on the produced results. Finally, Section 5 concludes the paper.

3. Methodology

Section 3 presents the methodological approach used to assess the quality and utility of synthetic malicious network traffic generated by deep generative models. The focus of the study lies in a comparative evaluation of four synthesizers, namely, the Gaussian Copula [48], representing statistical generative models, CTGAN [43,49], a deep generative adversarial network model tailored for tabular data, TVAE [43,50], a variational autoencoder designed to handle mixed-type tabular datasets, and CopulaGAN [43,51], a hybrid model that combines statistical techniques with deep generative learning. Through a series of controlled experiments, these models are trained on real malicious traffic data, and the resulting synthetic datasets are evaluated using various quantitative metrics and visual diagnostic tools. These evaluations encompass statistical similarity, likelihood-based measures, detection-based scores, machine learning efficacy assessments, privacy checks, and qualitative visualizations. The methodology ensures that each synthesizer is evaluated consistently across the same datasets and under comparable experimental conditions.

3.1. Dataset and Preprocessing Steps

3.1.1. Dataset Description

The experimental analysis in this study is based on the CICIDS2017 [13] dataset, a comprehensive intrusion detection benchmark developed by the Canadian Institute for Cybersecurity (CIC). The dataset was specifically designed to address the limitations of previous intrusion detection datasets by providing realistic traffic patterns, diverse attack scenarios, and a complete set of network flow features suitable for anomaly-based detection research.
The dataset comprises five consecutive days of captured network traffic, recorded between 3 and 7 July 2017, within a fully configured network environment that included a variety of operating systems (Windows, Ubuntu, and Mac OS) and both benign and malicious activities. The attacks covered a broad range of categories, including Brute Force (FTP, SSH), DoS, DDoS, Heartbleed, web attacks, infiltration, and botnets, executed across different network nodes and scenarios. The dataset includes both full packet captures (PCAPs) and labeled network flow summaries extracted using CICFlowMeter, resulting in a rich feature set of over 80 flow-based attributes per record. The CICIDS2017 dataset initially consisted of 2,099,316 records, encompassing both benign and various categories of malicious network traffic.
Given the scope of this study, which focuses on the generation and evaluation of synthetic malicious traffic for intrusion detection purposes, only malicious instances were selected for further analysis. This filtering step was essential to ensure that the generative models exclusively learned patterns associated with attack behaviors, without interference from benign traffic features and characteristics. The malicious dataset, after this consideration, served as the standardized input for all subsequent preprocessing, model training, and evaluation phases described within this study. To this end, 100,000 malicious records were carefully filtered and selected to maintain a representative distribution across the various attack categories, ensuring both data diversity and training feasibility.

3.1.2. Preprocessing Procedure

The initial dataset comprised a total of 87 features, where a wide range of network flow statistics, header attributes, flag counters, and time-based measurements had been captured. However, many of these features were either redundant, exhibited high multicollinearity, or provided little discriminative power for generative modeling purposes. In particular, features related to per-packet statistics, fine-grained timing intervals, flag counts with constant values, and bulk transfer metrics were excluded to reduce dimensionality and minimize noise that could adversely affect the training of generative models. The final feature set was selected based on domain knowledge, relevance to traffic characterization, and its overall contribution to the variance of the dataset. The final dataset comprised a total of 19 attributes plus the target label column; these were retained because they provide a balanced combination of flow-level statistics and essential protocol indicators, ensuring a representative and manageable input for both synthetic data generation and subsequent evaluation actions.
To ensure consistency and compatibility with the selected synthesizers, the dataset underwent a structured preprocessing pipeline, as summarized below (a code sketch of these steps follows the list):
  • Feature Selection and Cleaning: Non-informative columns, such as identifiers (Flow ID), source/destination IP addresses, port numbers, and timestamps, were removed. These attributes were considered irrelevant for pattern-based analysis and could introduce unnecessary noise or risk of memorization.
  • Handling Missing and Infinite Values: Instances containing non-available (NaN) or infinite values were addressed by either imputation or removal, depending on their frequency and overall impact. Columns with excessive missing values were excluded.
  • Normalization of Numerical Features: Continuous numerical features were normalized to a common scale, using the min–max scaling method. This step ensured a balanced contribution of all features during the generative model training.
  • Final Dataset Format: The extracted dataset was structured as a flat table with mixed numerical features, void of identifiers or timestamp dependencies. This standardized format ensures direct applicability to all selected synthesizers in a fair comparative setting.
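The following is a minimal sketch of this pipeline in Python/pandas; the dropped column names are illustrative assumptions, as the exact identifiers in CICIDS2017 exports may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical column identifiers; actual CICIDS2017 exports may name them differently.
DROP_COLS = ["Flow ID", "Src IP", "Dst IP", "Src Port", "Dst Port", "Timestamp"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Feature selection and cleaning: drop identifiers and timestamps.
    df = df.drop(columns=[c for c in DROP_COLS if c in df.columns])

    # 2. Handle missing and infinite values: map infinities to NaN, then drop rows.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()

    # 3. Min-max normalization of the continuous numerical features.
    num_cols = df.select_dtypes(include=[np.number]).columns.drop("Label", errors="ignore")
    mins, maxs = df[num_cols].min(), df[num_cols].max()
    df[num_cols] = (df[num_cols] - mins) / (maxs - mins).replace(0, 1)  # guard constants

    # 4. Result: a flat table of normalized numerical features plus the label column.
    return df
```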
The preprocessed dataset serves as the unified input for all subsequent generative experiments, ensuring that each synthesizer is evaluated under identical conditions with the same feature space and data distribution. Table 1 presents a summary of the key feature types included within the preprocessed dataset. The selected features represent key flow-level and protocol-level characteristics that are commonly used to profile network behavior and distinguish between different types of malicious activity. They include temporal attributes (e.g., flow duration), directional packet and byte counts, throughput indicators (e.g., bytes/s, packets/s), header-level information for both forward and backward flows, and protocol-specific identifiers such as ICMP type and code. Together, these features capture volume patterns, communication directionality, and traffic rates, as well as protocol semantics, providing a compact yet representative set of attributes suitable for generative modeling representation and statistical comparison. Features removed during preprocessing primarily included fine-grained timestamp components, flow-direction or flag counters, and other low-variance or redundant counters that did not contribute meaningful variability to the modeling process.
Table 1. Key feature types included in the preprocessed, final dataset.

3.2. Description of Generative Models

The core of this study revolves around evaluating synthetic data-generation capabilities based on the four selected models (synthesizers), as described within the introduction of this section. These models are representative of different approaches to tabular data generation, ranging from classical statistical methods to deep learning-based architectures. By leveraging models with varying theoretical foundations, this work aims to provide a comprehensive comparative quality assessment of the generated results.
The following subsections outline the four selected synthesizers, their underlying methodologies, and the key configuration parameters considered during experimentation.

3.2.1. Gaussian Copula Synthesizer

The Gaussian Copula Synthesizer [48] is a statistical model that captures dependencies among variables by modeling their joint distribution using copulas. This approach transforms marginal distributions into a standard normal space, fits a Gaussian copula to the transformed data, and then samples synthetic records by reversing the transformation. Its main strengths lie in its computational efficiency, interpretability, and ability to capture linear and non-linear dependencies between different variables.
As a purely statistical method, this synthesizer serves as the baseline model in the comparative study implemented within this study. Given its deterministic nature, it offers limited parameter tuning, primarily related to data handling and sampling strategy.
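As a minimal usage sketch, assuming the single-table API of the SDV library [52] (version 1.x) and a preprocessed DataFrame, the baseline model can be fitted and sampled as follows; the file name is hypothetical:

```python
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

real_df = pd.read_csv("malicious_preprocessed.csv")  # hypothetical file name

# Infer column types (numerical/categorical) from the preprocessed dataset.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real_df)

# Statistical fit: a single pass over the data, no iterative training (Section 3.3.1).
synth = GaussianCopulaSynthesizer(metadata)
synth.fit(real_df)

# Sample-size matching, as required by the generation procedure (Section 3.4).
synthetic_df = synth.sample(num_rows=len(real_df))
```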

3.2.2. CTGAN Synthesizer

The CTGAN Synthesizer [49] is a widely used deep generative adversarial network (GAN) designed specifically for tabular data. Unlike traditional GANs, CTGAN employs a conditional generation mechanism that addresses challenges unique to structured datasets, including imbalanced categorical distributions and mixed data types. The algorithm introduces mode-specific normalization techniques and conditional vector sampling to improve learning stability and sample diversity.
CTGAN includes several tunable hyperparameters that influence its learning abilities, such as the number of epochs, the batch size, the architecture/dimensions of the generator network (hidden layers), and the architecture of the discriminator network.
During the experiment phase in Section 4, these parameters are systematically varied to observe their impact on synthetic data quality generation.
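A hedged configuration sketch, again assuming the SDV 1.x API; the numeric values below are illustrative, while Table 3 lists the configurations actually varied:

```python
from sdv.single_table import CTGANSynthesizer

synth = CTGANSynthesizer(
    metadata,
    epochs=300,                    # number of adversarial training epochs
    batch_size=500,                # SDV's CTGAN requires a batch size divisible by 10
    generator_dim=(256, 256),      # hidden-layer sizes of the generator network
    discriminator_dim=(256, 256),  # hidden-layer sizes of the discriminator network
)
synth.fit(real_df)
synthetic_df = synth.sample(num_rows=len(real_df))
```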

3.2.3. TVAE Synthesizer

The TVAE Synthesizer [50] is based on the VAE framework, adapted for tabular data generation. VAEs learn a latent-space representation of the input data through an encoder–decoder architecture, enabling the generation of new data samples by sampling from the latent space. TVAE handles mixed-type data and supports conditional sampling processes, making it suitable for structured datasets with both categorical and continuous variables. The key tunable parameters of TVAE include the number of epochs, the batch size, and the architecture of the encoder and decoder networks. As with CTGAN, the experiments are conducted with parameter variation to evaluate the performance and sensitivity of the generated data samples.
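The corresponding sketch for TVAE, with illustrative values in place of the experimental grid of Table 3; in the SDV 1.x API, the encoder and decoder architectures are exposed as compress_dims and decompress_dims:

```python
from sdv.single_table import TVAESynthesizer

synth = TVAESynthesizer(
    metadata,
    epochs=300,                  # number of training epochs
    batch_size=500,
    compress_dims=(128, 128),    # encoder hidden-layer sizes
    decompress_dims=(128, 128),  # decoder hidden-layer sizes
)
synth.fit(real_df)
synthetic_df = synth.sample(num_rows=len(real_df))
```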

3.2.4. CopulaGAN Synthesizer

The CopulaGAN Synthesizer [51] combines statistical modeling with deep learning through the integration of copula-based data transformations with GAN training. This hybrid approach aims to leverage the strengths of both methodologies, including the dependency capturing of copulas and the generative flexibility of GANs.
During this study, its inclusion provides an additional comparative perspective, especially regarding the trade-off between interpretability and generative capacity.

3.2.5. Summarization of Models’ Configurations

Each synthesizer is trained independently on the preprocessed malicious traffic dataset under identical experimental conditions. Hyperparameters are systematically varied in a controlled manner to assess their influence on the synthetic data-generation process. Table 2 summarizes the key configuration parameters for each synthesizer considered within this study.
Table 2. Key configuration parameters for each synthesizer.

3.3. Experimental Setup

During the experimental phase of this study, the design was structured to systematically evaluate the behavior of each synthesizer under predefined and controlled conditions. By applying uniform data preparation and training procedures as outlined in Section 3.2, the objective is to ensure a fair comparison across the selected generative models. This section details the training protocol, configuration variations, and the synthetic data-generation process employed in the study.

3.3.1. Training Steps and Practices

The training phase constitutes a critical part of the current experimental methodology, during which each synthesizer learns the statistical properties and dependency structures of the real network traffic dataset. To ensure fairness and consistency, all models are trained individually on the same preprocessed dataset described in Section 3.2. No additional filtering, sampling, or feature transformation was applied beyond what was performed during the preprocessing stage.
Each synthesizer utilizes its own internal training mechanism, as detailed and implemented within the SDV library [52]. For the deep generative models, specifically the CTGAN, TVAE, and CopulaGAN synthesizers, the training procedure involves iterative optimization using stochastic gradient descent methods. The models update their parameters in response to the reconstruction or adversarial losses computed over each batch of data, aiming to capture both the marginal distributions and the inter-feature relationships present in the input dataset.
On the other hand, the Gaussian Copula Synthesizer employs a statistical fitting approach without iterative training. It estimates marginal distributions and dependency structures directly using copula transformations and multivariate normal modeling, producing a deterministic model after a single fitting pass over the data.
To ensure stability and repeatability, each training run was initialized with a random seed that was controlled where possible. Furthermore, all synthesizers are trained until they complete the predefined number of epochs (in the case of deep models) or until the fitting process converges (for the statistical-oriented Gaussian Copula model). In the post-training stage, the fitted models are saved and used exclusively for synthetic data generation in the subsequent evaluation phase.
This consistent training methodology ensures that the comparison across the four different models focuses on the inherent qualities of the synthetic data they produce, without interference from differences in data preparation or inconsistent training practices.
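To make the uniform training protocol concrete, a minimal sketch follows, assuming the SDV 1.x API; the seed value and epoch count are illustrative, and, as noted above, seeding was controlled only where possible:

```python
import numpy as np
import torch
from sdv.single_table import (
    CopulaGANSynthesizer, CTGANSynthesizer,
    GaussianCopulaSynthesizer, TVAESynthesizer,
)

SEED = 42                   # hypothetical seed for repeatability
np.random.seed(SEED)
torch.manual_seed(SEED)     # the deep models (CTGAN, TVAE, CopulaGAN) use PyTorch

synthesizers = {
    "gaussian_copula": GaussianCopulaSynthesizer(metadata),  # single fitting pass
    "ctgan": CTGANSynthesizer(metadata, epochs=300),
    "tvae": TVAESynthesizer(metadata, epochs=300),
    "copulagan": CopulaGANSynthesizer(metadata, epochs=300),
}

for name, synth in synthesizers.items():
    synth.fit(real_df)             # identical preprocessed input for every model
    synth.save(f"{name}.pkl")      # persist the fitted model for the generation phase
```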

3.3.2. Hyperparameter Variation Experiments

A fundamental step of this study is to assess and analyze the capabilities and limitations of each generative model in correlation with the variation in its configuration parameters. To this end, a series of controlled experiments is conducted, in which selected hyperparameters are systematically varied for the deep generative models. This approach allows the study to analyze not only the performance of each model under default settings but also its sensitivity to different architectural and training configurations.
The primary parameters selected for variation include the number of training epochs, the batch size, and the network dimensions of key model components (such as generators and discriminators for the GAN models, and encoders and decoders for the TVAE model). These parameters were chosen because they directly affect the model’s learning dynamics, convergence behavior, and its ability to generalize from the input data while avoiding overfitting or underfitting. By varying them systematically across predefined ranges, the study aims to capture a holistic view of each model’s generative performance spectrum.
As previously noted, the Gaussian Copula model does not require hyperparameter variation, as it relies on deterministic statistical fitting without iterative training. In this study, this model serves primarily as a baseline reference point for comparative purposes. It is also important to state that the hyperparameters used in this study were not obtained through an automated model-tuning procedure (e.g., grid search or Bayesian optimization). Instead, the selected values represent commonly adopted and recommended configurations for tabular synthesizers, as stated in the SDV documentation and the related literature. The aim of the variation setup was to assess each model’s stability and behavior under representative configurations rather than to identify a globally optimal setting. Table 3 summarizes the hyperparameter configurations selected for the experimental analysis.
Table 3. Hyperparameter variation summary.

3.4. Data Generation Procedure

Following the completion of the training phase for each synthesizer, synthetic datasets have been generated for evaluation purposes. This step of synthetic data generation is considered quite critical, since it directly reflects the learned ability of each model to reproduce the statistical and structural properties of real network traffic datasets. For the TSTR evaluation, we report each result as a pair of values. The first value corresponds to the difference in F1-score between a classifier trained on synthetic data and the same classifier trained on real data, while the second value reflects the corresponding difference in accuracy. A value close to zero indicates that the synthetic dataset preserves the discriminative structure of the real dataset, whereas negative values represent performance degradation when training on synthetic samples. These two metrics jointly provide a practical measure of the utility of the generated data for downstream intrusion detection tasks.
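One plausible realization of this TSTR pair is sketched below with scikit-learn; a Random Forest classifier is used here since F1-score and accuracy are reported, and the split ratio, label column name, and seed are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def tstr_deltas(real_df, synth_df, label="Label", seed=0):
    # Hold out part of the real data purely for testing both models.
    train_real, test_real = train_test_split(real_df, test_size=0.3, random_state=seed)
    X_test, y_test = test_real.drop(columns=[label]), test_real[label]

    scores = {}
    for name, train_df in [("real", train_real), ("synthetic", synth_df)]:
        clf = RandomForestClassifier(random_state=seed)
        clf.fit(train_df.drop(columns=[label]), train_df[label])
        pred = clf.predict(X_test)
        scores[name] = (f1_score(y_test, pred, average="macro"),
                        accuracy_score(y_test, pred))

    # (delta F1, delta accuracy): values near zero indicate that training on
    # synthetic data matches training on real data; negative values indicate
    # degradation, mirroring the interpretation given above.
    return (scores["synthetic"][0] - scores["real"][0],
            scores["synthetic"][1] - scores["real"][1])
```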
For each trained model and configuration, synthetic samples are generated using the respective model’s sampling method. The following principles are applied uniformly across all experiments to ensure comparability:
  • Sample Size Matching: The number of synthetic records generated for each configuration is set equal to the size of the real dataset used during training. This matching ensures a fair basis for statistical comparison and metric computation.
  • Feature Space Consistency: The synthetic dataset maintains the same dimensionality and feature types (numerical and categorical) as the original, preprocessed one. This consistency ensures that all evaluation metrics, especially those based on statistical tests and deep learning (DL) models, can be applied directly without additional preprocessing.
  • Sampling Repetition: To account for the randomness inherent in deep generative models, such as weight initialization and sampling variability, each configuration was executed in three independent sampling runs. This approach mitigates the impact of outlier runs and supports the assessment of model robustness.
  • Post-Generation Processes: The synthetic datasets are inspected for data integrity, ensuring that generated samples do not contain invalid values (e.g., non-available/NaN or infinite values). If such cases have been detected, they are addressed following the same handling or removal strategy applied during the preprocessing stage of the real data.
  • Data Preservation for Evaluation: All synthetic datasets have been kept in their raw generated form, without additional transformations applied. These datasets serve as direct input for the evaluation metrics detailed in Section 3.5.
This systematic generation procedure ensures that comparisons between real and synthetic datasets are based on uniform sample sizes, consistent feature spaces, and reproducible results across repeated sampling runs. By following this methodology, the present study aims to maximize the reliability and validity of the data quality assessments on the generated synthetic datasets.

3.5. Evaluation Metrics

The evaluation of synthetic dataset quality is a multifaceted task that requires a combination of quantitative metrics and qualitative analyses. To comprehensively assess the generated synthetic network traffic, this study employs a structured evaluation framework that incorporates statistical fidelity tests, likelihood-based measures, detection-based metrics, utility assessments, privacy checks, and visual diagnostics. Each category of metrics targets a particular aspect of data quality, ensuring a holistic evaluation of the generative models’ performance. The use of multiple evaluation categories is necessary because no single metric can fully capture data quality. For example, a model may reproduce statistical distributions accurately but still produce samples that are easily distinguishable by machine learning classifiers. Similarly, a synthetic dataset may achieve a high degree of similarity to real data but fail to transfer learned patterns effectively to downstream tasks. Therefore, combining statistical, discriminative, and utility-based evaluations enables a more reliable and comprehensive assessment of model performance.
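As a sketch of the per-feature statistical fidelity check, the two-sample Kolmogorov–Smirnov test used throughout Section 4 can be computed with SciPy as follows; note that SDV's KSComplement quality score corresponds to one minus the KS statistic, averaged over columns:

```python
from scipy.stats import ks_2samp

def ks_report(real_df, synth_df, alpha=0.05):
    """Two-sample KS test per numerical column of two aligned DataFrames."""
    results = {}
    for col in real_df.select_dtypes("number").columns:
        stat, p_value = ks_2samp(real_df[col], synth_df[col])
        results[col] = {
            "ks_stat": stat,               # distance between empirical CDFs
            "ks_complement": 1.0 - stat,   # SDV-style similarity score
            "significant_diff": p_value < alpha,
        }
    return results
```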
In summary, this multi-layered evaluation strategy provides a more detailed interpretation of data quality. Sample-level metrics reveal whether generated instances appear realistic, distribution-level metrics assess structural and statistical similarities, detection metrics quantify distinguishability, and utility metrics reflect practical downstream applicability. By combining these complementary perspectives, the present study adopts a balanced and comprehensive approach for evaluating the accuracy and usefulness of synthetically generated malicious network traffic. Table 4 summarizes the complete set of evaluation metrics employed in this study, categorized by their primary assessment objective.
Table 4. Evaluation metrics categorized by the primary assessment objective.

3.6. Summary of Experiment Execution Workflow

Table 5 provides a summary of the variation experiments and their corresponding parameter configurations. As previously noted, the Gaussian Copula Synthesizer does not require parameter variation, since it serves as a baseline with a default configuration.
Table 5. Summary of the variation experiments.

4. Discussion

Section 4 includes the evaluation of the statistical quality of the synthetic malicious network traffic generated by the four selected models: Gaussian Copula Synthesizer, CTGAN, TVAE, and CopulaGAN. The assessment focuses on the ability of each model to replicate the statistical properties of the real dataset, capturing both marginal distributions and inter-feature dependencies. A set of statistical fidelity metrics—as summarized in Table 4—including distribution similarity measures, correlation analyses, and likelihood-based scores, is applied consistently across all models and configurations (Table 5). The evaluation has been structured per model, providing insights into the generative behavior of each approach under the experimental conditions defined in Section 3.
However, beyond the per-model evaluation, a comparative approach is essential to understanding how the generative mechanisms of each synthesizer affect the quality of the extracted data. Therefore, this section emphasizes cross-model interpretation and highlights the strengths and limitations of each generative approach in relation to distributional accuracy and structural realism.

4.1. Experimental Dataset Preparations

For the experimental purposes of the proposed synthetic data-generation framework, a carefully curated subset of the CICIDS2017 dataset has been employed. As established in the methodology section, the original dataset contained both benign and malicious network traffic records, totaling 2,099,316 entries. However, since the study focuses exclusively on the generation and quality assessment of synthetic malicious traffic, only attack-related instances have been considered for the experiments.
The initial filtering phase excluded all benign samples, resulting in a malicious-only sample, totaling 516,755 records. Despite its relevance, using the full malicious dataset for training the generative models may pose significant challenges related to computational cost, training convergence, and potential class imbalance issues. Specifically, deep generative models such as GANs and VAEs are sensitive to both the scale and distribution of the input data, often requiring balanced and well-structured datasets for effective learning. Thus, a stratified downsampling approach was applied to the malicious dataset to ensure both representativeness and manageability.
To this end, the malicious samples were grouped by attack type, and up to 10,000 instances per attack category were randomly selected using a controlled random seed for reproducibility. For attack categories with fewer than 10,000 available records, all samples were retained. The extracted balanced malicious dataset comprised a total of 43,176 records, distributed across six distinct attack categories, as summarized in Table 6. This balanced dataset was used as the standard input for all subsequent training phases of the generative models and the experimental evaluation presented in the following sections.
Table 6. Composition of the extracted balanced malicious dataset.
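A minimal sketch of the stratified capping step described above, assuming a pandas DataFrame whose attack categories are stored in a Label column (the column name and seed are assumptions):

```python
CAP = 10_000   # maximum number of records retained per attack category
SEED = 42      # hypothetical controlled seed for reproducibility

balanced_df = (
    malicious_df
    .groupby("Label", group_keys=False)                               # per attack type
    .apply(lambda g: g.sample(n=min(len(g), CAP), random_state=SEED)) # cap or keep all
    .reset_index(drop=True)
)
```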

4.2. Gaussian Copula Evaluation and Results—Experiment 1 (E1)

The evaluation of the Gaussian Copula model in Experiment 1 (E1) starts with its statistical fidelity and diagnostic performance. The SDV Diagnostic Tests report perfect scores for Data Validity (100%) and Data Structure (100%), resulting in an overall diagnostic score of 100%, which confirms that the generated synthetic data strictly adheres to the expected schema, feature constraints, and value ranges of the real dataset. However, the SDV data quality assessment yields a lower Column Shapes Score (76.9%) and a Column Pair Trends Score (90.7%), culminating in an Overall Data Quality Score of 83.9%. These results suggest that while Gaussian Copula maintains the integrity of basic structural characteristics, noticeable discrepancies persist in its ability to replicate the detailed statistical relationships of the real data. The overall KS Test Score of 76.9% reflects this average statistical alignment between real and synthetic samples. The outcomes of the Kolmogorov–Smirnov Two-Sample Test further illustrate the limitations of Gaussian Copula in accurately capturing feature distributions. As observed, features such as protocol, icmp code, and icmp type record no significant differences (p-value = 1.0), indicating effective modeling of discrete attributes. Conversely, all continuous features exhibit significant differences (p-value = 0.0000), underscoring persistent challenges in emulating numerical flow characteristics. The Column Shapes Sub-scores reveal a similar pattern: attributes like protocol (97.2%) and icmp code/type (99.9%) achieve high similarity scores, whereas numerical features such as total length of fwd packet (45.7%), total length of bwd packet (38.1%), and flow bytes/s (45.1%) score notably lower. This indicates that the Gaussian Copula model particularly struggles with capturing highly variable flow attributes. The down/up ratio (75.3%) also ranks among the weaker-performing features, further confirming the model’s limitations in representing skewed distributions. The detection metrics highlight the distinguishability of Gaussian Copula-generated samples from real data. The Logistic Detection score of 78.8% and the SVC Detection score of 87.7% confirm that machine learning classifiers are capable of reliably identifying synthetic data, indicating that the generated samples are far from indistinguishable. The TSTR evaluation using a Random Forest Regressor results in a TSTR score of (0.012, −0.038), indicating very limited predictive utility when training on synthetic data and testing on real data. Furthermore, at the sample level, the Random Forest classifier reaches a classification accuracy of 99.9%, reinforcing that Gaussian Copula-generated data records remain highly distinguishable in supervised learning tasks. These findings collectively suggest that while the Gaussian Copula model generates structurally valid data, its statistical fidelity and practical utility for machine learning applications are constrained. Table 7 summarizes the evaluation results of E1 using the Gaussian Copula model.
Table 7. Evaluation results for Gaussian Copula model (E1).
The visual assessment of the synthetic data generated by the Gaussian Copula model further supports the findings of the statistical metrics summarized above. The pairwise feature distribution plots (pairplots) revealed noticeable deviations between the real and synthetic datasets. Specifically, while some attributes, such as protocol and icmp type, displayed overlapping distributions, continuous numerical features, especially those related to packet sizes, flow durations, and rates, showed clear shifts in their distributions. This result suggests that although the Gaussian Copula could reproduce the marginal distributions of simple numerical (and categorical) features, it struggled to capture the complex joint behavior of numerical attributes inherent in the malicious network traffic patterns. Figure 1 presents the results for the first six data features, whereas the remaining results for E1 are provided in Appendix A (Figure A1 and Figure A2). In Figure 1 and subsequent figures, colored dots represent different datasets: blue dots denote real data samples, while each additional color corresponds to synthetic data generated by a different model (i.e., CTGAN, TVAE, CopulaGAN, and/or Gaussian Copula).
Figure 1. Pairwise feature distribution plots (pairplots) for E1 (dataset features 1–6).
Additionally, the dimensionality reduction visualizations using UMAP projections (Figure 2) provided further insights into the structural differences between the datasets. The synthetic data points formed clusters that were partially overlapping with, but largely distinct from, those of the real data. Unlike models capable of learning higher-order dependencies, the Gaussian Copula-generated samples showed a tendency to cluster more tightly, reflecting the model’s statistical limitations in reproducing the true variance and diversity of the original data. These visual patterns align with the detection and utility metrics, confirming that while the Gaussian Copula maintains certain structural characteristics, it does not fully capture the complexity of real malicious network traffic.
Figure 2. Dimensionality reduction visualizations (using UMAP) for E1.
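A minimal sketch of how such a projection can be produced with the umap-learn package; the function name, marker size, and seed are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np
import umap  # from the umap-learn package

def plot_umap(real_X, synth_X):
    """Project real and synthetic feature matrices into a shared 2-D space."""
    reducer = umap.UMAP(n_components=2, random_state=42)  # seed is an assumption
    embedding = reducer.fit_transform(np.vstack([real_X, synth_X]))  # joint embedding
    n = len(real_X)
    plt.scatter(embedding[:n, 0], embedding[:n, 1], s=2, label="real")
    plt.scatter(embedding[n:, 0], embedding[n:, 1], s=2, label="synthetic")
    plt.legend()
    plt.show()
```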

4.3. CTGAN Evaluation and Results

4.3.1. CTGAN—Experiment 2 (E2)

The performance of the CTGAN model in Experiment 2 (E2) has been evaluated through statistical fidelity and diagnostic metrics. The SDV Diagnostic Tests report perfect scores for Data Validity (100%) and Data Structure (100%), resulting in an overall diagnostic score of 100%, which confirms that the synthetic data complies with the schema constraints and contains no structural inconsistencies. However, the SDV data quality assessment yields a Column Shapes Score of 80.9% and a Column Pair Trends Score of 93.2%, with an Overall Data Quality Score of 87.1%. These results indicate that while CTGAN captures most of the dataset’s statistical characteristics, noticeable deviations persist in certain features. The overall KS Test Score of 81.1% reflects a moderate level of statistical similarity between real and synthetic data, slightly lower than that achieved by the baseline model. The results of the Kolmogorov–Smirnov Two-Sample Test provide further insights into CTGAN’s ability to replicate feature distributions. Attributes such as protocol, icmp code, and icmp type show no significant difference (p-value = 1.0) between real and synthetic data, confirming CTGAN’s effectiveness in modeling categorical variables. In contrast, other features record significant differences (p-value = 0.0000), highlighting persistent challenges in capturing the distributions of numerical attributes. The Column Shapes Sub-scores confirm this observation: attributes like total bwd packets (94.6%) and total fwd packets (94.1%) score high, while features such as flow bytes/s (49.6%) and total tcp flow time (65.6%) achieve lower scores. These findings indicate that CTGAN struggles particularly with high-variance flow-related features. The detection metrics further underline the distinguishability of synthetic data. The Logistic Detection score of 78.8% and the SVC Detection score of 87.7% indicate that classifiers can reliably differentiate between real and synthetic samples. In terms of predictive utility, the TSTR evaluation using a Random Forest Regressor results in a TSTR score of (−0.035, −0.002), reflecting a limited ability of models trained on synthetic data to generalize to real data. Additionally, the sample-level classification accuracy of 99.9% highlights that CTGAN-generated samples remain easily identifiable in supervised learning tasks. Overall, these results confirm that while CTGAN improves the modeling of certain data aspects compared to the Gaussian Copula model, it still faces notable challenges in producing highly realistic synthetic network traffic. Table 8 presents the evaluation results of E2 using the CTGAN model.
Table 8. Evaluation results for CTGAN model (E2).
The visual inspection of the CTGAN-generated dataset supports the findings derived from the statistical metrics. The pairwise feature distribution plots in Figure 3, Figure A3 and Figure A4 in Appendix A demonstrate mixed behavior: categorical attributes maintain overlapping distributions between real and synthetic data, whereas continuous numerical features exhibit notable shifts and distinct clustering patterns. These deviations are especially evident in flow-related features, where CTGAN fails to reproduce the natural variance and dispersion of the real dataset.
Figure 3. Pairwise feature distribution plots (pairplots) for E2 (dataset features 1–6).
Furthermore, the UMAP projection, depicted in Figure 4, reveals partially overlapping clusters, with several synthetic clusters appearing distinct from those of the real data. This outcome highlights CTGAN’s limited capacity to fully replicate the high-dimensional structure of the original dataset, despite employing a more advanced generative approach compared to the Gaussian Copula. These visual results align with the statistical analysis and detection metrics, underscoring that while CTGAN enhances certain aspects of synthetic data generation, it remains constrained in reproducing the full complexity of malicious network traffic patterns.
Figure 4. Dimensionality reduction visualizations (using UMAP) for E2.

4.3.2. CTGAN—Experiment 3 (E3)

The performance evaluation of the CTGAN in Experiment 3 (E3) starts with the SDV Diagnostic Tests, which report perfect scores for Data Validity (100%) and Data Structure (100%), yielding an overall diagnostic score of 100%. This outcome confirms that the generated synthetic data conforms to the expected schema and contains no structural violations. However, the SDV data quality assessment presents a Column Shapes Score of 79.7% and a Column Pair Trends Score of 92.9%, with an Overall Data Quality Score of 86.3%. These results indicate that the CTGAN model in E3 successfully captures several key statistical properties of the dataset but also exhibits notable deviations in specific attributes. The overall KS Test Score of 79.7% points to a moderate degree of statistical similarity between the real and synthetic data, comparable to but slightly below the performance of the CTGAN model in E2. The Kolmogorov–Smirnov Two-Sample Test results offer further insight into the model’s ability to mimic the distributions of individual features. Discrete variables such as protocol, icmp code, and icmp type show no significant difference (p-value = 1.0), validating CTGAN’s ability to model categorical data accurately. In contrast, all continuous features report significant differences (p-value = 0.0000), underscoring persistent modeling challenges with numerical attributes. The Column Shapes Sub-scores reinforce this observation: attributes like total bwd packets (95%) and total fwd packet (93.8%) register high scores, whereas features such as flow bytes/s (46.3%) and total tcp flow time (64.9%) present lower similarity. These results highlight that while this model effectively captures some distributional characteristics, it continues to struggle with attributes characterized by high variance or skewed distributions. The detection metrics reflect the continued distinguishability of synthetic samples generated by CTGAN in E3. The Logistic Detection score of 80.6% and the SVC Detection score of 75.3% indicate that machine learning classifiers can still reliably discern synthetic from real data, though with a slightly lower detection capability compared to the baseline Gaussian Copula model and CTGAN in E2. The TSTR evaluation using a Random Forest Regressor results in a TSTR score of (0.072, −0.003), pointing to limited predictive utility of the synthetic data for downstream tasks. Furthermore, the sample-level classification accuracy of 100% confirms that the generated records remain easily distinguishable when evaluated in supervised learning scenarios. These outcomes imply that despite certain improvements in feature modeling, the CTGAN model configured in E3 shares the same limitations in data realism and utility observed with the other tested synthesizers. Table 9 summarizes the evaluation results of E3 using the CTGAN model.
Table 9. Evaluation results for CTGAN model (E3).
The visual assessment of the synthetic data produced by the CTGAN model in E3 illustrates the statistical findings discussed above. The pairwise feature distribution plots in Figure 5, Figure A5 and Figure A6 in Appendix A reveal observable discrepancies between the real and synthetic datasets. While categorical variables such as protocol and icmp type maintain overlapping distributions, continuous numerical features, particularly those associated with packet sizes, flow rates, and durations, demonstrate clear divergences. This behavior suggests that although the model captures the marginal distributions of both categorical and some numerical features, it struggles to accurately reflect the joint behavior of numerical attributes inherent in malicious network traffic.
Figure 5. Pairwise feature distribution plots (pairplots) for E3 (dataset features 1–6).
The UMAP projection in Figure 6 further emphasizes these differences in the structural distribution of samples. The synthetic data forms clusters that partially overlap with those of the real dataset but also display distinct separation, highlighting the model’s limited ability to replicate the complex, high-dimensional relationships present in the real data. This visual evidence is in line with the detection and utility metrics, confirming that despite its modeling advancements, the CTGAN model does not fully emulate the statistical and structural nuances of real malicious network traffic.
Figure 6. Dimensionality reduction visualizations (using UMAP) for E3.

4.4. TVAE Evaluation and Results

4.4.1. TVAE—Experiment 4 (E4)

During Experiment 4 (E4) for the TVAE model, the SDV Diagnostic Tests again report perfect scores for Data Validity (100%) and Data Structure (100%), resulting in an Overall Diagnostic Score of 100%. These scores confirm that the generated synthetic data complies with the required schema and contains no structural errors. The SDV data quality assessment yields a Column Shapes Score of 82.6% and a Column Pair Trends Score of 92.5%, culminating in an Overall Data Quality Score of 87.6%. These values suggest that the TVAE model successfully preserves the dataset’s core statistical properties, showing a marginal improvement compared to the previous experiment (E3). The KS Test Score of 82.6% further reflects this trend, indicating a slightly better statistical resemblance to the real data. The results of the Kolmogorov–Smirnov Two-Sample Test highlight the TVAE’s strengths and weaknesses in feature modeling. As seen in previous experiments, discrete variables such as protocol, icmp code, and icmp type do not exhibit significant statistical differences (p-value = 1.0), confirming the model’s reliability in capturing these discrete distributions. On the other hand, all continuous features once again register significant differences (p-value = 0.0000), underscoring the persisting challenge in emulating continuous numerical distributions. The Column Shapes Sub-scores illustrate this disparity: features like total bwd packets (92.8%) and total fwd packets (90.9%) achieve high similarity scores, while attributes such as total tcp flow time (66.7%) and flow duration (73.4%) score lower. Despite the improved modeling of several attributes, the TVAE model still struggles with variables marked by high variance or complex distributions. The detection results offer additional insights into the synthetic data’s distinguishability. The Logistic Detection Score of 75.5% and the SVC Detection Score of 83.8% suggest that classifiers continue to distinguish between real and synthetic samples with considerable accuracy, while detection scores are slightly reduced compared to earlier experiments. Notably, the TSTR evaluation with Random Forest Regressor records a TSTR score of (0.918, 0.564), a significant increase over previous experiments, suggesting a noteworthy improvement in the predictive utility of the generated samples. This score implies that models trained on the synthetic data are better able to generalize patterns present within the real data. At the sample level, the Random Forest Classifier Accuracy of 99.9% confirms that individual synthetic data records remain highly distinguishable, reconfirming the challenges in generating samples that are truly indistinguishable from real datasets. Table 10 depicts the evaluation results of E4 using the TVAE model.
Table 10. Evaluation results for TVAE model (E4).
The visual analysis of the synthetic data depicted within Figure 7, Figure A7 and Figure A8 (Appendix A) produced by the TVAE model in E4 complements the extracted statistical findings. The pairwise distribution plots reveal a mixed pattern: features such as protocol and icmp type show significant overlap between real and synthetic data, while continuous attributes, especially those related to flow statistics and packet measurements, demonstrate noticeable differences. Although the distributions appear slightly closer than in previous experiments, distinct clustering and scaling inconsistencies remain evident.
Figure 7. Pairwise feature distribution plots (pairplots) for E4 (dataset features 1–6).
The UMAP projection within the Figure 8 visualization offers a deeper look into the underlying data structure. Synthetic samples cluster more tightly compared to the more dispersed pattern of real data, while the degree of overlap has slightly increased relative to prior experiments. This partially improved structural similarity hints at some learning of higher-order feature relationships, although significant gaps still exist. These visual outcomes, in combination with the detection and statistical evaluation metrics, confirm that while TVAE in E4 exhibits better performance in certain aspects, it still faces limitations in replicating the full complexity and variability of real malicious network traffic patterns.
Figure 8. Dimensionality reduction visualizations (using UMAP) for E4.

4.4.2. TVAE—Experiment 5 (E5)

For Experiment 5 (E5) of the TVAE model, the SDV Diagnostic Tests result in perfect scores for both Data Validity (100%) and Data Structure (100%), leading to an Overall Diagnostic Score of 100%. These results confirm that the synthetic data fully satisfies the structural constraints and schema of the original dataset. The SDV Data Quality evaluation further reports a Column Shapes Score of 85.0% and a Column Pair Trends Score of 92.34%, achieving an Overall Data Quality Score of 89.22%. These scores indicate that this TVAE configuration manages to reproduce the underlying statistical patterns of real data with a high degree of fidelity, surpassing several of the previously tested models. The KS Test Score of 84.9% also reinforces this observation, reflecting a strong resemblance in the univariate distributions between the real and synthetic datasets. The Kolmogorov–Smirnov Two-Sample Test provides more granular insights: features such as protocol, icmp code, and icmp type show no significant difference (p-value = 1.0), confirming TVAE’s reliable modeling of these variables. However, the continuous attributes again show significant differences (p-value = 0.0000), highlighting the ongoing challenge of capturing the full complexity of numerical distributions in synthetic data generation. The Column Shapes Sub-scores reflect this pattern: attributes like total bwd packets (95.7%) and total fwd packets (94.1%) perform strongly, while features with higher variability, such as total tcp flow time (68.9%) and down/up ratio (72.4%), achieve comparatively lower scores. The detection metrics reveal that TVAE-generated samples are still distinguishable from real data. The Logistic Detection score is 70.8%, and the SVC Detection score reaches 84.1%, indicating that despite the improved data quality metrics, detectable differences remain. Concerning predictive utility, the TSTR evaluation using a Random Forest Regressor yields a positive TSTR score of (0.918, 0.564), representing one of the highest utility scores in the set of experiments. This result confirms that models trained on synthetic data generated by TVAE can generalize more effectively to real data patterns compared to other tested models. The sample-level classification using a Random Forest Classifier confirms the high distinguishability, as reflected by a near-perfect classification accuracy of 99.9%. Table 11 summarizes the evaluation results of E5 with the TVAE model.
Table 11. Evaluation results for TVAE model (E5).
Figure 9, Figure A9 and Figure A10 in Appendix A depict the pairwise feature distribution plots (pairplots). They reveal that, while the model successfully captures certain categorical attributes such as protocol and icmp type, discrepancies remain apparent in most continuous numerical features. Notably, attributes related to flow dynamics, including flow duration, flow bytes/s, and total tcp flow time, show evident distributional shifts between real and synthetic data. The overlap in marginal distributions of some simpler features suggests that TVAE during E5 partially succeeds in reproducing single-feature behavior but appears less effective in modeling joint feature interactions, especially those associated with complex network traffic patterns.
Figure 9. Pairwise feature distribution plots (pairplots) for E5 (dataset features 1–6).
As visualized in the corresponding Figure 10 for UMAP projection during E5, synthetic samples form clusters that overlap with those of the real data and still maintain visible separations. This clustering behavior indicates that while TVAE captures certain latent structures and patterns within the data, it struggles to replicate the full diversity and higher-order relationships of the original dataset. The denser clustering of synthetic data points compared to real samples further suggests a tendency toward reduced variance in the generated data.
Figure 10. Dimensionality reduction visualizations (using UMAP) for E5.

4.5. CopulaGAN Evaluation and Results

4.5.1. CopulaGAN—Experiment 6 (E6)

For the CopulaGAN model evaluation during Experiment 6 (E6), the built-in SDV Diagnostic Tests provide perfect scores for both Data Validity (100%) and Data Structure (100%), resulting in an overall diagnostic score of 100%. These results confirm that the synthetic dataset adheres to schema constraints and maintains the correct structural integrity without violations. Additionally, the SDV Data Quality metrics report a Column Shapes Score of 89.3% and a Column Pair Trends Score of 95.1%, achieving a high Overall Data Quality Score of 92.2%. These values indicate that the CopulaGAN model effectively captures the majority of statistical patterns and inter-feature dependencies from the original dataset, while minor deviations are still present. The KS Complement metric at 89.3% further supports this conclusion, reflecting a generally strong alignment of feature distributions. The Kolmogorov–Smirnov Two-Sample Test results highlight that features like protocol, icmp code, and icmp type result in non-significant differences (p-value = 1.0000), demonstrating CopulaGAN’s reliable handling of these types of variables. However, other attribute types, including flow duration, total tcp flow time, and packet-based metrics, show significant differences (p-value = 0.0000). This confirms persistent challenges in generating numerical distributions with high variance. The detailed Column Shapes Sub-scores reflect this behavior: while discrete or simpler features such as protocol (99.9%) and icmp code/type (99.9%) are closely matched, more complex continuous features like down/up ratio (62.8%) remain harder to model accurately. Regarding the detection and utility metrics, the model exhibits a Logistic Detection score of 78.8% and an SVC Detection score of 87.7%, indicating that synthetic samples remain distinguishable from real data with high classification accuracy. On the utility side, the TSTR score of (0.178, −0.074) suggests a marginal predictive capacity when models are trained on synthetic data and tested on real samples. These detection and utility scores align with the statistical assessment, reinforcing the view that despite strong maintenance of structural aspects, CopulaGAN does not fully replicate the complexity required for seamless generalization within downstream tasks. Table 12 summarizes the evaluation results for CopulaGAN in E6.
Table 12. Evaluation results for CopulaGAN model (E6).
The pairwise feature distribution plots, in Figure 11, Figure A11 and Figure A12 (Appendix A), reveal a mixed picture. For features, such as protocol and icmp type, the synthetic samples largely overlap with real data, confirming CopulaGAN’s ability to replicate these types of distributions. However, when examining other numerical attributes, particularly flow duration, flow packets/s, total length of fwd/bwd packet, and total tcp flow time, distinct deviations emerge. The synthetic data distributions often appear more compact or shifted compared to the real ones, highlighting the model’s limitations in capturing the full variability and joint behavior of flow-related metrics.
Figure 11. Pairwise feature distribution plots (pairplots) for E6 (dataset features 1–6).
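Plots of this kind can be generated with seaborn, as in the minimal sketch below; the feature subset, subsample size, and the reuse of the real/synthetic DataFrames from the earlier sketches are illustrative assumptions.

    import pandas as pd
    import seaborn as sns

    # Small feature subset for readability; "source" distinguishes the two sets.
    features = ["protocol", "flow duration", "flow packets/s"]
    combined = pd.concat([
        real[features].assign(source="real"),
        synthetic[features].assign(source="synthetic"),
    ])

    # Subsample for speed (assumes at least 2000 combined rows).
    sns.pairplot(combined.sample(2000, random_state=42), hue="source", corner=True)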
The UMAP visualization in Figure 12 further underscores these observations. While the synthetic samples form clusters that broadly overlap with those of the real dataset, distinct grouping patterns and denser synthetic clusters are evident. This outcome suggests that although CopulaGAN approximates certain structural properties of the data, it tends to generate less diverse samples, potentially due to overfitting or constrained learning of the data's underlying complexity. Overall, the visual patterns align with both the detection metrics and statistical tests, confirming CopulaGAN's strong but imperfect capability to reproduce the intricate distributions present in real malicious network traffic data.
Figure 12. Dimensionality reduction visualizations (using UMAP) for E6.

4.5.2. CopulaGAN—Experiment 7 (E7)

The built-in SDV Diagnostic Tests, during CopulaGAN's E7, return perfect scores for both Data Validity (100%) and Data Structure (100%), resulting in a flawless Overall Diagnostic Score of 100%. These outcomes confirm that the synthetic data strictly adheres to the schema rules and structural constraints of the original dataset. The data quality assessment via SDV further highlights CopulaGAN's robust performance, achieving a Column Shapes Score of 90.3% and a Column Pair Trends Score of 95.8%, culminating in a high Overall Data Quality Score of 93.1%. These results indicate that the model effectively captures both univariate distributions and inter-feature dependencies with a level of accuracy surpassing previous experiments. The Kolmogorov–Smirnov (KS) Test produces a strong similarity score of 90.1%, reflecting CopulaGAN's proficiency in reproducing the statistical characteristics of the real dataset. The detailed Two-Sample KS Test analysis confirms the familiar trend: variables such as protocol, icmp code, and icmp type display no significant differences (p-value = 1.0), while other continuous attributes show statistically significant differences (p-value = 0.0000). The Column Shapes Sub-scores align with these findings: features like total bwd packets (96.1%), total length of fwd packet (94.4%), and flow bytes/s (92.6%) demonstrate strong alignment between real and synthetic data. However, certain attributes, particularly down/up ratio (72.1%) and dst port (76.7%), return comparatively lower scores, indicating persistent challenges in capturing distributions for highly variable or skewed features. In terms of detection and utility metrics, the model achieves a Logistic Detection score of 81.3% and an SVC Detection score of 85.8%, indicating that real and synthetic samples are still distinguishable with relatively high confidence. Nonetheless, the TSTR evaluation yields a moderate score of (0.075, −0.045), indicating some capacity of the synthetic data to support downstream learning tasks. At the sample level, the Random Forest Classifier records a near-perfect classification accuracy of 99.9%, underscoring the strong label consistency within the generated samples. Table 13 summarizes the evaluation outcomes of CopulaGAN during E7.
Table 13. Evaluation results for CopulaGAN model (E7).
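The per-feature Two-Sample KS checks discussed above can be reproduced with scipy, as in the hedged sketch below; the column list is illustrative and the real/synthetic DataFrames from the earlier sketches are assumed to exist (with numerically encoded features).

    from scipy.stats import ks_2samp

    # Compare each feature's real and synthetic marginal distributions.
    for column in ["protocol", "icmp code", "flow duration", "down/up ratio"]:
        statistic, p_value = ks_2samp(real[column], synthetic[column])
        print(f"{column}: KS statistic={statistic:.4f}, p-value={p_value:.4f}")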
The visual assessment of the synthetic data generated in Experiment 7 (E7) using the CopulaGAN model corroborates the statistical results. The pairwise feature distributions reveal that, compared to the previous experiments, CopulaGAN consistently improves the alignment of synthetic and real data distributions. While features such as protocol, icmp code, and icmp type continue to overlap nearly perfectly, continuous numerical features like total fwd packet, total bwd packets, and flow duration demonstrate a notably improved correspondence in both density and joint scatter patterns. However, noticeable deviations persist in attributes involving dynamic network behaviors, such as flow packets/s, flow bytes/s, and down/up ratio, reflecting CopulaGAN's inherent limitations in capturing high-variance numerical features. Figure 13 illustrates the comparison of distributions for the first six features, with the remaining visualizations presented in Appendix A (Figure A13 and Figure A14).
Figure 13. Pairwise feature distribution plots (pairplots) for E7 (dataset features 1–6).
Additionally, the UMAP-based dimensionality reduction shown in Figure 14 highlights a tighter overlap between synthetic and real data clusters compared to earlier experiments. Although synthetic points still tend to form denser sub-clusters, they intermix more naturally with real data points, indicating a more faithful reproduction of global structural patterns. These visual findings align with the improved detection and statistical similarity scores obtained in E7, further confirming that this CopulaGAN model manages to enhance both marginal and joint feature distributions, while still facing challenges in emulating the full complexity of real malicious network traffic data.
Figure 14. Dimensionality reduction visualizations (using UMAP) for E7.

4.6. Cross-Model Comparative Analysis

Across all experiments, the Gaussian Copula consistently demonstrated high structural validity (100%) but comparatively lower distributional fidelity scores. This model is effective at preserving general feature ranges and schema consistency. However, its reliance on linear and Gaussian dependency assumptions limits its ability to capture the complex and non-linear relationships inherent in malicious network traffic.
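For context, the following minimal sketch shows how the four synthesizers compared in this subsection can be instantiated, fitted, and sampled under the SDV 1.x API; automatic metadata detection and equal-size sampling are illustrative assumptions rather than our exact training configuration.

    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import (
        GaussianCopulaSynthesizer, CTGANSynthesizer,
        TVAESynthesizer, CopulaGANSynthesizer,
    )

    # Infer column types from the real DataFrame assumed in the earlier sketches.
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(real)

    # Fit each model on the real data and draw an equally sized synthetic set.
    synthetic_sets = {}
    for Synth in (GaussianCopulaSynthesizer, CTGANSynthesizer,
                  TVAESynthesizer, CopulaGANSynthesizer):
        synthesizer = Synth(metadata)
        synthesizer.fit(real)
        synthetic_sets[Synth.__name__] = synthesizer.sample(num_rows=len(real))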
CTGAN demonstrates improved capability in modeling non-linear feature interactions, as reflected in higher Column Pair Trend scores and partially overlapping cluster structures within the UMAP visualizations. However, CTGAN shows evidence of partial mode collapse, particularly in flow-based continuous features, where reduced variance and cluster fragmentation suggest underrepresentation of behavioral diversity in the generated attack samples.
Across the different experiments, we consistently observed that certain continuous features with high variance, such as 'flow bytes/s', 'down/up ratio', 'packet lengths', and 'flow durations', yield lower similarity scores across all generative models. These attributes possess inherent statistical characteristics, including long-tailed and heavily skewed distributions, which make them challenging for GAN- and VAE-based tabular generators to reproduce accurately. Several of these features also exhibit multimodal patterns resulting from diverse traffic behaviors or attack subtypes, often leading generative models to smooth or collapse minor modes during training. Their values span several orders of magnitude, which complicates the learning of stable latent representations and may introduce distortions during normalization. These factors collectively explain the persistent modeling difficulties observed across CTGAN, TVAE, and CopulaGAN, particularly in attributes that reflect dynamic or extreme-value network behaviors; the sketch below shows how such per-column difficulties can be quantified directly.
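The per-column scores for these problematic features can be obtained with sdmetrics' single-column KSComplement metric (1.0 indicates identical marginal distributions); the column names follow the paper and the DataFrames are assumed as before.

    from sdmetrics.single_column import KSComplement

    # Score each high-variance column individually.
    for column in ["flow bytes/s", "down/up ratio", "flow duration"]:
        score = KSComplement.compute(real_data=real[column],
                                     synthetic_data=synthetic[column])
        print(f"{column}: KSComplement={score:.3f}")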
Regarding category-level fidelity, the evaluated synthesizers generally preserved the relationships between attack categories, producing synthetic samples that follow similar proportional distributions and conditional feature patterns. However, we observed that categories with limited representation in the original dataset, such as web attacks, show reduced generation quality, reflected in lower similarity scores and higher variability across models. This effect is primarily driven by class imbalance, which makes it difficult for generative models to learn distinctive patterns for underrepresented attack types. As a result, minor attack categories naturally exhibit greater variability in their synthetic counterparts due to the limited number of real samples available for learning.
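One practical mitigation for such underrepresented categories is conditional sampling. The hedged sketch below reuses a fitted synthesizer from the sketch above and assumes a hypothetical label column named "attack_type"; it illustrates SDV's conditional generation interface, not a step of our evaluation pipeline.

    from sdv.sampling import Condition

    # Request extra rows for a minority class such as web attacks.
    web_attack = Condition(column_values={"attack_type": "Web Attack"}, num_rows=500)
    minority_samples = synthesizer.sample_from_conditions(conditions=[web_attack])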
The distinct performance differences observed across the four synthesizers can be partly attributed to their underlying architectural mechanisms. GAN-based models such as CTGAN and CopulaGAN are vulnerable to mode collapse, which can lead to the underrepresentation of infrequent or structurally complex patterns, especially in high-variance features. VAEs, in contrast, impose a latent space regularization that encourages smooth and continuous manifolds, often producing overly smoothed distributions and reduced fidelity in capturing sharp extrema or multimodal behavior. Copula-based synthesizers are well-suited to modeling marginal distributions through their decomposed dependence structure, yet they can struggle to capture the highly non-linear or hierarchical feature interactions that are common in malicious traffic. These architectural characteristics help explain the observed distributional similarity scores and the differing strengths and limitations of each model family in our experimental results.
TVAE demonstrates a more balanced performance profile, delivering a more consistent statistical similarity across both univariate and multivariate measures and generating smoother and less distorted feature distributions. This suggests that the latent space regularization inherent in VAE architectures helps maintain distributional stability, although this comes at the cost of slightly reduced fine-grained structural accuracy in very high-variance features.
CopulaGAN combines copula-based correlation modeling with adversarial refinement, leading to performance in between CTGAN and TVAE. It improves structural accuracy compared to the Gaussian Copula and reduces mode collapse effects relative to CTGAN. However, slight instability in adversarial training occasionally leads to the heterogeneous cluster groups visible in the UMAP projections.
Finally, it is important to consider that the CICIDS2017 dataset includes partially synthetic background traffic and exhibits limited diversity across certain benign and attack behaviors, which may influence how well the observed generative patterns generalize to broader intrusion detection environments.

4.7. Utility and TSTR Analysis

The Train-on-Synthetic, Test-on-Real (TSTR) framework provides insight into whether synthetic data captures not just statistical properties but actionable behavioral patterns. Across all models, we observed the following:
  • Gaussian Copula achieved the lowest TSTR performance, confirming its limited ability to encode meaningful behavioral representations.
  • CTGAN improved TSTR scores but suffered when mode collapse reduced class-level diversity.
  • TVAE delivered the most stable and consistently positive TSTR results, indicating better retention of meaningful behavioral dynamics in the latent embedding space.
  • CopulaGAN provided competitive TSTR performance, though with slightly higher variance between experiments.
Therefore, the VAE-based latent representation of synthetic samples can be considered smoother and more robust for downstream ML training than purely adversarial generative refinement. A minimal sketch of the TSTR protocol follows.
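The sketch below illustrates the TSTR protocol under our own assumptions about the label column and classifier choice; it reuses the real/synthetic DataFrames assumed in the earlier sketches and is not the authors' exact experimental setup.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    label = "attack_type"  # hypothetical label column
    X_syn, y_syn = synthetic.drop(columns=[label]), synthetic[label]
    X_real, y_real = real.drop(columns=[label]), real[label]

    # Hold out a real test split so TSTR and the real-data baseline share it.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_real, y_real, test_size=0.3, random_state=42, stratify=y_real)

    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_syn, y_syn)                       # train on synthetic only
    tstr = f1_score(y_te, clf.predict(X_te), average="macro")

    clf.fit(X_tr, y_tr)                         # baseline: train on real
    trtr = f1_score(y_te, clf.predict(X_te), average="macro")
    print(f"TSTR={tstr:.3f} vs. TRTR baseline={trtr:.3f}")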

4.8. Summary

Overall, the comparative analysis indicates that while all four models are capable of producing structurally valid synthetic malicious network traffic, their performance varies meaningfully across the accuracy, diversity, and utility dimensions. Gaussian Copula offers strong structural correctness but limited behavioral realism; CTGAN captures complex dependencies but risks mode collapse; TVAE achieves the most consistent overall balance; and CopulaGAN provides an effective hybrid compromise. These findings highlight the importance of selecting generative models not on accuracy metrics alone but also based on the quality dimensions most relevant to the intended cybersecurity application.

5. Conclusions and Future Work

The present study evaluated the quality of synthetic malicious network traffic generated using four different models: the Gaussian Copula Synthesizer, CTGAN, TVAE, and CopulaGAN. The evaluation methodology followed a data quality assessment (DQA) approach, incorporating statistical similarity measures, structural dependency analysis, and utility assessment via the Train-on-Synthetic, Test-on-Real (TSTR) framework.
The comparative analysis demonstrated that while all models produced structurally valid and consistent synthetic network traffic, their ability to preserve meaningful statistical and behavioral characteristics varied significantly. Gaussian Copula maintained high structural correctness but showed limited ability to reproduce complex feature relationships. CTGAN captured non-linear dependencies more effectively, though signs of mode collapse reduced diversity in some feature subspaces. TVAE achieved the most stable and balanced performance overall, preserving both distributional patterns and behavioral representativeness. Finally, CopulaGAN provided a hybrid compromise between correlation modeling and adversarial refinement, performing competitively but with higher result variability.
These findings highlight the importance of evaluating synthetic malicious data not only on downstream detection accuracy but through a multi-dimensional quality assessment that considers accuracy, diversity, structural integrity, and utility. Such evaluation ensures that synthetic datasets used for intrusion detection research are not only statistically similar to real data but also exhibit the behavioral characteristics necessary for effective model training and robust operational performance. Given the known constraints of CICIDS2017, the generalizability of the reported results may vary across datasets with different traffic characteristics, and extending this evaluation to more diverse IDS benchmarks remains a valuable direction for future research.
Future work could explore three directions: (i) inclusion of additional real-world network traffic sources to validate generalizability across domains; (ii) integration of temporal and sequential features to improve behavioral realism in generated flows; and (iii) incorporation of privacy risk assessments to ensure synthetic data does not unintentionally reveal sensitive characteristics of the original datasets. These extensions aim to further strengthen the safe and effective use of synthetic malicious network traffic in cybersecurity research and machine learning-based intrusion detection systems. In addition, future extensions of this study could incorporate more recent generative architectures, such as diffusion models and normalizing flow–based approaches, once their implementations for tabular malicious traffic become more mature and practically reproducible.

Author Contributions

Conceptualization, N.P., T.A., E.D., and E.A.; methodology, N.P., T.A., E.D., and E.A.; software, N.P.; validation, N.P., E.A., and E.D.; formal analysis, N.P., T.A., E.D., and E.A.; investigation, N.P., T.A., and E.D.; resources, N.P., E.A., E.D., and T.A.; data curation, N.P. and T.A.; writing—original draft preparation, N.P., T.A., and E.D.; writing—review and editing, E.A., E.D., and T.A.; visualization, N.P. and T.A.; supervision, E.A.; project administration, E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADASYN: Adaptive Synthetic Sampling Approach
AI: Artificial Intelligence
CIDF: Common Intrusion Detection Framework
CNN: Convolutional Neural Network
CTGAN: Conditional Tabular Generative Adversarial Network
DDoS: Distributed Denial of Service
DNN: Deep Neural Network
DoS: Denial of Service
DQA: Data Quality Assessment
DP: Data Preprocessing
DT: Decision Tree
ELBO: Evidence Lower Bound
EM: Expectation Maximization
FTP: File Transfer Protocol
GAN: Generative Adversarial Network
GOA: Gazelle Optimization Algorithm
HBA: Honey Badger Algorithm
ICMP: Internet Control Message Protocol
IDS: Intrusion Detection System
IGAN: Imbalanced Generative Adversarial Network
IoT: Internet of Things
IP: Internet Protocol
LR: Logistic Regression
LSTM: Long Short-Term Memory
MitM: Man-in-the-Middle
ML: Machine Learning
MLP: Multi-Layer Perceptron
MSCNN: Multi-Scale Convolutional Neural Network
NaN: Not a Number
NIDS: Network Intrusion Detection System
PCAP: Packet Capture
RF: Random Forest
RNN: Recurrent Neural Network
ROS: Random Over-Sampling
SAPVAGAN: Self-Attention-based Provisional Variational Auto-encoder Generative Adversarial Network
SMOTE: Synthetic Minority Oversampling Technique
SSH: Secure Shell
SVM: Support Vector Machine
TCP: Transmission Control Protocol
TMG-GAN: Tabular Multi-Generator Generative Adversarial Network
TVAE: Tabular Variational Autoencoder
VAE: Variational Autoencoder
VAWGAN: Variational Autoencoder Wasserstein Generative Adversarial Network
WOGRU: Whale Optimized Gate Recurrent Unit
WSN: Wireless Sensor Network

Appendix A

Appendix A.1. Experiment 1 (E1)

Figure A1. Pairwise feature distribution plots (pairplots) for E1 (dataset features 7–12).
Figure A2. Pairwise feature distribution plots (pairplots) for E1 (dataset features 13–18).

Appendix A.2. Experiment 2 (E2)

Figure A3. Pairwise feature distribution plots (pairplots) for E2 (dataset features 7–12).
Figure A4. Pairwise feature distribution plots (pairplots) for E2 (dataset features 13–18).

Appendix A.3. Experiment 3 (E3)

Figure A5. Pairwise feature distribution plots (pairplots) for E3 (dataset features 7–12).
Figure A6. Pairwise feature distribution plots (pairplots) for E3 (dataset features 13–18).

Appendix A.4. Experiment 4 (E4)

Figure A7. Pairwise feature distribution plots (pairplots) for E4 (dataset features 7–12).
Figure A8. Pairwise feature distribution plots (pairplots) for E4 (dataset features 13–18).

Appendix A.5. Experiment 5 (E5)

Figure A9. Pairwise feature distribution plots (pairplots) for E5 (dataset features 7–12).
Figure A10. Pairwise feature distribution plots (pairplots) for E5 (dataset features 13–18).

Appendix A.6. Experiment 6 (E6)

Figure A11. Pairwise feature distribution plots (pairplots) for E6 (dataset features 7–12).
Figure A12. Pairwise feature distribution plots (pairplots) for E6 (dataset features 13–18).

Appendix A.7. Experiment 7 (E7)

Figure A13. Pairwise feature distribution plots (pairplots) for E7 (dataset features 7–12).
Figure A14. Pairwise feature distribution plots (pairplots) for E7 (dataset features 13–18).

References

  1. Park, C.; Lee, J.; Kim, Y.; Park, J.-G.; Kim, H.; Hong, D. An Enhanced AI-Based Network Intrusion Detection System Using Generative Adversarial Networks. IEEE Internet Things J. 2023, 10, 2330–2345. [Google Scholar] [CrossRef]
  2. Hussain, B.; Du, Q.; Sun, B.; Han, Z. Deep Learning-Based DDoS-Attack Detection for Cyber–Physical System over 5G Network. IEEE Trans. Ind. Inform. 2021, 17, 860–870. [Google Scholar] [CrossRef]
  3. Kampourakis, V.; Gkioulos, V.; Katsikas, S. A Systematic Literature Review on Wireless Security Testbeds in the Cyber-Physical Realm. Comput. Secur. 2023, 133, 103383. [Google Scholar] [CrossRef]
  4. Piqueira, J.R.C.; Cabrera, M.A.M.; Batistela, C.M. Malware Propagation in Clustered Computer Networks. Phys. A Stat. Mech. Its Appl. 2021, 573, 125958. [Google Scholar] [CrossRef]
  5. Gelgi, M.; Guan, Y.; Arunachala, S.; Samba Siva Rao, M.; Dragoni, N. Systematic Literature Review of IoT Botnet DDOS Attacks and Evaluation of Detection Techniques. Sensors 2024, 24, 3571. [Google Scholar] [CrossRef]
  6. Zhao, X.; Veerappan, C.S.; Loh, P.K.K.; Tang, Z.; Tan, F. Multi-Agent Cross-Platform Detection of Meltdown and Spectre Attacks. In Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 18–21 November 2018; pp. 1834–1838. [Google Scholar]
  7. Fereidouni, H.; Fadeitcheva, O.; Zalai, M. IoT and Man-in-the-Middle Attacks. Secur. Priv. 2025, 8, e70016. [Google Scholar] [CrossRef]
  8. Statista Cybersecurity—Worldwide. Available online: https://www.statista.com/outlook/tmo/cybersecurity/worldwide#cost (accessed on 17 July 2025).
  9. IBM. Cost of a Data Breach Report 2024; IBM Corporation: Armonk, NY, USA, 2024. [Google Scholar]
  10. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  11. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2022, arXiv:1312.6114. [Google Scholar]
  12. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends® Mach. Learn. 2019, 12, 307–392. [Google Scholar] [CrossRef]
  13. Sharafaldin, I.; Habibi Lashkari, A.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), Madeira, Portugal, 22–24 January 2018; SciTePress/INSTICC: Setúbal, Portugal, 2018; pp. 108–116. [Google Scholar]
  14. Zhao, X.; Fok, K.W.; Thing, V.L.L. Enhancing Network Intrusion Detection Performance Using Generative Adversarial Networks. Comput. Secur. 2024, 145, 104005. [Google Scholar] [CrossRef]
  15. Rao, Y.N.; Suresh Babu, K. An Imbalanced Generative Adversarial Network-Based Approach for Network Intrusion Detection in an Imbalanced Dataset. Sensors 2023, 23, 550. [Google Scholar] [CrossRef]
  16. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A Detailed Analysis of the KDD CUP 99 Data Set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
  17. Moustafa, N.; Slay, J. UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6. [Google Scholar]
  18. Ding, H.; Sun, Y.; Huang, N.; Shen, Z.; Cui, X. TMG-GAN: Generative Adversarial Networks-Based Imbalanced Learning for Network Intrusion Detection. IEEE Trans. Inf. Forensics Secur. 2024, 19, 1156–1167. [Google Scholar] [CrossRef]
  19. Sun, P.; Li, S.; Xie, J.; Xu, H.; Cheng, Z.; Yang, R. GPMT: Generating Practical Malicious Traffic Based on Adversarial Attacks with Little Prior Knowledge. Comput. Secur. 2023, 130, 103257. [Google Scholar] [CrossRef]
  20. García, S.; Grill, M.; Stiborek, J.; Zunino, A. An Empirical Comparison of Botnet Detection Methods. Comput. Secur. 2014, 45, 100–123. [Google Scholar] [CrossRef]
  21. Duy, P.T.; Tien, L.K.; Khoa, N.H.; Hien, D.T.T.; Nguyen, A.G.-T.; Pham, V.-H. DIGFuPAS: Deceive IDS with GAN and Function-Preserving on Adversarial Samples in SDN-Enabled Networks. Comput. Secur. 2021, 109, 102367. [Google Scholar] [CrossRef]
  22. Li, T.; Luo, Y.; Wan, X.; Li, Q.; Liu, Q.; Wang, R.; Jia, C.; Xiao, Y. A Malware Detection Model Based on Imbalanced Heterogeneous Graph Embeddings. Expert. Syst. Appl. 2024, 246, 123109. [Google Scholar] [CrossRef]
  23. Arp, D.; Spreitzenbarth, M.; Hübner, M.; Gascon, H.; Rieck, K. DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket; NDSS: San Diego, CA, USA, 2014. [Google Scholar]
  24. Mercaldo, F.; Martinelli, F.; Santone, A. Deep Convolutional Generative Adversarial Networks in Image-Based Android Malware Detection. Computers 2024, 13, 154. [Google Scholar] [CrossRef]
  25. Yang, L.; Shami, A. Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection. In Proceedings of the Workshop on Autonomous Cybersecurity, Salt Lake City, UT, USA, 14–18 October 2024; ACM: New York, NY, USA, 2023; pp. 68–78. [Google Scholar]
  26. Samarakoon, S.; Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Chang, S.-Y.; Kim, J.; Kim, J.; Ylianttila, M. 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network. arXiv 2022, arXiv:2212.01298. [Google Scholar] [CrossRef]
  27. Li, Z.; Huang, C.; Qiu, W. An Intrusion Detection Method Combining Variational Auto-Encoder and Generative Adversarial Networks. Comput. Netw. 2024, 253, 110724. [Google Scholar] [CrossRef]
  28. Meenakshi, B.; Karunkuzhali, D. Enhancing Cyber Security in WSN Using Optimized Self-Attention-Based Provisional Variational Auto-Encoder Generative Adversarial Network. Comput. Stand. Interfaces 2024, 88, 103802. [Google Scholar] [CrossRef]
  29. Jiang, S.; Zhao, J.; Xu, X. SLGBM: An Intrusion Detection Mechanism for Wireless Sensor Networks in Smart Environments. IEEE Access 2020, 8, 169548–169558. [Google Scholar] [CrossRef]
  30. Ravi, V.; Chaganti, R.; Alazab, M. Recurrent Deep Learning-Based Feature Fusion Ensemble Meta-Classifier Approach for Intelligent Network Intrusion Detection System. Comput. Electr. Eng. 2022, 102, 108156. [Google Scholar] [CrossRef]
  31. Ramana, K.; Revathi, A.; Gayathri, A.; Jhaveri, R.H.; Narayana, C.V.L.; Kumar, B.N. WOGRU-IDS—An Intelligent Intrusion Detection System for IoT Assisted Wireless Sensor Networks. Comput. Commun. 2022, 196, 195–206. [Google Scholar] [CrossRef]
  32. Zixu, T.; Liyanage, K.S.K.; Gurusamy, M. Generative Adversarial Network and Auto Encoder Based Anomaly Detection in Distributed IoT Networks. In Proceedings of the GLOBECOM 2020–2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–7. [Google Scholar]
  33. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [Google Scholar] [CrossRef]
  34. Senthilkumar, G.; Tamilarasi, K.; Periasamy, J.K. Cloud Intrusion Detection Framework Using Variational Auto Encoder Wasserstein Generative Adversarial Network Optimized with Archerfish Hunting Optimization Algorithm. Wirel. Netw. 2024, 30, 1383–1400. [Google Scholar] [CrossRef]
  35. Krishnaveni, S.; Sivamohan, S.; Sridhar, S.S.; Prabakaran, S. Efficient Feature Selection and Classification through Ensemble Method for Network Intrusion Detection on Cloud Computing. Clust. Comput. 2021, 24, 1761–1779. [Google Scholar] [CrossRef]
  36. Karuppusamy, L.; Ravi, J.; Dabbu, M.; Lakshmanan, S. Chronological Salp Swarm Algorithm Based Deep Belief Network for Intrusion Detection in Cloud Using Fuzzy Entropy. Int. J. Numer. Model. Electron. Netw. Devices Fields 2022, 35, e2948. [Google Scholar] [CrossRef]
  37. Lou, P.; Lu, G.; Jiang, X.; Xiao, Z.; Hu, J.; Yan, J. Cyber Intrusion Detection through Association Rule Mining on Multi-Source Logs. Appl. Intell. 2021, 51, 4043–4057. [Google Scholar] [CrossRef]
  38. Chalé, M.; Bastian, N.D. Generating Realistic Cyber Data for Training and Evaluating Machine Learning Classifiers for Network Intrusion Detection Systems. Expert. Syst. Appl. 2022, 207, 117936. [Google Scholar] [CrossRef]
  39. Ammara, D.A.; Ding, J.; Tutschku, K. Synthetic Network Traffic Data Generation: A Comparative Study. arXiv 2025, arXiv:2410.16326. [Google Scholar]
  40. Saka, S.; Al-Ataby, A.; Selis, V. Generating Synthetic Tabular Data for DDoS Detection Using Generative Models. In Proceedings of the 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Exeter, UK, 1–3 November 2023; pp. 1436–1442. [Google Scholar]
  41. Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; pp. 1–8. [Google Scholar]
  42. Kotal, A.; Luton, B.; Joshi, A. KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation. In Proceedings of the 2024 IEEE 44th International Conference on Distributed Computing Systems Workshops (ICDCSW), Jersey City, NJ, USA, 23 July 2024; pp. 140–145. [Google Scholar]
  43. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling Tabular Data Using Conditional GAN. arXiv 2019, arXiv:1907.00503. [Google Scholar] [CrossRef]
  44. Kim, J.; Jeon, J.; Lee, J.; Hyeong, J.; Park, N. OCT-GAN: Neural ODE-Based Conditional Tabular Gans. arXiv 2021, arXiv:2105.14969. [Google Scholar]
  45. Yoon, J.; Jordon, J.; van der Schaar, M. PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  46. Park, N.; Mohammadi, M.; Gorde, K.; Jajodia, S.; Park, H.; Kim, Y. Data Synthesis Based on Generative Adversarial Networks. Proc. VLDB Endow. 2018, 11, 1071–1083. [Google Scholar] [CrossRef]
  47. Guo, R.; Liu, H.; Liu, D. When Deep Learning-Based Soft Sensors Encounter Reliability Challenges: A Practical Knowledge-Guided Adversarial Attack and Its Defense. IEEE Trans. Ind. Inform. 2024, 20, 2702–2714. [Google Scholar] [CrossRef]
  48. Kamthe, S.; Assefa, S.; Deisenroth, M. Copula Flows for Synthetic Data Generation. arXiv 2021, arXiv:2101.00598. [Google Scholar] [CrossRef]
  49. Parise, O.; Kronenberger, R.; Parise, G.; de Asmundis, C.; Gelsomino, S.; La Meir, M. CTGAN-Driven Synthetic Data Generation: A Multidisciplinary, Expert-Guided Approach (TIMA). Comput. Methods Programs Biomed. 2025, 259, 108523. [Google Scholar] [CrossRef] [PubMed]
  50. Kiran, A.; Rubini, P.; Kumar, S.S. Challenges and Limitations of TVAE Tabular Synthetic Data Generator. In Proceedings of the Advanced Computing, Lisbon, Portugal, 28 September–2 October 2025; Garg, D., Pendyala, V., Gupta, S.K., Najafzadeh, M., Eds.; Springer Nature: Cham, Switzerland, 2025; pp. 243–254. [Google Scholar]
  51. Miletic, M.; Sariyar, M. Challenges of Using Synthetic Data Generation Methods for Tabular Microdata. Appl. Sci. 2024, 14, 5975. [Google Scholar] [CrossRef]
  52. Patki, N.; Wedge, R.; Veeramachaneni, K. The Synthetic Data Vault. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016; pp. 399–410. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
