Dual-Stream Time-Series Transformer-Based Encrypted Traffic Data Augmentation Framework
Abstract
1. Introduction
- Dual-Stream Preprocessing Pipeline: We introduce an input structure that simultaneously extracts and normalizes both fine-grained time-series packet information and flow-level statistical vectors, enabling efficient learning of complex patterns.
- Augmentation-Friendly Transformer Architecture: We design an encoder-only Transformer augmented with learnable positional embeddings and a FiLM+Context Token injection scheme, which effectively fuses temporal dynamics and statistical features to strengthen both reconstruction and distribution preservation.
- Constraint-Based Augmentation Algorithm: We incorporate explicit statistical constraints into the augmentation process by combining latent-space perturbations with post-generation verification, thereby ensuring a balance of diversity and consistency; we qualitatively validate the efficacy of this constrained augmentation.
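The third contribution (perturb in latent space, then verify statistics after generation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian perturbation scale, the relative tolerance, and the toy decoder below are all our assumptions.

```python
import random

def augment_with_constraints(z, decode, stats, ref_stats, sigma=0.05,
                             tol=0.1, max_tries=10, rng=None):
    """Perturb latent vector z, decode a candidate sequence, and accept it
    only if every recomputed statistic stays within a relative tolerance of
    the reference statistics (post-generation verification)."""
    rng = rng or random.Random(0)
    for _ in range(max_tries):
        z_new = [zi + rng.gauss(0.0, sigma) for zi in z]   # latent perturbation
        x_new = decode(z_new)
        s_new = stats(x_new)
        if all(abs(a - b) <= tol * max(abs(b), 1e-8)
               for a, b in zip(s_new, ref_stats)):
            return x_new                                   # constraints satisfied
    return decode(z)  # fall back to the unperturbed reconstruction

# Toy setting: identity decoder, mean of the sequence as the only statistic.
decode = lambda z: z
stats = lambda x: [sum(x) / len(x)]
out = augment_with_constraints([1.0, 2.0, 3.0], decode, stats, ref_stats=[2.0])
```

Accepted samples vary around the original, but their statistics are guaranteed to stay near the reference, which is the diversity/consistency balance the contribution describes.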
2. Related Works
2.1. Encrypted Traffic Classification
2.2. Data Augmentation
2.3. Time-Series Transformer
3. Materials and Methods
3.1. Data Preprocessing and Feature Extraction
3.1.1. Raw Traffic Flow Delimitation
3.1.2. Primary Feature Extraction
3.1.3. Additional Feature Extraction
3.1.4. Feature Cleaning and Outlier Handling
3.1.5. Temporal Alignment and Resampling
3.1.6. Normalization Strategy Pre-Check
3.1.7. Data Structuring
3.2. Sequence Normalization
3.2.1. Uniform Time-Axis Mapping and Bin-Wise Aggregation
3.2.2. Channel-Wise Scaling and Masking
3.2.3. Consistency Check and Packaging
3.3. Model Architecture
3.4. Training
3.5. Data Augmentation
3.5.1. Constraint-Based Transformation Strategy
3.5.2. Generation Pipeline and Quality Assurance
4. Results
4.1. Experimental Setup
4.2. Dataset and Preprocessing
4.3. Baseline and Evaluation Metrics
- Reconstruction RMSE (Root Mean Squared Error): Computed over the masked valid time steps in the test set. Lower RMSE indicates more accurate sequence reconstruction (Equation (4)).
- Feature-Reconstruction MSE: The mean squared error between the six additional features recomputed from the augmented (or reconstructed) sequences and the original normalized features. A lower value signifies better preservation of the original feature distributions.
- Classification Accuracy & F1-Score: We train a logistic regression classifier on the combination of real and augmented flows (both benign and attack) and report overall accuracy and the F1-Score on the test set.
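The first two metrics can be sketched in plain Python. Variable names (`x`, `x_hat`, `mask`) are ours, and the sketch does not reproduce the paper's Equation (4) notation.

```python
import math

def masked_rmse(x, x_hat, mask):
    """RMSE over valid (mask == 1) time steps only; padded steps are excluded."""
    se = [(a - b) ** 2 for a, b, m in zip(x, x_hat, mask) if m]
    return math.sqrt(sum(se) / len(se))

def feature_mse(f, f_hat):
    """MSE between recomputed and original normalized feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(f, f_hat)) / len(f)

x     = [1.0, 2.0, 3.0, 4.0]
x_hat = [1.0, 2.5, 3.0, 0.0]
mask  = [1, 1, 1, 0]                 # last step is padding and is excluded
rmse = masked_rmse(x, x_hat, mask)   # only the 0.5 error at t=1 contributes
```

Masking matters here: without it, the large error on the padded step would dominate the score even though it carries no traffic information.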
4.4. Quantitative Results
4.5. Qualitative Analysis
4.6. Experimental Results
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Detailed Formulations
Appendix A.1. Multi-Head Self-Attention (MHSA)
Appendix A.2. Feed-Forward Network (FFN)
Appendix A.3. FiLM Conditioning
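The FiLM conditioning named above follows the standard feature-wise linear modulation pattern: the statistical feature vector produces a per-channel scale (gamma) and shift (beta) applied at every time step. The sketch below uses that standard formulation with toy sizes and weights of our own choosing, not the trained model's parameters.

```python
def linear(x, W, b):
    """y = W x + b with W given as a list of rows and b as a bias list."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def film_condition(h, s, W_gamma, b_gamma, W_beta, b_beta):
    """Apply FiLM: h_t <- gamma * h_t + beta (element-wise) at each step t,
    where gamma and beta are linear projections of the statistics vector s."""
    gamma = linear(s, W_gamma, b_gamma)
    beta = linear(s, W_beta, b_beta)
    return [[g * x + bb for g, x, bb in zip(gamma, h_t, beta)] for h_t in h]

# Toy example: 2 time steps, 2 channels, 3 statistical features; zero weight
# matrices so gamma/beta come entirely from the biases.
h = [[1.0, 2.0], [3.0, 4.0]]
s = [0.5, 1.0, -0.5]
W0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
out = film_condition(h, s, W0, [2.0, 0.5], W0, [1.0, -1.0])
# gamma = [2.0, 0.5], beta = [1.0, -1.0]  =>  out = [[3.0, 0.0], [7.0, 1.0]]
```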
Appendix B. Loss Functions
Appendix B.1. Reconstruction Loss
Appendix B.2. Feature Regression Loss
Appendix B.3. Classification Loss
Appendix B.4. Total Loss
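Appendices B.1–B.4 name three loss terms and a total. A plausible combination, shown here only as a sketch with weighting coefficients $\lambda$ in our own notation (the paper's actual weights and form are not reproduced), is a weighted sum:

```latex
\mathcal{L}_{\mathrm{total}}
  = \lambda_{\mathrm{rec}}\,\mathcal{L}_{\mathrm{rec}}
  + \lambda_{\mathrm{feat}}\,\mathcal{L}_{\mathrm{feat}}
  + \lambda_{\mathrm{cls}}\,\mathcal{L}_{\mathrm{cls}}
```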
Appendix C. Training Details
Appendix C.1. Optimizer and Scheduler
Appendix C.2. Regularization
- Mixed precision (bfloat16) with gradient scaling.
- Gradient checkpointing for attention and FFN layers.
- Dropout 0.1–0.2 across attention, FFN, and embeddings.
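The dropout rates quoted above can be illustrated with a minimal inverted-dropout sketch in plain Python; real frameworks implement this internally, so this is a teaching sketch rather than the training code.

```python
import random

def inverted_dropout(x, p, rng):
    """Zero each element with probability p and scale survivors by 1/(1-p),
    so the expected activation is unchanged at training time."""
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

rng = random.Random(0)
out = inverted_dropout([1.0] * 10, 0.2, rng)   # survivors become 1/0.8 = 1.25
```

The 1/(1-p) rescaling is what lets inference run without any dropout-specific correction.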
Appendix C.3. Distributed Training
References
- Langley, A.; Riddoch, A.; Wilk, A.; Vicente, A.; Krasic, C.; Zhang, D.; Yang, F.; Kouranov, F.; Swett, I.; Iyengar, J.; et al. The QUIC transport protocol: Design and internet-scale deployment. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, Los Angeles, CA, USA, 21–25 August 2017; pp. 183–196.
- Papadogiannaki, E.; Ioannidis, S. A survey on encrypted network traffic analysis applications, techniques, and countermeasures. ACM Comput. Surv. (CSUR) 2021, 54, 1–35.
- Nguyen, T.T.; Armitage, G. A survey of techniques for internet traffic classification using machine learning. IEEE Commun. Surv. Tutor. 2008, 10, 56–76.
- Moore, A.W.; Zuev, D. Internet traffic classification using Bayesian analysis techniques. In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, AB, Canada, 6–10 June 2005; pp. 50–60.
- Lotfollahi, M.; Jafari Siavoshani, M.; Shirali Hossein Zade, R.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Wang, P.; Li, S.; Ye, F.; Wang, Z.; Zhang, M. PacketCGAN: Exploratory study of class imbalance for encrypted traffic classification using CGAN. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–7.
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP 2018, 1, 108–116.
- Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: BoT-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
- He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; IEEE: New York, NY, USA, 2008; pp. 1322–1328.
- Yoon, J.; Jarrett, D.; Van der Schaar, M. Time-series generative adversarial networks. Adv. Neural Inf. Process. Syst. 2019, 32.
- Yu, L.; Zhang, W.; Wang, J.; Yu, Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
- Donahue, C.; McAuley, J.; Puckette, M. Adversarial audio synthesis. arXiv 2018, arXiv:1802.04208.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Larsen, A.B.L.; Sønderby, S.K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 20–22 June 2016; pp. 1558–1566.
- Sivaroopan, N.; Bandara, D.; Madarasingha, C.; Jourjon, G.; Jayasumana, A.; Thilakarathna, K. NetDiffus: Network traffic generation by diffusion models through time-series imaging. Comput. Netw. 2024, 251, 110616.
- Ma, Y.; Li, Z.; Xue, H.; Chang, J. A balanced supervised contrastive learning-based method for encrypted network traffic classification. Comput. Secur. 2024, 145, 104023.
- Wang, H.; Yan, J.; Jia, N. A New Encrypted Traffic Identification Model Based on VAE-LSTM-DRN. Comput. Mater. Contin. 2024, 78, 569–588.
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Wiegreffe, S.; Pinter, Y. Attention is not not explanation. arXiv 2019, arXiv:1908.04626.
- Jaszczur, S.; Chowdhery, A.; Mohiuddin, A.; Kaiser, L.; Gajewski, W.; Michalewski, H.; Kanerva, J. Sparse is enough in scaling transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 9895–9907.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 11106–11115.
- Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764.
- Huang, H.; Lu, Y.; Zhou, S.; Zhang, X.; Li, Z. CoTNeT: Contextual transformer network for encrypted traffic classification. Egypt. Inform. J. 2024, 26, 100475.
- Zhang, J.; Zhao, H.; Feng, Y.; Cai, Z.; Zhu, L. NetST: Network Encrypted Traffic Classification Based on Swin Transformer. Comput. Mater. Contin. 2025, 84, 5279–5298.
- Liu, Z.; Xie, Y.; Luo, Y.; Wang, Y.; Ji, X. TransECA-Net: A Transformer-Based Model for Encrypted Traffic Classification. Appl. Sci. 2025, 15, 2977.
- Claise, B.; Trammell, B.; Aitken, P. Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information; Technical Report; Internet Engineering Task Force (IETF): Fremont, CA, USA, 2013.
- Baldini, G. Analysis of encrypted traffic with time-based features and time frequency analysis. In Proceedings of the 2020 Global Internet of Things Summit (GIoTS), Dublin, Ireland, 3 June 2020; IEEE: New York, NY, USA, 2020; pp. 1–5.
- Shapira, T.; Shavitt, Y. FlowPic: A generic representation for encrypted traffic classification and applications identification. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1218–1232.
- Crovella, M.E.; Bestavros, A. Self-similarity in World Wide Web traffic: Evidence and possible causes. IEEE/ACM Trans. Netw. 1997, 5, 835–846.
- Moniz, N.; Branco, P.; Torgo, L. Resampling strategies for imbalanced time series forecasting. Int. J. Data Sci. Anal. 2017, 3, 161–181.
- Bolstad, B.M.; Irizarry, R.A.; Åstrand, M.; Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19, 185–193.
- McCaw, Z.R.; Lane, J.M.; Saxena, R.; Redline, S.; Lin, X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 2020, 76, 1262–1272.
- Kaufman, S.; Rosset, S.; Perlich, C.; Stitelman, O. Leakage in data mining: Formulation, detection, and avoidance. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 1–21.
- Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
- Saikhu, A.; Arifin, A.Z.; Fatichah, C. Correlation and Symmetrical Uncertainty-Based Feature Selection for Multivariate Time Series Classification. Int. J. Intell. Eng. Syst. 2019, 12, 129–137.
- Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
| Feature | Description | Formula |
|---|---|---|
| Flow Duration | Elapsed time from the first to the last packet | $D = t_{\mathrm{last}} - t_{\mathrm{first}}$ |
| Total Bytes | Total number of bytes transmitted in the flow | $B = \sum_{i=1}^{N} s_i$ |
| Avg Packet Size | Mean packet size within the flow | $B / N$ |
| Bytes per Sec | Transmission rate over the flow duration | $B / D$ |
| Max Burst Size | Maximum bytes in any contiguous burst | $\max_j(\text{bytes in burst } j)$ |
| Burst Period | Maximum duration of any burst interval | $\max_j(\text{duration of burst } j)$ |
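The flow-level features in the table above can be computed from a list of (timestamp, size) packets as sketched below. The burst definition used here (packets separated by inter-arrival gaps below a threshold form one burst) is our assumption, not necessarily the paper's exact delimitation rule.

```python
def flow_features(packets, burst_gap=0.05):
    """Compute flow-level statistics from (timestamp, size) pairs,
    with packets assumed sorted by timestamp."""
    ts = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = ts[-1] - ts[0]
    total = sum(sizes)
    # Split into bursts wherever the inter-arrival gap exceeds burst_gap.
    bursts, cur = [], [packets[0]]
    for prev, pkt in zip(packets, packets[1:]):
        if pkt[0] - prev[0] > burst_gap:
            bursts.append(cur)
            cur = []
        cur.append(pkt)
    bursts.append(cur)
    return {
        "flow_duration": duration,
        "total_bytes": total,
        "avg_packet_size": total / len(sizes),
        "bytes_per_sec": total / duration if duration > 0 else 0.0,
        "max_burst_size": max(sum(s for _, s in b) for b in bursts),
        "burst_period": max(b[-1][0] - b[0][0] for b in bursts),
    }

# Toy flow: three tightly spaced packets (one burst), then a late packet.
pkts = [(0.00, 100), (0.01, 200), (0.02, 100), (0.50, 300)]
f = flow_features(pkts)
```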
| Type | Dataset | # of Flows | PCAP Size |
|---|---|---|---|
| Normal | CIC-IDS-2018 | 1,200,000 | 70.4 GB |
| Web Flooding | CIC-IDS-2018 | 200,000 | 12.1 GB |
| FTP Flooding | CIC-IDS-2018 | 200,000 | 10.0 GB |
| SSH Flooding | CIC-IDS-2018 | 200,000 | 12.9 GB |
| DoS/DDoS | CIC-IDS-2018 | 100,000 | 6.8 GB |
| Port Scan | CIC-IDS-2018 | 100,000 | 4.2 GB |
| IoT Botnet | BoT-IoT | 500,000 | 32.4 GB |
| Total | | 2,500,000 | 148.8 GB |
| Method | RMSE | Feature MSE | AUC | Accuracy (%) | F1-Score |
|---|---|---|---|---|---|
| Statistical Sampling | 0.031 (±0.002) | 0.012 (±0.001) | 0.912 (±0.005) | 88.5 (±0.6) | 0.879 (±0.008) |
| AutoEncoder Reconstruction | 0.024 (±0.001) | 0.010 (±0.0005) | 0.934 (±0.004) | 90.1 (±0.5) | 0.895 (±0.007) |
| Time-Series TF (Ours) | 0.013 (±0.001) | 0.005 (±0.0003) | 0.976 (±0.003) | 95.2 (±0.4) | 0.943 (±0.006) |
| Method | Precision (%) | Recall (%) |
|---|---|---|
| Statistical Sampling | 88.1 (±0.7) | 87.9 (±0.8) |
| AutoEncoder Reconstruction | 89.7 (±0.6) | 90.2 (±0.5) |
| Time-Series TF (Ours) | 94.1 (±0.5) | 94.5 (±0.4) |
| Method | Benign F1-Score | Attack F1-Score |
|---|---|---|
| Statistical Sampling | 0.895 (±0.007) | 0.863 (±0.009) |
| AutoEncoder Reconstruction | 0.912 (±0.006) | 0.878 (±0.008) |
| Time-Series TF (Ours) | 0.956 (±0.004) | 0.930 (±0.005) |
| Method | Avg MMD (RBF) | Avg MMD (Linear) | Aug. Time (ms/Sample) | Memory Overhead (%) |
|---|---|---|---|---|
| Statistical Sampling | 0.045 (±0.003) | 0.051 (±0.004) | 1.2 | 5 |
| AutoEncoder Reconstruction | 0.032 (±0.002) | 0.031 (±0.002) | 5.8 | 12 |
| Time-Series TF (Ours) | 0.018 (±0.001) | 0.020 (±0.002) | 3.4 | 8 |
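The Avg MMD columns above measure distributional distance between real and augmented flows; the quantity can be estimated with the biased MMD estimator of Gretton et al. The sketch below uses scalar samples and an RBF kernel with an illustrative bandwidth, not the paper's actual evaluation setup.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel on scalars; gamma is an illustrative bandwidth choice."""
    return math.exp(-gamma * (a - b) ** 2)

def mmd2_biased(xs, ys, k):
    """Biased squared-MMD estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

same = mmd2_biased([0.0, 1.0], [0.0, 1.0], rbf)  # identical samples -> 0
far  = mmd2_biased([0.0, 1.0], [5.0, 6.0], rbf)  # distant samples -> large
```

Lower MMD between real and augmented sets (as in the table's last row) indicates that augmentation preserved the original distribution.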
Share and Cite
Choi, D.; Kim, Y.; Lee, C.; Sohn, K. Dual-Stream Time-Series Transformer-Based Encrypted Traffic Data Augmentation Framework. Appl. Sci. 2025, 15, 9879. https://doi.org/10.3390/app15189879