Fast and Accurate Multi-Task Learning for Encrypted Network Traffic Classification
Abstract
1. Introduction
- We adopt a multi-task learning (MTL) approach to encrypted traffic classification, built on the DistilBERT model. A single model handles multiple classification tasks simultaneously, which allows a thorough and detailed analysis of encrypted network traffic and addresses the complexity of the various tasks within a unified training framework.
- To validate the proposed method, we conducted experiments on three specific tasks using the ISCX 2016 VPN/Non-VPN dataset and compared our approach with other methods in terms of classification accuracy and efficiency. For accuracy, the method achieved average accuracies ranging from 96.89% to 99.29% across all tasks, outperforming the majority of existing methods. For efficiency, it showed a favorable per-sample processing time compared to existing models. These results indicate that multi-task classification of encrypted traffic is effective in both classification performance and efficiency.
- We applied weight adjustments (class weights and task weights) within the model to address data imbalance and varying task difficulty. Additional experiments validated the contribution of both weights to the performance improvement, which supports the applicability of our approach across diverse scenarios.
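The two weight adjustments named in the last contribution can be illustrated with a short sketch. This is not the authors' implementation: the inverse-frequency formula in `class_weights`, the function names, and the example task weights are all assumptions chosen for illustration, since the exact formulas are not given in this part of the text.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency class weights (assumed scheme):
    weight = n_samples / (n_classes * class_count), so rare classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, target, weights):
    """Cross-entropy of one sample, scaled by the weight of its true class.
    `probs` maps each class name to its predicted probability."""
    return -weights[target] * math.log(probs[target])


def multi_task_loss(task_losses, task_weights):
    """Total loss as a weighted sum of per-task losses (hypothetical task weights)."""
    return sum(task_weights[task] * loss for task, loss in task_losses.items())


# Toy example: an imbalanced two-class task and three per-task losses.
weights = class_weights(["VPN", "VPN", "VPN", "Non-VPN"])
sample_loss = weighted_cross_entropy({"VPN": 0.5, "Non-VPN": 0.5}, "Non-VPN", weights)
total = multi_task_loss(
    {"encapsulation": 1.0, "category": 2.0, "application": 3.0},
    {"encapsulation": 0.2, "category": 0.3, "application": 0.5},
)
```

In this sketch a misclassified minority-class sample contributes more to the loss than a majority-class one, and a harder task can be given a larger share of the total loss through its task weight.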
2. Related Works
2.1. Overview of the Network Traffic Classification
2.2. Encrypted Traffic Classification
2.3. Overview of the Multi-Task Learning
3. Proposed Method
3.1. Model Architecture
3.1.1. Data Preprocessing
- (1) Target Dataset: Many network traffic datasets have long been publicly available, including several encrypted traffic datasets. We use the ISCX 2016 VPN/Non-VPN dataset [40], which is the most widely used in this research area. It is captured from real traffic, is publicly available in raw pcap format, and consists of traffic from various applications. Because several previous studies use it, experimental results can be compared and interpreted across studies. The dataset is labeled along three dimensions (encapsulation, category, and application), and a separate classification study is typically performed for each label. Table 1 lists the classes for each task. Encapsulation refers to the presence or absence of VPN encryption on the target traffic and consists of two classes: VPN and Non-VPN. Category refers to the nature of the application and consists of six classes, excluding web browsing. Application indicates the application used and consists of sixteen classes.
- (2) Preprocessing: We perform the following preprocessing steps. First, we convert the packet-level pcap files to flow level by segmenting the capture files into bidirectional flows with the SplitCap tool. Second, we remove irrelevant flows. The ISCX 2016 VPN/Non-VPN dataset contains approximately 309 K flows in total, but, as noted in [51], many of them are irrelevant: for example, traffic that is not application-specific (NBSS, LLMNR, DNS, etc.) and flows with a disrupted three-way handshake. The preprocessing steps outlined in [51] leave a total of 29,195 flows. On further analysis, we found that some of these are UDP flows with a destination IP of 255.255.255.255 whose payload consistently includes the string “Beacon~”. We considered these flows non-essential to the research objectives and removed them as well. After the first and second steps, we obtained 8763 flows. Third, we apply zero-padding and flow splicing. With the subsequent byte tokenization in mind, we extract 63 bytes from each of the eight packets in the flow: a packet shorter than 63 bytes is zero-padded, and a packet longer than 63 bytes is spliced down to 63 bytes. We chose 63 bytes based on other research [33,34] and on experiments under various configurations. The 63 bytes consist of (1) the IP header, (2) the TCP or UDP header, and (3) the payload. The IP header is 20 bytes in both cases, but the TCP and UDP headers are 20 and 8 bytes, respectively, so the payload would otherwise start at a different offset. We therefore extend the UDP header to 20 bytes by zero-padding at its end.
We also zero-pad flows whose total length is less than 63 bytes, and for UDP we additionally pad the UDP header as described above. Finally, we remove the Ethernet header and mask the IP addresses and ports to zero, because these fields carry strong identifying information and can bias the results. Figure 3 shows the distribution of bidirectional flows by class in the preprocessed data. It shows that all three tasks suffer from class imbalance, which we address in Section 3.2.1.
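The per-packet part of the third step (masking, UDP header padding, and fixing the length to 63 bytes over eight packets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each packet is given as raw bytes starting at a 20-byte IPv4 header (Ethernet header already removed), that TCP headers are 20 bytes as the text states, and that flows with fewer than eight packets are filled with all-zero rows. The names `normalize_packet` and `flow_to_matrix` are hypothetical.

```python
PACKETS_PER_FLOW = 8    # packets kept per flow
BYTES_PER_PACKET = 63   # bytes kept per packet
IP_HEADER_LEN = 20      # IPv4 header without options (assumption)
TCP_HEADER_LEN = 20
UDP_HEADER_LEN = 8
PROTO_UDP = 17          # IPv4 protocol number for UDP


def normalize_packet(ip_packet: bytes) -> bytes:
    """Mask addresses and ports, align UDP with TCP, and fix the length to 63 bytes."""
    if len(ip_packet) < IP_HEADER_LEN + UDP_HEADER_LEN:
        raise ValueError("packet too short to hold IP and transport headers")
    buf = bytearray(ip_packet)
    protocol = buf[9]                     # IPv4 protocol field
    buf[12:20] = bytes(8)                 # zero source and destination IP addresses
    buf[IP_HEADER_LEN:IP_HEADER_LEN + 4] = bytes(4)  # zero source and destination ports
    if protocol == PROTO_UDP:
        # Insert 12 zero bytes after the 8-byte UDP header so the payload
        # starts at the same offset (40) as in a TCP packet.
        end_of_udp = IP_HEADER_LEN + UDP_HEADER_LEN
        buf[end_of_udp:end_of_udp] = bytes(TCP_HEADER_LEN - UDP_HEADER_LEN)
    buf = buf[:BYTES_PER_PACKET]                     # cut long packets
    buf.extend(bytes(BYTES_PER_PACKET - len(buf)))   # zero-pad short packets
    return bytes(buf)


def flow_to_matrix(packets):
    """Return 8 rows of 63 bytes; missing packets become all-zero rows (assumption)."""
    rows = [normalize_packet(p) for p in packets[:PACKETS_PER_FLOW]]
    rows += [bytes(BYTES_PER_PACKET)] * (PACKETS_PER_FLOW - len(rows))
    return rows


# Toy UDP packet: 20-byte IP header, 8-byte UDP header, 2-byte payload.
ip_header = bytes([0x45] + [0] * 8 + [PROTO_UDP] + [0] * 2 + [10, 0, 0, 1, 10, 0, 0, 2])
udp_packet = ip_header + bytes([0x30, 0x39, 0x00, 0x35, 0x00, 0x0A, 0x00, 0x00]) + b"hi"
matrix = flow_to_matrix([udp_packet] * 3)
print(len(matrix), len(matrix[0]))  # 8 63
```

With this layout the payload always begins at byte offset 40, so the same byte positions carry the same kind of information for TCP and UDP flows before tokenization.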
3.1.2. Byte Tokenizing
3.1.3. Multi-Task Classification
3.2. Weight Adjustment
3.2.1. Class Weight for Imbalanced Data
3.2.2. Task Weight for Loss Calculation
4. Evaluation, Result and Analysis
4.1. Evaluation Environment Setup
4.2. Evaluation Metrics
4.3. Evaluation Result
4.3.1. Performance of the Proposed Method
4.3.2. Comparison with Other Models
4.3.3. Performance of the Efficiency
5. Discussions
5.1. Effect of Class Weight in Data Imbalance
5.2. Performance Based on Weight Adjustment
5.3. Performance Based on Input Shape
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Performance Based on Input Shape (for Task #3: Application Classification)

Input Shape (Packet, Byte) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
(4, 60) | 78.13 | 90.58 | 74.64 | 81.84 |
(4, 61) | 85.36 | 97.14 | 83.59 | 89.85 |
(4, 62) | 84.78 | 94.19 | 82.51 | 87.96 |
(4, 63) | 85.10 | 91.14 | 78.57 | 84.39 |
(4, 64) | 84.97 | 97.54 | 80.63 | 88.28 |
(4, 65) | 85.62 | 96.52 | 81.99 | 88.66 |
(4, 66) | 83.89 | 97.58 | 79.65 | 87.71 |
(4, 67) | 85.32 | 93.80 | 80.48 | 86.63 |
(4, 68) | 86.46 | 97.41 | 80.93 | 86.41 |
(4, 69) | 86.27 | 95.07 | 81.21 | 87.60 |
(4, 70) | 86.40 | 96.52 | 81.99 | 88.66 |
(5, 60) | 84.24 | 96.10 | 80.34 | 87.51 |
(5, 61) | 85.23 | 97.58 | 80.13 | 87.99 |
(5, 62) | 83.46 | 97.57 | 80.61 | 88.28 |
(5, 63) | 86.46 | 96.77 | 82.63 | 89.14 |
(5, 64) | 84.93 | 97.49 | 81.82 | 88.97 |
(5, 65) | 84.63 | 92.94 | 79.14 | 85.49 |
(5, 66) | 84.27 | 97.50 | 79.32 | 87.48 |
(5, 67) | 82.81 | 88.00 | 74.26 | 83.55 |
(5, 68) | 82.24 | 90.15 | 74.78 | 81.75 |
(5, 69) | 83.35 | 91.15 | 76.79 | 83.36 |
(5, 70) | 84.51 | 96.37 | 81.98 | 88.59 |
(6, 60) | 81.86 | 93.14 | 78.51 | 85.20 |
(6, 61) | 85.89 | 94.59 | 82.44 | 88.10 |
(6, 62) | 84.49 | 97.44 | 80.13 | 87.94 |
(6, 63) | 88.07 | 97.76 | 83.22 | 89.91 |
(6, 64) | 85.90 | 97.51 | 81.77 | 88.95 |
(6, 65) | 85.78 | 97.55 | 81.52 | 88.82 |
(6, 66) | 86.46 | 97.62 | 82.37 | 89.59 |
(6, 67) | 86.08 | 97.51 | 83.16 | 89.77 |
(6, 68) | 86.42 | 97.59 | 81.37 | 88.74 |
(6, 69) | 85.93 | 97.59 | 81.50 | 86.58 |
(6, 70) | 86.46 | 96.44 | 83.25 | 89.36 |
(7, 60) | 81.45 | 90.61 | 80.11 | 85.03 |
(7, 61) | 84.18 | 92.27 | 81.77 | 86.70 |
(7, 62) | 87.74 | 97.43 | 83.99 | 90.21 |
(7, 63) | 87.42 | 97.60 | 82.26 | 89.27 |
(7, 64) | 82.62 | 89.23 | 78.18 | 83.34 |
(7, 65) | 86.43 | 96.59 | 82.87 | 89.20 |
(7, 66) | 86.73 | 97.49 | 82.60 | 89.43 |
(7, 67) | 86.35 | 97.34 | 82.87 | 89.52 |
(7, 68) | 84.37 | 91.88 | 78.47 | 84.65 |
(7, 69) | 85.46 | 96.86 | 84.35 | 90.17 |
(7, 70) | 84.79 | 94.01 | 79.11 | 85.92 |
(8, 60) | 86.86 | 96.56 | 79.62 | 87.28 |
(8, 61) | 86.58 | 97.59 | 81.14 | 88.61 |
(8, 62) | 88.16 | 95.39 | 82.19 | 88.30 |
(8, 63) | 90.28 | 98.17 | 86.28 | 91.84 |
References
- Callado, A.; Kamienski, C.; Szabó, G.; Gero, B.P.; Kelner, J.; Fernandes, S.; Sadok, D. A Survey on Internet Traffic Identification. IEEE Commun. Surv. Tutor. 2009, 11, 37–52.
- Dainotti, A.; Pescape, A.; Claffy, K. Issues and Future Directions in Traffic Classification. IEEE Netw. 2012, 26, 35–40.
- Madhukar, A.; Williamson, C. A Longitudinal Study of P2P Traffic Classification. In Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation, Monterey, CA, USA, 11–14 September 2006; pp. 179–188.
- Nguyen, T.T.T.; Armitage, G. A Survey of Techniques for Internet Traffic Classification using Machine Learning. IEEE Commun. Surv. Tut. 2008, 10, 56–76.
- Pacheco, F.; Exposito, E.; Gineste, M.; Baudoin, C.; Aguilar, J. Towards the Deployment of Machine Learning Solutions in Network Traffic Classification: A Systematic Survey. IEEE Commun. Surv. Tutor. 2018, 21, 1988–2014.
- Al Khater, N.; Overill, R.E. Network Traffic Classification Techniques and Challenges. In Proceedings of the 2015 Tenth International Conference on Digital Information Management (ICDIM), Jeju, Republic of Korea, 21–23 October 2015; pp. 43–48.
- Feng, X.; Huang, X.; Tian, X.; Ma, Y. Automatic Traffic Signature Extraction based on Smith-Waterman Algorithm for Traffic Classification. In Proceedings of the 2010 3rd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT), Beijing, China, 26–28 October 2010; pp. 154–158.
- Lim, H.-K.; Kim, J.-B.; Heo, J.-S.; Kim, K.; Hong, Y.-G.; Han, Y.-H. Packet-based Network Traffic Classification Using Deep Learning. In Proceedings of the 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Okinawa, Japan, 11–13 February 2019; pp. 46–51.
- Finsterbusch, M.; Richter, C.; Rocha, E.; Muller, J.A.; Hanssgen, K. A Survey of Payload-Based Traffic Classification Approaches. IEEE Commun. Surv. Tutor. 2014, 16, 1135–1156.
- Lotfollahi, M.; Zade, R.S.H.; Siavoshani, M.J.; Saberian, M. Deep Packet: A Novel Approach for Encrypted Traffic Classification using Deep Learning. Soft Comput. 2020, 24, 1999–2012.
- Wang, P.; Ye, F.; Chen, X.; Qian, Y. Datanet: Deep Learning Based Encrypted Network Traffic Classification in SDN Home Gateway. IEEE Access 2018, 6, 55380–55391.
- Zou, Z.; Ge, J.; Zheng, H.; Wu, Y.; Han, C.; Yao, Z. Encrypted Traffic Classification with a Convolutional Long Short-Term Memory Neural Network. In Proceedings of the 2018 IEEE 20th International Conference on High-Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Exeter, UK, 28–30 June 2018; pp. 329–334.
- Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A.; Lloret, J. Network Traffic Classifier with Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access 2017, 5, 18042–18050.
- Williams, N.; Zander, S.; Armitage, G. A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification. ACM SIGCOMM Comput. Commun. Rev. 2006, 36, 5–16.
- Liu, C.; He, L.; Xiong, G.; Cao, Z.; Li, Z. FS-Net: A Flow Sequence Network for Encrypted Traffic Classification. In Proceedings of the IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 1171–1179.
- Shapira, T.; Shavitt, Y. FlowPic: Encrypted Internet Traffic Classification is as Easy as Image Recognition. In Proceedings of the IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Paris, France, 29 April–2 May 2019; pp. 680–687.
- Lin, K.; Xu, X.; Gao, H. TSCRNN: A Novel Classification Scheme of Encrypted Traffic based on Flow Spatiotemporal Features for Efficient Management of IIoT. Comput. Netw. 2021, 190, 107974.
- Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapè, A. MIMETIC: Mobile Encrypted Traffic Classification using Multimodal Deep Learning. Comput. Netw. 2019, 165, 106944.
- Hao, S.; Hu, J.; Liu, S.; Song, T.; Guo, J.; Liu, S. Network Traffic Classification based on Improved DAG-SVM. In Proceedings of the 2015 International Conference on Communications, Management and Telecommunications (ComManTel), DaNang, Vietnam, 28–30 December 2015; pp. 256–261.
- Yao, H.; Liu, C.; Zhang, P.; Wu, S.; Jiang, C.; Yu, S. Identification of Encrypted Traffic Through Attention Mechanism Based Long Short-Term Memory. IEEE Trans. Big Data 2019, 8, 241–252.
- He, H.Y.; Yang, Z.G.; Chen, X.N. PERT: Payload Encoding Representation from Transformer for Encrypted Traffic Classification. In Proceedings of the 2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K), Ha Noi, Vietnam, 7–11 December 2020; pp. 1–8.
- Shi, Z.; Luktarhan, N.; Song, Y.; Tian, G. BFCN: A Novel Classification Method of Encrypted Traffic Based on BERT and CNN. Electronics 2023, 12, 516.
- Ma, X.; Liu, T.; Hu, N.; Liu, X. Bi-ETC: A Bidirectional Encrypted Traffic Classification Model Based on BERT and BiLSTM. In Proceedings of the 2023 8th International Conference on Data Science in Cyberspace (DSC), Hefei, China, 18–20 August 2023; pp. 197–204.
- Zhao, R.; Zhan, M.; Deng, X.; Wang, Y.; Wang, Y.; Gui, G.; Xue, Z. Yet Another Traffic Classifier: A Masked Autoencoder Based Traffic Transformer with Multi-Level Flow Representation. Proc. AAAI Conf. Artif. Intell. 2023, 37, 5420–5427.
- Zijun, H.; Yuliang, L.; Yongjie, W.; Yi, X. Flow-MAE: Leveraging Masked AutoEncoder for Accurate, Efficient and Robust Malicious Traffic Classification. In Proceedings of the RAID 2023: The 26th International Symposium on Research in Attacks, Intrusions and Defenses, Hong Kong, China, 16–18 October 2023; pp. 297–314.
- Wang, W.; Zhu, M.; Wang, J.; Zeng, X.; Yang, Z. End-to-End Encrypted Traffic Classification with One-Dimensional Convolution Neural Networks. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics IEEE, Beijing, China, 22–24 July 2017; pp. 43–48.
- Shahraki, A.; Abbasi, M.; Taherkordi, A.; Jurcut, A.D. Active Learning for Network Traffic Classification: A Technical Study. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 422–439.
- Park, K.; Kim, H. Encryption Is Not Enough: Inferring User Activities on Kakaotalk with Traffic Analysis. In International Workshop on Information Security Applications (WISA); Springer: Cham, Switzerland, 2015; pp. 254–265.
- Saltaformaggio, B.; Choi, H.; Johnson, K.; Kwon, Y.; Zhang, Q.; Zhang, X.; Xu, D.; Qian, J. Eavesdropping on Fine-Grained User Activities Within Smartphone Apps Over Encrypted Network Traffic. In Proceedings of the 10th USENIX workshop on offensive technologies (WOOT 16), Austin, TX, USA, 8–9 August 2016; pp. 69–78.
- Fu, Y.; Xiong, H.; Lu, X.; Yang, J.; Chen, C. Service Usage Classification with Encrypted Internet Traffic in Mobile Messaging Apps. IEEE Trans. Mob. Comput. 2016, 15, 2851–2864.
- Celdrán, A.H.; von der Assen, J.; Moser, K.; Sánchez PM, S.; Bovet, G.; Pérez, G.M.; Stiller, B. Early Detection of Cryptojacker Malicious Behaviors on IoT Crowdsensing Devices. In Proceedings of the NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, Miami, FL, USA, 8–12 May 2023; pp. 1–8.
- Pathmaperuma, M.H.; Rahulamathavan, Y.; Dogan, S.; Kondoz, A.M. Deep Learning for Encrypted Traffic Classification and Unknown Data Detection. Sensors 2022, 22, 7643.
- Shin, C.-Y.; Park, J.-T.; Baek, U.-J.; Kim, M.-S. A Feasible and Explainable Network Traffic Classifier Utilizing DistilBERT. IEEE Access 2023, 11, 70216–70237.
- Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. ET-BERT: A contextualized datagram representation with pre-training transformers for encrypted traffic classification. arXiv 2022, arXiv:2202.06335.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
- Van Ede, T.; Bortolameotti, R.; Continella, A.; Ren, J.; Dubois, D.J.; Lindorfer, M.; Choffnes, D.; Van Steen, M.; Peter, A. FlowPrint: Semi-Supervised Mobile-App Fingerprinting on Encrypted Network Traffic. In Proceedings of the 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, CA, USA, 23–26 February 2020.
- Shen, M.; Zhang, J.; Zhu, L.; Xu, K.; Du, X. Accurate Decentralized Application Identification via Encrypted Traffic Analysis Using Graph Neural Networks. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2367–2380.
- Xu, Y.; Cao, J.; Song, K.; Xiang, Q.; Cheng, G. FastTraffic: A Lightweight Method for Encrypted Traffic Fast Classification. Comput. Netw. 2023, 235, 109965.
- Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of Encrypted and VPN Traffic Using Time-Related Features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), Rome, Italy, 19–21 February 2016; pp. 407–414.
- Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv 2017, arXiv:1706.05098.
- Zhang, Y.; Yang, Q. A Survey on Multi-Task Learning. IEEE Trans. Knowl. Data Eng. 2022, 34, 5586–5609.
- Vandenhende, S.; Georgoulis, S.; Van Gansbeke, W.; Proesmans, M.; Dai, D.; Van Gool, L. Multi-Task Learning for Dense Prediction Tasks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3614–3633.
- Panchenko, A.; Lanze, F.; Pennekamp, J.; Engel, T.; Zinnen, A.; Henze, M.; Wehrle, K. Website Fingerprinting at Internet Scale. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, CA, USA, 21–24 February 2016.
- Al-Naami, K.; Chandra, S.; Mustafa, A.; Khan, L.; Lin, Z.; Hamlen, K.; Thuraisingham, B. Adaptive Encrypted Traffic Fingerprinting with Bi-Directional Dependence. In Proceedings of the ACSAC’16: 2016 Annual Computer Security Applications Conference, Los Angeles, CA, USA, 5–8 December 2016; pp. 177–188.
- Sirinam, P.; Imani, M.; Juarez, M.; Wright, M. Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, 15–19 October 2018; pp. 1928–1943.
- Cheng, J.; Wu, Y.; Yuepeng, E.; You, J.; Li, T.; Li, H.; Ge, J. MATEC: A Lightweight Neural Network for Online Encrypted Traffic Classification. Comput. Netw. 2021, 199, 108472.
- Huang, H.; Deng, H.; Chen, J.; Han, L.; Wang, W. Automatic Multi-Task Learning System for Abnormal Network Traffic Detection. Int. J. Emerg. Technol. Learn. 2018, 13, 4–20.
- Rezaei, S.; Liu, X. Multitask Learning for Network Traffic Classification. In Proceedings of the 2020 29th International Conference on Computer Communications and Networks (ICCCN), Honolulu, HI, USA, 3–6 August 2020; pp. 1–9.
- Wang, K.; Gao, J.; Lei, X. MTC: A Multi-Task Model for Encrypted Network Traffic Classification Based on Transformer and 1D-CNN. Intell. Autom. Soft Comput. 2023, 37, 619–638.
- Baek, U.-J.; Lee, M.-S.; Park, J.-T.; Choi, J.-W.; Shin, C.-Y.; Kim, M.-S. Preprocessing and Analysis of an Open Dataset in Application Traffic Classification. In Proceedings of the 2023 24th Asia-Pacific Network Operations and Management Symposium (APNOMS), Sejong, Republic of Korea, 6–8 September 2023; pp. 227–230.
- Longadge, R.; Dongre, S. Class Imbalance Problem in Data Mining Review. arXiv 2013, arXiv:1305.1707.
- Sharif, M.S.; Moein, M. An Effective Cost-Sensitive Convolutional Neural Network for Network Traffic Classification. In Proceedings of the 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), Zallaq, Bahrain, 29–30 September 2021; pp. 40–45.
- Taylor, V.F.; Spolaor, R.; Conti, M.; Martinovic, I. Robust Smartphone App Identification via Encrypted Network Traffic Analysis. IEEE Trans. Inf. Forensics Secur. 2017, 13, 63–78.
Task | Classes |
---|---|
Encapsulation (2) | VPN, Non-VPN |
Category (6) | Chat, Email, Streaming, File Transfer, P2P, VoIP |
Application (16) | Skype, ICQ, Hangout, Facebook, Email, Gmail, FTP, SFTP, SCP, Netflix, Spotify, Vimeo, YouTube, AIM Chat, VOIPBuster, BitTorrent |
Proposed Method

Task | Class | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|---|
Task #1: Encapsulation | VPN | 99.45 | 98.72 | 99.87 | 99.29 |
Task #1: Encapsulation | Non-VPN | 98.69 | 99.72 | 97.23 | 98.46 |
Task #2: Category | Chat | 94.86 | 97.65 | 94.86 | 96.21 |
Task #2: Category | Email | 98.21 | 96.90 | 98.21 | 97.55 |
Task #2: Category | File Transfer | 95.44 | 97.58 | 95.44 | 96.50 |
Task #2: Category | P2P | 100.00 | 100.00 | 100.00 | 100.00 |
Task #2: Category | Streaming | 99.41 | 99.71 | 99.41 | 99.56 |
Task #2: Category | VoIP | 97.99 | 96.06 | 97.99 | 97.02 |
Task #3: Application | AimChat | 93.75 | 78.95 | 93.75 | 85.71 |
Task #3: Application | Facebook | 97.62 | 98.97 | 97.62 | 98.29 |
Task #3: Application | Hangout | 98.91 | 98.19 | 98.91 | 98.55 |
Task #3: Application | ICQChat | 72.73 | 88.89 | 72.73 | 80.00 |
Task #3: Application | Skype | 99.61 | 99.12 | 99.61 | 99.36 |
Task #3: Application | Email | 97.96 | 98.97 | 97.96 | 98.46 |
Task #3: Application | Gmail | 96.30 | 89.66 | 96.30 | 92.86 |
Task #3: Application | FTP | 87.78 | 91.30 | 87.78 | 89.51 |
Task #3: Application | SCP | 99.76 | 100.00 | 99.76 | 99.88 |
Task #3: Application | SFTP | 92.31 | 100.00 | 92.31 | 96.00 |
Task #3: Application | BitTorrent | 100.00 | 100.00 | 100.00 | 100.00 |
Task #3: Application | Netflix | 97.67 | 100.00 | 97.68 | 98.82 |
Task #3: Application | Spotify | 95.16 | 95.16 | 95.16 | 95.16 |
Task #3: Application | Vimeo | 99.06 | 98.13 | 99.06 | 98.59 |
Task #3: Application | YouTube | 96.15 | 96.90 | 96.15 | 96.53 |
Task #3: Application | VoIPBuster | 94.01 | 96.90 | 96.15 | 92.88 |
Comparison Results for Task #2: Category

Method | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
---|---|---|---|---|
AppScanner [54] | 71.82 | 73.39 | 72.25 | 71.97 |
CUMUL [44] | 56.10 | 58.83 | 56.76 | 56.68 |
BIND [45] | 75.34 | 75.83 | 74.88 | 74.20 |
DF [46] | 71.54 | 71.92 | 71.04 | 71.02 |
FS-Net [15] | 72.05 | 75.02 | 72.38 | 71.31 |
GraphDApp [38] | 59.77 | 60.45 | 62.20 | 60.36 |
TSCRNN [17] | - | 92.70 | 92.60 | 92.60 |
Deep Packet [10] | 93.29 | 93.77 | 93.06 | 93.21 |
1D-CNN [26] | 98.30 | - | - | 98.60 |
FastTraffic [39] | 94.50 | 94.77 | 94.26 | 94.40 |
MATEC [47] | 73.20 | 84.43 | 82.40 | 82.87 |
PERT [21] | 93.52 | 94.00 | 93.49 | 93.68 |
ET-BERT (flow) [34] | 97.29 | 97.56 | 97.31 | 97.33 |
ET-BERT (packet) [34] | 98.90 | 98.91 | 98.90 | 98.90 |
XENTC [33] | 97.03 | - | - | 97.06 |
BFCN [22] | 99.12 | 99.13 | 99.11 | 99.11 |
YaTC [24] | 98.07 | - | - | 98.04 |
Flow-MAE [25] | 99.15 | 99.24 | 99.15 | 99.17 |
Proposed | 97.38 | 97.31 | 95.93 | 96.61 |
Comparison Results for Task #3: Application

Method | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
---|---|---|---|---|
AppScanner [54] | 62.66 | 48.64 | 51.98 | 49.35 |
CUMUL [44] | 53.65 | 41.29 | 45.35 | 42.36 |
BIND [45] | 67.67 | 51.52 | 51.53 | 49.65 |
DF [46] | 61.16 | 66.97 | 66.51 | 65.31 |
FS-Net [15] | 66.47 | 48.19 | 48.48 | 47.37 |
GraphDApp [38] | 62.28 | 59.00 | 54.72 | 55.58 |
TSCRNN [17] | - | - | - | - |
Deep Packet [10] | 97.58 | 97.85 | 97.45 | 97.65 |
1D-CNN [26] | 86.60 | - | - | 86.50 |
FastTraffic [39] | 92.24 | 93.58 | 92.84 | 93.12 |
MATEC [47] | 69.21 | 73.32 | 65.40 | 68.24 |
PERT [21] | 82.29 | 70.92 | 71.73 | 69.92 |
ET-BERT (flow) [34] | 85.19 | 75.08 | 72.94 | 73.06 |
ET-BERT (packet) [34] | 99.62 | 99.36 | 99.38 | 99.37 |
XENTC [33] | 96.37 | - | - | 94.63 |
BFCN [22] | 99.65 | 99.36 | 99.47 | 99.41 |
YaTC [24] | - | - | - | - |
Flow-MAE [25] | 99.87 | 99.91 | 99.89 | 99.90 |
Proposed | 96.89 | 96.91 | 95.13 | 96.01 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, J.-T.; Shin, C.-Y.; Baek, U.-J.; Kim, M.-S. Fast and Accurate Multi-Task Learning for Encrypted Network Traffic Classification. Appl. Sci. 2024, 14, 3073. https://doi.org/10.3390/app14073073