Data Exfiltration Detection on Network Metadata with Autoencoders
Abstract
1. Introduction
- We design and test a novel Network Exfiltration Detection System (NEDS) that detects data exfiltration of the kind that occurs in ransomware attacks. The NEDS analyses network traffic and detects anomalies in it without relying on specific threat intelligence. It is composed of an ensemble of autoencoders, each targeted at one or more network protocols;
- A key novelty of our NEDS is that it operates on aggregated network metadata from multiple sequential sessions. This makes it possible to detect data exfiltration that takes place over a longer period of time, over either stateless or stateful protocols. Aggregating the metadata also makes it possible to handle large numbers of network sessions efficiently. Hence, the NEDS can be applied in practice for either near-real-time analysis of live data or post-incident analysis of captured data;
- We train the NEDS using unsupervised learning with real-life data from the NDN sensor platform (NSP) in the Netherlands. We evaluate the detection performance of the NEDS for data exfiltration over different channels, including DNS tunnels and uploads to FTP servers, web servers, and cloud storage. Our experimental results demonstrate that using aggregated metadata significantly increases the detection performance for exfiltration while keeping false positive rates limited.
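The ensemble structure named in the first contribution can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the NEDS implementation: the class names `ProtocolDetector` and `DetectorEnsemble`, the mean-squared reconstruction error, and the rule that unmodelled protocols are never flagged are all hypothetical. The `constant_profile` helper is only a stand-in for a trained autoencoder so that the example runs without a machine learning library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def mean_squared_error(x: Vector, x_hat: Vector) -> float:
    """Reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


@dataclass
class ProtocolDetector:
    # `reconstruct` plays the role of a trained autoencoder (encode + decode).
    reconstruct: Callable[[Vector], Vector]
    # Hypothetical per-detector threshold on the reconstruction error.
    threshold: float

    def is_anomalous(self, features: Vector) -> bool:
        error = mean_squared_error(features, self.reconstruct(features))
        return error > self.threshold


class DetectorEnsemble:
    """Routes a feature vector to the detector responsible for its service."""

    def __init__(self, detectors: Dict[str, ProtocolDetector]):
        # Several services may map to the same detector object.
        self.detectors = detectors

    def check(self, service: str, features: Vector) -> bool:
        detector = self.detectors.get(service)
        if detector is None:
            # Assumption: traffic of unmodelled protocols is not flagged.
            return False
        return detector.is_anomalous(features)


def constant_profile(profile: Vector) -> Callable[[Vector], List[float]]:
    """Placeholder model: always 'reconstructs' a fixed normal profile."""
    return lambda features: list(profile)


dns_detector = ProtocolDetector(constant_profile([0.1, 0.2]), threshold=0.05)
web_detector = ProtocolDetector(constant_profile([0.5, 0.5]), threshold=0.05)
ensemble = DetectorEnsemble(
    {"dns": dns_detector, "http": web_detector, "https": web_detector}
)
```

In this sketch, a vector close to the profile of its protocol reconstructs with a small error and passes, while a distant vector exceeds the threshold and is flagged.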
2. Background
2.1. Data Exfiltration
2.2. Autoencoders
3. Related Work
3.1. Traffic Inspection
3.2. Methods
3.3. Rationale
4. NEDS Design
4.1. Architecture
4.2. Session Metadata
- Hosts: the two hosts that are communicating in the session.
- Session size: the total number of bytes transmitted both ways.
- Lifetime: the total number of seconds the connection was alive.
- Timestamp: the time at which the connection was initiated.
- Payload: the total number of bytes successfully transmitted both ways.
- Total transmitted bytes of request and response: the total number of bytes that were transmitted, including dropped packets.
- Direction of traffic: indicates whether traffic is inbound, lateral, or outbound.
- Packet count of session: total packet count of session.
- Transport level protocol: TCP or UDP.
- Service: DNS, HTTP, etc.
- Destination organisation: indicates whether the destination is linked to an organisation.
- Entropy of request and response payload: entropy of transmitted bytes.
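To make the metadata fields concrete, the sketch below shows one possible record layout together with a byte-level Shannon entropy function. Both are illustrative assumptions: the field names, types, and units in `SessionMetadata` are hypothetical, and the source does not state which entropy definition the sensor platform uses.

```python
import math
from collections import Counter
from dataclasses import dataclass


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


@dataclass
class SessionMetadata:
    # All field names are hypothetical; they mirror the list above.
    src_host: str
    dst_host: str
    session_size: int       # bytes transmitted both ways
    lifetime: float         # seconds the connection was alive
    timestamp: float        # connection start (epoch seconds is an assumed unit)
    payload: int            # bytes successfully transmitted both ways
    total_bytes_req: int    # request bytes, including dropped packets
    total_bytes_rsp: int    # response bytes, including dropped packets
    direction: str          # "inbound", "lateral", or "outbound"
    packets: int            # packet count of the session
    transport: str          # "TCP" or "UDP"
    service: str            # e.g., "DNS" or "HTTP"
    dst_organisation: bool  # destination linked to an organisation?
    entropy_req: float      # entropy of the request payload
    entropy_rsp: float      # entropy of the response payload
```

Under this definition, a payload consisting of one repeated byte has entropy 0, whereas a payload in which all 256 byte values occur equally often reaches the maximum of 8 bits per byte, which is the range that compressed or encrypted data tends to approach.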
4.3. Aggregated Session Metadata
1. Average packet count per session: This feature represents the average number of packets per session. It aggregates the packet count metadata field. If a malicious actor is actively exfiltrating data, this feature value may be higher than normal. However, a high value can also be due to a non-malicious actor uploading information to an external host. For protocols where the number of packets is usually small and consistent, such as DNS, this feature can highlight anomalous behaviour.
2. Average request entropy per session: This feature represents the average entropy of the payloads in outgoing packets per session. It aggregates the entropy.req metadata field. Since our main objective is to detect exfiltration, we only consider outgoing traffic and ignore response entropy and total entropy. Entropy roughly reflects the amount of information the payload contains. A high request entropy may indicate exfiltration, as more data are being moved. A high request entropy can also be an indicator of encrypted traffic, which an attacker may use when tunnelling over a usually unencrypted protocol [10].
3. Average session duration: This feature represents the average duration (i.e., lifetime) of a session in seconds. It aggregates the lifetime metadata field. The duration of a session combined with other features can be indicative of anomalous behaviour. For instance, sessions with DNS tunnelling have a much longer duration than normal DNS traffic.
4. Average session payload size: This feature represents the average total session payload size in bytes. It aggregates the payload metadata field. Note that the average session payload size might be small even in long sessions due to packet drops. A high payload size may indicate that a lot of data is being transported, and hence this feature can potentially detect naïve data exfiltration. However, the feature will not reveal a smart adversary who uploads data in small quantities over longer time periods.
5. Average time between sequential sessions: This feature represents the average time between sessions in seconds. It aggregates the differences between timestamp metadata fields. It is potentially useful for identifying automated behaviour, since a user is unlikely to generate many hundreds of DNS or HTTP requests per second.
6. Weight: This feature represents the total number of sessions aggregated under the key. It is used for computing the averages of the features above. It also distinguishes a single large session from multiple smaller sessions. A single session that contains a large payload will have a weight of 1, while 100 sessions transporting a total payload that is 100 times as big will have an identical average payload size but a greater weight value.
Algorithm 1. Querying and aggregation.
- select entropy.req, sessionid, lifetime, time, service, ip.src.hash, ip.dst.hash, ipv6.src.hash, ipv6.dst.hash, packets, payload.req
- where direction = 'outbound' and time = "-"
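The six aggregated features can be computed from queried session records along the following lines. This is a sketch under assumptions rather than the aggregation used in the NEDS: the record keys reuse the field names from the query, the aggregation key (source hash, destination hash, service) is a guess because the text only refers to "the key", timestamps are assumed to be numeric seconds, and a group with a single session is assigned an average gap of 0.0.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_sessions(sessions):
    """Group session records by key and compute the six aggregated features."""
    groups = defaultdict(list)
    for s in sessions:
        # Hypothetical aggregation key; the source does not define it.
        key = (s["ip.src.hash"], s["ip.dst.hash"], s["service"])
        groups[key].append(s)

    aggregated = {}
    for key, group in groups.items():
        # Differences between consecutive session start times, in seconds.
        times = sorted(s["time"] for s in group)
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        aggregated[key] = {
            "avg_packets": fmean(s["packets"] for s in group),
            "avg_entropy_req": fmean(s["entropy.req"] for s in group),
            "avg_lifetime": fmean(s["lifetime"] for s in group),
            "avg_payload_req": fmean(s["payload.req"] for s in group),
            # Assumption: a lone session has no gap, so 0.0 is used.
            "avg_gap": fmean(gaps) if gaps else 0.0,
            "weight": len(group),
        }
    return aggregated


def make_session(src, dst, time, payload):
    """Helper for the example only; the constant values are made up."""
    return {
        "ip.src.hash": src, "ip.dst.hash": dst, "service": "dns",
        "time": time, "lifetime": 1.0, "packets": 2,
        "entropy.req": 4.0, "payload.req": payload,
    }


# One large session versus 100 sessions of the same size spaced 2 s apart.
single = [make_session("a", "x", 0.0, 1000)]
many = [make_session("b", "x", 2.0 * i, 1000) for i in range(100)]
features = aggregate_sessions(single + many)
```

Here both keys end up with the same average payload size, but the weight is 1 for the single session and 100 for the burst, which is the distinction the weight feature is meant to capture.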
4.4. Storage
4.5. Anomaly Detector
4.5.1. Architecture
4.5.2. Autoencoder Model
4.5.3. Threshold Checker
4.6. Normalization
4.7. Deployment
5. Experimental Setup
5.1. Infrastructure
5.2. Training Dataset
5.3. Test Dataset
5.3.1. Normal Traffic
5.3.2. Anomalous Traffic
- DNS tunnel exfiltration: We exfiltrated data through DNS tunnels by using the following malware: bondupdater, cobaltstrike, denis, dnspionage, ismdoor, pisloader-01, pisloader-02, and udpos. The malware creates DNS tunnels that are used as C2 channels or to exfiltrate data. Malware such as ismdoor and iodine tunnel multiple packets, while tunnels created by, for instance, pisloader-01 and pisloader-02 are very small.
- FTP exfiltration: We exfiltrated data by uploading files to an FTP server. We set up an FTP server on a virtual private server. The files are uploaded from another host.
- HTTP exfiltration: We exfiltrated data over HTTP with GET/POST requests. We set up a simple upload server on a virtual private server. The files are uploaded from another host.
- Cloud exfiltration: We exfiltrated data by uploading files to cloud storage, using the Rclone utility (https://rclone.org (accessed on 1 December 2021)). We set up Rclone to exfiltrate data to Google Drive.
5.4. Evaluation Criteria
6. Experimental Results
6.1. Results
6.2. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AE | Autoencoder |
API | Application Programming Interface |
C2 | Command & Control |
DNS | Domain Name System |
DPI | Deep Packet Inspection |
FPR | False Positive Rate |
FTP | File Transfer Protocol |
HTTP | Hypertext Transfer Protocol |
HTTPS | Hypertext Transfer Protocol Secure |
IOC | Indicator of Compromise |
NCSC | National Cyber Security Centre |
NDN | National Detection Network |
NEDS | Network Exfiltration Detection System |
NSP | NDN Sensor Platform |
PCA | Principal Component Analysis |
REST | Representational State Transfer |
SMTP | Simple Mail Transfer Protocol |
SSH | Secure Shell |
SVM | Support Vector Machine |
TNR | True Negative Rate |
TPR | True Positive Rate |
References
1. National Coordinator for Security and Counterterrorism (NCTV); Ministry of Justice and Security. Cyber Security Assessment Netherlands 2021; National Coordinator for Security and Counterterrorism: The Hague, The Netherlands, 2021.
2. Caviglione, L.; Choraś, M.; Corona, I.; Janicki, A.; Mazurczyk, W.; Pawlicki, M.; Wasielewska, K. Tight Arms Race: Overview of Current Malware Threats and Trends in Their Detection. IEEE Access 2021, 9, 5371–5396.
3. Wang, Y.; Zhou, A.; Liao, S.; Zheng, R.; Hu, R.; Zhang, L. A comprehensive survey on DNS tunnel detection. Comput. Netw. 2021, 197, 108322.
4. Chen, Z.; Yeo, C.K.; Lee, B.S.; Lau, C.T. Autoencoder-based network anomaly detection. In Proceedings of the 2018 Wireless Telecommunications Symposium (WTS), Phoenix, AZ, USA, 17–20 April 2018; pp. 1–5.
5. Sabir, B.; Ullah, F.; Babar, M.A.; Gaire, R. Machine Learning for Detecting Data Exfiltration: A Review. ACM Comput. Surv. 2021, 54, 50.
6. Deri, L.; Fusco, F. Using Deep Packet Inspection in CyberTraffic Analysis. In Proceedings of the 2021 IEEE International Conference on Cyber Security and Resilience (CSR), Virtual, 26–28 July 2021; pp. 89–94.
7. Fadolalkarim, D.; Bertino, E. A-PANDDE: Advanced Provenance-based ANomaly Detection of Data Exfiltration. Comput. Secur. 2019, 84, 276–287.
8. Liu, Y.; Corbett, C.; Chiang, K.; Archibald, R.; Mukherjee, B.; Ghosal, D. SIDD: A Framework for Detecting Sensitive Data Exfiltration by an Insider Attack. In Proceedings of the 2009 42nd Hawaii International Conference on System Sciences, Waikoloa, HI, USA, 5–8 January 2009; pp. 1–10.
9. Fawcett, T.W. EXFILD: A Tool for the Detection of Data Exfiltration Using Entropy and Encryption Characteristics of Network Traffic. Master’s Thesis, University of Delaware, Newark, DE, USA, 2010.
10. He, G.; Zhang, T.; Ma, Y.; Xu, B. A Novel Method to Detect Encrypted Data Exfiltration. In Proceedings of the 2014 Second International Conference on Advanced Cloud and Big Data, Huangshan, China, 20–22 November 2014; pp. 240–246.
11. Nadler, A.; Aminov, A.; Shabtai, A. Detection of malicious and low throughput data exfiltration over the DNS protocol. Comput. Secur. 2019, 80, 36–53.
12. Haghighat, M.H.; Foroushani, Z.A.; Li, J. SAWANT: Smart Window Based Anomaly Detection Using Netflow Traffic. In Proceedings of the 2019 IEEE 19th International Conference on Communication Technology (ICCT), Xi’an, China, 16–19 October 2019; pp. 1396–1402.
13. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. In Proceedings of the Network and Distributed System Security (NDSS) Symposium, San Diego, CA, USA, 18–21 February 2018.
14. Kemp, C.; Calvert, C.; Khoshgoftaar, T. Utilizing Netflow Data to Detect Slow Read Attacks. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, USA, 6–9 July 2018; pp. 108–116.
15. Najafabadi, M.M.; Khoshgoftaar, T.M.; Calvert, C.; Kemp, C. Detection of SSH Brute Force Attacks Using Aggregated Netflow Data. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 283–288.
16. Lu, W.; Ghorbani, A.A. Network Anomaly Detection Based on Wavelet Analysis. EURASIP J. Adv. Signal Process. 2008, 2009, 837601.
17. Tsikerdekis, M.; Waldron, S.; Emanuelson, A. Network Anomaly Detection Using Exponential Random Graph Models and Autoregressive Moving Average. IEEE Access 2021, 9, 134530–134542.
18. Ahmed, J.; Habibi Gharakheili, H.; Raza, Q.; Russell, C.; Sivaraman, V. Monitoring Enterprise DNS Queries for Detecting Data Exfiltration From Internal Hosts. IEEE Trans. Netw. Serv. Manag. 2020, 17, 265–279.
19. Liu, D.; Lung, C.H.; Lambadaris, I.; Seddigh, N. Network traffic anomaly detection using clustering techniques and performance comparison. In Proceedings of the 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, Canada, 5–8 May 2013; pp. 1–4.
20. Liu, Y.; Xue, H.; Wei, G.; Wu, L.; Wang, Y. A Comparative Study on Network Traffic Clustering. In Network and System Security; Liu, J.K., Huang, X., Eds.; Springer International Publishing: Cham, Switzerland, 2019; Volume 11928, pp. 443–455.
21. Münz, G.; Li, S.; Carle, G. Traffic Anomaly Detection Using K-Means Clustering. In Proceedings of the GI/ITG-Workshop MMBnet 2007, Leistungs-, Zuverlässigkeits- und Verlässlichkeitsbewertung von Kommunikationsnetzen und Verteilten Systemen, Hamburg, Germany, 13–14 September 2007; Volume 7, pp. 116–123.
22. Pagliari, R.; Ghosh, A.; Gottlieb, Y.M.; Chadha, R.; Vashist, A.; Hadynski, G. Insider attack detection using weak indicators over network flow data. In Proceedings of the MILCOM 2015 - 2015 IEEE Military Communications Conference, Tampa, FL, USA, 26–28 October 2015; pp. 1–6.
23. Radhakrishnan, C.; Karthick, K.; Asokan, R. Ensemble Learning based Network Anomaly Detection using Clustered Generalization of the Features. In Proceedings of the 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 18–19 December 2020; pp. 157–162.
24. Nixon, C.; Sedky, M.; Hassan, M. Autoencoders: A Low Cost Anomaly Detection Method for Computer Network Data Streams. In Proceedings of the 2020 4th International Conference on Cloud and Big Data Computing, Virtual, 26–28 August 2020; pp. 58–62.
25. Xu, W.; Jang-Jaccard, J.; Singh, A.; Wei, Y.; Sabrina, F. Improving Performance of Autoencoder-Based Network Anomaly Detection on NSL-KDD Dataset. IEEE Access 2021, 9, 140136–140146.
26. Chen, J.; Sathe, S.; Aggarwal, C.; Turaga, D. Outlier Detection with Autoencoder Ensembles. In Proceedings of the 2017 SIAM International Conference on Data Mining (SDM), Houston, TX, USA, 27–29 April 2017; pp. 90–98.
27. Nguyen, Q.P.; Lim, K.W.; Divakaran, D.M.; Low, K.H.; Chan, M.C. GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection. In Proceedings of the 2019 IEEE Conference on Communications and Network Security (CNS), Washington, DC, USA, 10–12 June 2019; pp. 91–99.
28. Wu, K.; Zhang, Y.; Yin, T. TDAE: Autoencoder-based Automatic Feature Learning Method for the Detection of DNS tunnel. In Proceedings of the ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–7.
29. Mao, J.; Hu, Y.; Jiang, D.; Wei, T.; Shen, F. CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection. IEEE Access 2020, 8, 116216–116225.
30. Jayalakshmi, T.; Santhakumaran, A. Statistical Normalization and Back Propagation for Classification. Int. J. Comput. Theory Eng. 2011, 3, 89–93.
31. Ting, K. Encyclopedia of Machine Learning; Springer: Boston, MA, USA, 2011.
Channel | Type | Number of Sessions |
---|---|---|
DNS tunnel | bondupdater | 30 |
DNS tunnel | cobaltstrike | 285 |
DNS tunnel | denis | 31 |
DNS tunnel | dnspionage | 8 |
DNS tunnel | ismdoor | 725 |
DNS tunnel | pisloader-01 | 13 |
DNS tunnel | pisloader-02 | 78 |
DNS tunnel | udpos | 127 |
FTP exfiltration | credit card file | 4 |
FTP exfiltration | 1 kB file | 4 |
FTP exfiltration | 100 kB file | 4 |
FTP exfiltration | 10 MB file | 24 |
HTTP exfiltration | credit card file | 1 |
HTTP exfiltration | 1 kB file | 1 |
HTTP exfiltration | 100 kB file | 1 |
HTTP exfiltration | 10 MB file | 1 |
Google Drive exfiltration | credit card file | 1 |
Google Drive exfiltration | 1 kB file | 1 |
Google Drive exfiltration | 100 kB file | 1 |
Google Drive exfiltration | 10 MB file | 3 |

Dataset | Traffic Type | Number of Sessions |
---|---|---|
Training | Normal | 627,797 |
Test | Normal | 52,320 |
Test | Data exfiltration | 1343 |

Threshold FPR | DNS TPR | DNS TNR | HTTPS TPR | HTTPS TNR | HTTP TPR | HTTP TNR |
---|---|---|---|---|---|---|
0.001 | 0.0016 | 0.9990 | 0.0000 | 0.9990 | 0.3000 | 0.9983 |
0.010 | 0.0368 | 0.9955 | 0.1667 | 0.9899 | 0.3000 | 0.9896 |
0.050 | 0.1878 | 0.9500 | 0.1667 | 0.9502 | 0.3000 | 0.9844 |

Threshold FPR | DNS TPR | DNS TNR | HTTPS TPR | HTTPS TNR | HTTP TPR | HTTP TNR |
---|---|---|---|---|---|---|
0.001 | 0.6974 | 0.9990 | 0.1667 | 0.9990 | 0.3000 | 0.9983 |
0.010 | 0.8386 | 0.9900 | 0.1667 | 0.9899 | 0.3000 | 0.9896 |
0.050 | 0.9087 | 0.9499 | 0.3333 | 0.9495 | 0.3000 | 0.9499 |

Exfiltration Channel | Type | Non-Aggregated Data, Threshold FPR 0.001 | Non-Aggregated Data, Threshold FPR 0.010 | Non-Aggregated Data, Threshold FPR 0.050 | Aggregated Data, Threshold FPR 0.001 | Aggregated Data, Threshold FPR 0.010 | Aggregated Data, Threshold FPR 0.050 |
---|---|---|---|---|---|---|---|
DNS | bondupdater | 0.000 | 0.000 | 0.033 | 0.000 | 0.000 | 0.000 |
DNS | cobaltstrike | 0.000 | 0.000 | 0.067 | 0.519 | 0.828 | 0.888 |
DNS | denis | 0.000 | 0.742 | 0.742 | 1.000 | 1.000 | 1.000 |
DNS | dnspionage | 0.000 | 0.250 | 0.375 | 0.000 | 0.000 | 0.000 |
DNS | ismdoor | 0.004 | 0.029 | 0.240 | 0.859 | 0.877 | 0.935 |
DNS | pisloader-01 | 0.000 | 0.077 | 1.000 | 0.000 | 0.000 | 0.000 |
DNS | pisloader-02 | 0.000 | 0.013 | 0.282 | 0.000 | 0.013 | 0.705 |
DNS | udpos | 0.000 | 0.000 | 0.055 | 0.000 | 0.701 | 0.843 |
FTP | credit card | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 |
FTP | 1 kB | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 |
FTP | 100 kB | 0.500 | 0.500 | 0.500 | 1.000 | 1.000 | 1.000 |
FTP | 10 MB | 0.083 | 0.125 | 0.125 | 0.125 | 0.458 | 1.000 |
HTTP | credit card | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
HTTP | 1 kB | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
HTTP | 100 kB | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
HTTP | 10 MB | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Drive | credit card | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Drive | 1 kB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Drive | 100 kB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Drive | 10 MB | 0.000 | 0.333 | 0.333 | 0.333 | 0.333 | 0.667 |
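The tables above report TPR and TNR at fixed threshold FPR values. One common way to obtain such operating points is sketched below, as an assumption rather than the procedure used in the experiments: the threshold is taken as a quantile of the anomaly scores of normal traffic, so that at most the target fraction of normal samples exceeds it, and the rates are then counted on labelled scores. The function names and the example scores are hypothetical.

```python
import math


def threshold_for_fpr(normal_scores, target_fpr):
    """Threshold that at most a fraction target_fpr of normal scores exceeds."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(normal_scores)
    # Number of normal samples that may lie strictly above the threshold.
    allowed = math.floor(target_fpr * len(ordered))
    return ordered[len(ordered) - allowed - 1]


def tpr_tnr(normal_scores, attack_scores, threshold):
    """TPR = TP / (TP + FN) and TNR = TN / (TN + FP) for 'score > threshold'."""
    true_positives = sum(score > threshold for score in attack_scores)
    true_negatives = sum(score <= threshold for score in normal_scores)
    return (
        true_positives / len(attack_scores),
        true_negatives / len(normal_scores),
    )


# Made-up scores: 1000 normal samples and 4 exfiltration samples.
normal = list(range(1000))
attacks = [500, 995, 2000, 3000]

threshold = threshold_for_fpr(normal, 0.01)
tpr, tnr = tpr_tnr(normal, attacks, threshold)
```

With these made-up scores, 10 of the 1000 normal samples lie above the threshold, so the TNR is 0.99 (that is, 1 minus the FPR), and three of the four attack scores are flagged, giving a TPR of 0.75.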
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Willems, D.; Kohls, K.; van der Kamp, B.; Vranken, H. Data Exfiltration Detection on Network Metadata with Autoencoders. Electronics 2023, 12, 2584. https://doi.org/10.3390/electronics12122584