Insider Threat Detection Based on Deep Clustering of Multi-Source Behavioral Events
Abstract
1. Introduction
- A new end-to-end insider threat detection method is proposed for automatically learning user behavior features from multi-source audit logs by using a deep neural network.
- A new deep clustering model for user behavior sequences is proposed to optimize the user behavior features for the clustering task and improve the detection of insider threats.
- The proposed end-to-end insider threat detection method is evaluated on the CERT benchmark datasets from Carnegie Mellon University using commonly reported metrics: recall, the ROC curve, and the area under the ROC curve (AUC).
2. Related Work
3. Proposed Method
3.1. Problem Statement
3.2. Deep Clustering Network
3.2.1. User Multi-Source Behavior Sequence Feature Representation
3.2.2. Deep Clustering of User Multi-Source Behavior Sequences
Algorithm 1. Algorithm for Deep Clustering of Multi-Source User Behavior Sequences
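The steps of Algorithm 1 are not reproduced in this text. As general background only, and not as a statement of the authors' procedure, the following minimal sketch shows the soft-assignment and target-distribution computations used in the deep embedded clustering approach of Xie et al. (listed in the references). The embeddings, centroids, and the choice `alpha=1.0` are illustrative assumptions; in practice the embeddings would come from a trained sequence encoder.

```python
import math

def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of embedding i to cluster j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))  # squared distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalize over clusters
    return q

def target_distribution(q):
    """Sharpened target p[i][j] proportional to q[i][j]^2 / f[j]."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over samples and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )

# Toy 2-D embeddings and two centroids (made-up values for illustration).
embeddings = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
print(q[0], p[0], loss)
```

In that formulation, the encoder parameters and the centroids are updated by gradient descent on the KL loss, and the target distribution is recomputed periodically, so that feature learning and clustering are optimized jointly rather than in two separate stages.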
3.3. Anomaly Detection
4. Experiment
4.1. Datasets
- Time entities: extracted from the hour component of the time field in the user behavior records. The values are 0, 1, 2, …, 23, representing the 24-h clock.
- Host entities: extracted from the host number field in the user behavior records, with the host number serving as the identifier for the host entity.
- User entities: extracted from the user number field in the user behavior records, with the user number serving as the identifier for the user entity.
- Action entities: extracted from the action field in the user behavior records, with the specific action serving as the identifier for the action entity. There are seven action entities: login (host), logout (host), connect (mobile device USB), disconnect (mobile device USB), browse (web), send (email), and access (file).
- Action magnitude: derived by counting the number of repetitions of each action and discretizing the counts into levels of action magnitude. The action magnitude entity reflects the current level of repetition for that action entity.
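The entity extraction described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the record layout, the field names (`time`, `user`, `host`, `action`), the timestamp format, and the magnitude bin boundaries are all assumptions made for the example, and the repetition count is taken as a running count within the supplied records.

```python
from collections import Counter
from datetime import datetime

# The seven action entities named in the text.
ACTIONS = {"login", "logout", "connect", "disconnect", "browse", "send", "access"}

# Assumed discretization: (upper bound on repetition count, level label).
MAGNITUDE_BINS = [(1, "low"), (5, "medium")]

def magnitude_level(count):
    """Map a repetition count to a discrete magnitude level."""
    for upper, label in MAGNITUDE_BINS:
        if count <= upper:
            return label
    return "high"

def extract_entities(records):
    """Turn raw behavior records into (time, host, user, action, magnitude) entities."""
    counts = Counter()
    entities = []
    for rec in records:
        if rec["action"] not in ACTIONS:
            raise ValueError(f"unknown action: {rec['action']}")
        hour = datetime.strptime(rec["time"], "%Y-%m-%d %H:%M:%S").hour  # 0..23
        key = (rec["user"], rec["host"], rec["action"])
        counts[key] += 1  # running repetition count for this action
        entities.append({
            "time": hour,
            "host": rec["host"],
            "user": rec["user"],
            "action": rec["action"],
            "magnitude": magnitude_level(counts[key]),
        })
    return entities

# Hypothetical records for one user on one day.
records = [
    {"time": "2010-01-04 08:15:00", "user": "U100", "host": "PC-01", "action": "login"},
    {"time": "2010-01-04 08:20:00", "user": "U100", "host": "PC-01", "action": "browse"},
    {"time": "2010-01-04 08:21:00", "user": "U100", "host": "PC-01", "action": "browse"},
    {"time": "2010-01-04 08:25:00", "user": "U100", "host": "PC-01", "action": "browse"},
    {"time": "2010-01-04 17:02:00", "user": "U100", "host": "PC-01", "action": "logout"},
]

entities = extract_entities(records)
for e in entities:
    print(e)
```

Each record yields one tuple of five discrete entities, which is the kind of input a sequence model can embed. The window over which repetitions are counted (here, the whole input list) is a design choice that the text above does not specify.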
4.2. Experimental Setup
4.3. Evaluation Metrics
- (1) True positive (TP): the sub-activities of abnormal behavior are detected as abnormal;
- (2) True negative (TN): the sub-activities of normal behavior are detected as normal;
- (3) False positive (FP): the sub-activities of normal behavior are detected as abnormal, resulting in a false alarm;
- (4) False negative (FN): the sub-activities of abnormal behavior are detected as normal, resulting in a missed detection.
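The four outcomes above determine recall and the false positive rate, and sweeping the decision threshold over anomaly scores gives the ROC curve and its AUC. The following is a minimal sketch using the standard definitions; the labels, scores, and the 0.5 threshold are made-up values for illustration (1 = abnormal, 0 = normal).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = abnormal, 0 = normal)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1  # false alarm
        else:
            fn += 1  # missed detection
    return tp, tn, fp, fn

def recall(tp, fn):
    """Fraction of abnormal sub-activities that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def false_positive_rate(fp, tn):
    """Fraction of normal sub-activities that raised a false alarm."""
    return fp / (fp + tn) if (fp + tn) else 0.0

def auc(y_true, scores):
    """AUC as the probability that a random abnormal sample scores higher
    than a random normal sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one abnormal and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative labels and anomaly scores.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("recall:", recall(tp, fn))
print("FPR:", false_positive_rate(fp, tn))
print("AUC:", auc(y_true, scores))
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for a small illustration; a rank-based computation is the usual choice for large log volumes.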
4.4. Results for Insider Threat Detection
4.5. Results for Different Parameters
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cappelli, D.M.; Moore, A.P.; Trzeciak, R.F. The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud); Addison-Wesley: Boston, MA, USA, 2012.
- Insider Threat Report [EB/OL]. 2023. Available online: https://www.cybersecurity-insiders.com/portfolio/2023-insider-threat-report-gurucul/ (accessed on 19 October 2023).
- Parveen, P.; Evans, J.; Thuraisingham, B.; Hamlen, K.W.; Khan, L. Insider threat detection using stream mining and graph mining. In Proceedings of the 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, Boston, MA, USA, 9–11 October 2011; pp. 1102–1110.
- Morales, A.; Fierrez, J.; Ortega-Garcia, J. Towards predicting good users for biometric recognition based on keystroke dynamics. In Proceedings of the Computer Vision-ECCV 2014 Workshops, Zurich, Switzerland, 6–7 and 12 September 2014; Part II 13. Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 711–724.
- Hu, T.; Niu, W.; Zhang, X.; Liu, X.; Lu, J.; Liu, Y. An insider threat detection approach based on mouse dynamics and deep learning. Secur. Commun. Netw. 2019, 2019, 3898951.
- Salem, M.B.; Stolfo, S.J. A comparison of one-class bag-of-words user behavior modeling techniques for masquerade detection. Secur. Commun. Netw. 2012, 5, 863–872.
- Camiña, J.B.; Monroy, R.; Trejo, L.A.; Medina-Pérez, M.A. Temporal and spatial locality: An abstraction for masquerade detection. IEEE Trans. Inf. Forensics Secur. 2016, 11, 2036–2051.
- Senator, T.E.; Goldberg, H.G.; Memory, A.; Young, W.T.; Rees, B.; Pierce, R.; Huang, D.; Reardon, M.; Bader, D.A.; Chow, E.; et al. Detecting insider threats in a real corporate database of computer usage activity. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11 August 2013; pp. 1393–1401.
- Young, W.T.; Goldberg, H.G.; Memory, A.; Sartain, J.F.; Senator, T.E. Use of domain knowledge to detect insider threats in computer activities. In Proceedings of the 2013 IEEE Security and Privacy Workshops, San Francisco, CA, USA, 19–22 May 2013; pp. 60–67.
- Yen, T.F.; Oprea, A.; Onarlioglu, K.; Leetham, T.; Robertson, W.; Juels, A.; Kirda, E. Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks. In Proceedings of the 29th Annual Computer Security Applications Conference, New Orleans, LA, USA, 9 December 2013; pp. 199–208.
- Fox, I.; Ang, L.; Jaiswal, M.; Pop-Busui, R.; Wiens, J. Deep multi-output forecasting: Learning to accurately predict blood glucose trajectories. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 23 August 2018; pp. 1387–1395.
- Salem, M.B.; Stolfo, S.J. Detecting Masqueraders: A Comparison of One-Class Bag-of-Words User Behavior Modeling Techniques. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. 2010, 1, 3–13.
- Maxion, R.A.; Townsend, T.N. Masquerade detection using truncated command lines. In Proceedings of the International Conference on Dependable Systems and Networks, Washington, DC, USA, 23–26 June 2002; pp. 219–228.
- Ahmed, A.A.E.; Traore, I. A new biometric technology based on mouse dynamics. IEEE Trans. Dependable Secur. Comput. 2007, 4, 165–179.
- Shen, C.; Cai, Z.; Guan, X.; Du, Y.; Maxion, R.A. User authentication through mouse dynamics. IEEE Trans. Inf. Forensics Secur. 2012, 8, 16–30.
- Ahmed, A.A.; Traore, I. Biometric recognition based on free-text keystroke dynamics. IEEE Trans. Cybern. 2013, 44, 458–472.
- Camiña, B.; Monroy, R.; Trejo, L.A.; Sánchez, E. Towards building a masquerade detection method based on user file system navigation. In Advances in Artificial Intelligence: Proceedings of the 10th Mexican International Conference on Artificial Intelligence, MICAI 2011, Puebla, Mexico, 26 November–4 December 2011; Part I 10; Springer: Berlin/Heidelberg, Germany, 2011; pp. 174–186.
- Eberle, W.; Graves, J.; Holder, L. Insider threat detection using a graph-based approach. J. Appl. Secur. Res. 2010, 6, 32–81.
- Patil, A.; Liu, J.; Shen, J.; Brdiczka, O.; Gao, J.; Hanley, J. Modeling attrition in organizations from email communication. In Proceedings of the 2013 International Conference on Social Computing, Alexandria, VA, USA, 8–14 September 2013; pp. 331–338.
- Yang, Y.C. Web user behavioral profiling for user identification. Decis. Support Syst. 2010, 49, 261–271.
- Young, W.T.; Memory, A.; Goldberg, H.G.; Senator, T.E. Detecting unknown insider threat scenarios. In Proceedings of the 2014 IEEE Security and Privacy Workshops, San Jose, CA, USA, 17–18 May 2014; pp. 277–288.
- Liu, F.; Wen, Y.; Zhang, D.; Jiang, X.; Xing, X.; Meng, D. Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 1777–1794.
- Cao, C.; Chen, Z.; Caverlee, J.; Tang, L.A.; Luo, C.; Li, Z. Behavior-based community detection: Application to host assessment in enterprise information networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1977–1985.
- Xie, J.; Girshick, R.; Farhadi, A. Unsupervised deep embedding for clustering analysis. In Proceedings of the International Conference on Machine Learning; PMLR: London, UK, 2016; pp. 478–487.
- Yang, J.; Parikh, D.; Batra, D. Joint unsupervised learning of deep representations and image clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5147–5156.
- Ghasedi Dizaji, K.; Herandi, A.; Deng, C.; Cai, W.; Huang, H. Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5736–5745.
- Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 132–149.
- Software Engineering Institute, Carnegie Mellon University. [n. d.]. Insider Threat Test Dataset. Available online: https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508099 (accessed on 27 January 2021).
- Azaria, A.; Richardson, A.; Kraus, S.; Subrahmanian, V.S. Behavioral analysis of insider threat: A survey and bootstrapped prediction in imbalanced data. IEEE Trans. Comput. Soc. Syst. 2014, 1, 135–155.
- Gavai, G.; Sricharan, K.; Gunning, D.; Rolleston, R.; Hanley, J.; Singhal, M. Detecting insider threat from enterprise social and online activity data. In Proceedings of the 7th ACM CCS International Workshop on Managing Insider Security Threats, Denver, CO, USA, 16 October 2015; pp. 13–20.
- Chattopadhyay, P.; Wang, L.; Tan, Y.P. Scenario-based insider threat detection from cyber activities. IEEE Trans. Comput. Soc. Syst. 2018, 5, 660–675.
Study | Type of Data | Source of Data | Features | ML/Statistical Model | Remarks |
---|---|---|---|---|---|
[6,12,13] | Host-based audit data | UNIX commands execution audit data | Occurrence frequency of each subsequence of the Unix commands | One-class SVM, Naive Bayes | Host-based and network-based insider threat detection methods are limited in their ability to detect more complex insider attacks since they typically analyze only a single type of user behavior data to model the user profile. |
[14,15,16,17] | Host-based audit data | Keyboard or mouse dynamics audit data | Frequency of clicks or movements, different key-up and key-down times | One-class SVM, KNN, neural networks | Same as above: only a single type of user behavior data is analyzed. |
[18,19] | Host-based audit data | File access audit data | File path distance, file access frequency | TreeBagger | Same as above: only a single type of user behavior data is analyzed. |
[20,21] | Network-based audit data | Email audit data | Email graph structure | Naive Bayes, Decision Trees, Random Forests and Bagging | Same as above: only a single type of user behavior data is analyzed. |
[22] | Network-based audit data | Web browsing audit data | Page access frequency, page view time | Cosine Similarity | Same as above: only a single type of user behavior data is analyzed. |
[8,9,10,23] | Multi-source audit data | Email logs, proxy server logs | File access times, web browsing times | KDE, GMM, KNN, and HMM | These methods rely on manual feature engineering, where domain experts extract statistical features from multi-source user behavior data based on their prior knowledge. |
[24] | Multi-source audit data | Email logs, proxy server logs | Feature learning by using graph embedding learning techniques | Graph embedding learning, K-means | The feature extraction is independent of the clustering method, so the learned user behavior representations are suboptimal for the clustering algorithm, which reduces the accuracy of insider threat detection. |
[25] | Multi-source audit data | Network-level event, process-level event | Feature learning by using embedding learning techniques | Embedding learning | This method analyzes static behavior and ignores the temporal relationships among user behaviors in the user behavior sequences. |
Method | Anomaly Type 1 | Anomaly Type 2 | Anomaly Type 3 | Avg |
---|---|---|---|---|
BAIT | 51.29 | 54.60 | 49.04 | 51.64 |
Isolation Forest | 82.09 | 95.68 | 71.08 | 82.95 |
Random Forest | 89.58 | 85.94 | 96.35 | 90.62 |
Deep Autoencoder | 90.25 | 99.48 | 94.10 | 94.61 |
Proposed method | 98.01 | 99.84 | 87.50 | 95.11 |
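As a quick consistency check on the table above, the following sketch recomputes each method's average over the three anomaly types and finds the best-scoring method per type. The numbers are copied from the table; the variable names are illustrative, and the metric behind the scores is not stated in the table itself.

```python
TYPES = ("Type 1", "Type 2", "Type 3")

# (per-type scores, reported average), copied from the table.
results = {
    "BAIT": ((51.29, 54.60, 49.04), 51.64),
    "Isolation Forest": ((82.09, 95.68, 71.08), 82.95),
    "Random Forest": ((89.58, 85.94, 96.35), 90.62),
    "Deep Autoencoder": ((90.25, 99.48, 94.10), 94.61),
    "Proposed method": ((98.01, 99.84, 87.50), 95.11),
}

# Recompute the mean over the three anomaly types for each method.
means = {m: sum(scores) / len(scores) for m, (scores, _) in results.items()}
for method, (_, reported) in results.items():
    print(f"{method}: recomputed mean = {means[method]:.3f}, reported = {reported}")

# Best-scoring method for each anomaly type.
best = {
    t: max(results, key=lambda m: results[m][0][i])
    for i, t in enumerate(TYPES)
}
print(best)
```

The recomputed means agree with the reported averages to within 0.01. The per-type comparison makes explicit what the table shows: the proposed method has the highest score on Types 1 and 2 and the highest average, while Random Forest scores highest on Type 3.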
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, J.; Sun, Q.; Zhou, C. Insider Threat Detection Based on Deep Clustering of Multi-Source Behavioral Events. Appl. Sci. 2023, 13, 13021. https://doi.org/10.3390/app132413021