Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis
Abstract: In highly sophisticated network attacks, command-and-control (C&C) servers often use domain generation algorithms (DGAs) to dynamically produce candidate domains instead of relying on static, hard-coded lists of IP addresses or domain names. Distinguishing DGA-generated domains from legitimate ones is critical for detecting malware and for locating the hidden attackers. The word-based DGAs disclosed in recent network attacks are significantly stealthier than traditional character-based DGAs. In a word-based DGA, two or more words are randomly chosen from one or more specific dictionaries to form a dynamic domain; the resulting domains aim to mimic the characteristics of legitimate domains. Existing DGA detection schemes, including the state-of-the-art scheme based on deep learning, still cannot identify these domains accurately while maintaining an acceptable false alarm rate. In this study, we exploit inter-word and inter-domain correlations using semantic analysis, taking word embeddings and part-of-speech into account. We then propose a detection framework for word-based DGAs that incorporates the frequency distribution of words and of parts of speech into the design of the feature set. Using an ensemble classifier constructed from Naive Bayes, Extra-Trees, and Logistic Regression, we benchmark the proposed scheme on malicious and legitimate domain samples extracted from public datasets. The experimental results show that the proposed scheme achieves significantly higher detection accuracy for word-based DGAs than three state-of-the-art DGA detection schemes.
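To make the notion of a word-based DGA and of word-frequency features concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the word list `WORDS` and the helpers `generate_domain`, `segment`, and `word_frequencies` are hypothetical names introduced here, and the dictionary-based segmentation is a simple stand-in for whatever word-splitting step a real detector would use. Part-of-speech tagging, word embeddings, and the ensemble classifier are not shown.

```python
import random

# Hypothetical toy dictionary; a real word-based DGA ships its own word lists.
WORDS = ["cloud", "river", "stone", "market", "silver", "garden"]


def generate_domain(rng, n_words=2, tld="com"):
    """Concatenate n_words randomly chosen dictionary words into a domain."""
    return "".join(rng.choice(WORDS) for _ in range(n_words)) + "." + tld


def segment(label, vocabulary):
    """Split a domain label into vocabulary words; return None if impossible."""
    # best[i] holds one segmentation of label[:i], or None if none exists.
    best = [None] * (len(label) + 1)
    best[0] = []
    for end in range(1, len(label) + 1):
        for start in range(end):
            if best[start] is not None and label[start:end] in vocabulary:
                best[end] = best[start] + [label[start:end]]
                break
    return best[len(label)]


def word_frequencies(domains, vocabulary):
    """Count how often each vocabulary word occurs across a set of domains."""
    counts = {}
    for domain in domains:
        label = domain.split(".")[0]
        # Labels that cannot be segmented contribute no words.
        for word in segment(label, vocabulary) or []:
            counts[word] = counts.get(word, 0) + 1
    return counts
```

In this sketch, a batch of domains whose labels split cleanly into a small, repeatedly reused vocabulary would yield a concentrated word-frequency distribution, which is the kind of signal the abstract describes using as a feature.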
Yang, L.; Zhai, J.; Liu, W.; Ji, X.; Bai, H.; Liu, G.; Dai, Y. Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis. Symmetry 2019, 11(2), 176.