Open Access Article
Symmetry 2019, 11(2), 176; https://doi.org/10.3390/sym11020176

Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis

1 School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
2 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210094, China
* Author to whom correspondence should be addressed.
Received: 13 December 2018 / Revised: 27 January 2019 / Accepted: 29 January 2019 / Published: 2 February 2019
(This article belongs to the Special Issue Information Technology and Its Applications 2018)

Abstract

In highly sophisticated network attacks, command-and-control (C&C) servers commonly use domain generation algorithms (DGAs) to dynamically produce several candidate domains instead of relying on static hard-coded lists of IP addresses or domain names. Distinguishing the domains generated by DGAs from legitimate ones is critical for detecting the presence of malware and for locating the hidden attackers. The word-based DGAs disclosed in recent network attack events are significantly stealthier than traditional character-based DGAs. In a word-based DGA, two or more words are randomly chosen from one or more specific dictionaries to form a dynamic domain, so that the generated domains mimic the characteristics of legitimate domains. Existing DGA detection schemes, including the state-of-the-art one based on deep learning, still cannot detect these domains accurately while maintaining an acceptable false alarm rate. In this study, we exploit the inter-word and inter-domain correlations using semantic analysis approaches that take word embeddings and part-of-speech into consideration. We then propose a detection framework for word-based DGAs that incorporates the frequency distribution of the words and that of the parts of speech into the design of the feature set. Using an ensemble classifier constructed from Naive Bayes, Extra-Trees, and Logistic Regression, we benchmark the proposed scheme with malicious and legitimate domain samples extracted from public datasets. The experimental results show that the proposed scheme achieves significantly higher detection accuracy for word-based DGAs than three state-of-the-art DGA detection schemes.
Keywords: network attack; domain generation algorithm; DGA detection; semantic analysis; ensemble classifier
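
To illustrate why word-based DGA domains resemble legitimate ones, the following is a minimal Python sketch of the generation step described in the abstract, together with a simple dictionary segmentation and word-frequency count of the kind a detector might build features from. It is not the authors' implementation: the word list `WORDS`, the seeding scheme, and the function names `word_dga`, `segment`, and `word_frequencies` are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter
from datetime import date
from typing import List, Optional, Set

# Hypothetical dictionary; a real word-based DGA ships its own word lists.
WORDS = ["river", "stone", "market", "silver", "garden", "signal", "paper", "window"]


def word_dga(seed: str, day: date, count: int = 3,
             words_per_domain: int = 2, tld: str = "com") -> List[str]:
    """Derive `count` domains for a given day by picking words from WORDS.

    The SHA-256 seeding is an assumption made for this sketch; it only
    makes the pseudo-random choice reproducible for a given seed and date.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        picked = [WORDS[digest[j] % len(WORDS)] for j in range(words_per_domain)]
        domains.append("".join(picked) + "." + tld)
    return domains


def segment(label: str, vocabulary: Set[str]) -> Optional[List[str]]:
    """Split a domain label into vocabulary words, or return None if impossible."""
    # best[k] holds one segmentation of label[:k], or None if none was found.
    best: List[Optional[List[str]]] = [None] * (len(label) + 1)
    best[0] = []
    for end in range(1, len(label) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and label[start:end] in vocabulary:
                best[end] = prefix + [label[start:end]]
                break
    return best[len(label)]


def word_frequencies(domains: List[str], vocabulary: Set[str]) -> Counter:
    """Count how often each vocabulary word occurs across a batch of domains."""
    counts: Counter = Counter()
    for domain in domains:
        words = segment(domain.split(".")[0], vocabulary)
        if words:
            counts.update(words)
    return counts
```

Because the words come from a small fixed dictionary, the same words recur across a batch of generated domains. That skewed word-frequency distribution is the kind of signal the abstract refers to; part-of-speech tagging and word embeddings would need additional tooling that this sketch does not cover.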

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Yang, L.; Zhai, J.; Liu, W.; Ji, X.; Bai, H.; Liu, G.; Dai, Y. Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis. Symmetry 2019, 11, 176.


