3.1. Data Processing Module
The data processing module consists of four components, with the primary objective of establishing a DNS attack environment. It collects both the DCC traffic generated during the attack and standard DNS traffic; these data are then integrated and processed for use in the experiment. This process is illustrated in Figure 2.
(1) Deploying the Attack Environment: The attack environment comprises the target client, a local DNS server, the “.top” root domain server, and a covert channel server. Nine DNS covert channel tools (iodine, dns2tcp, cobaltstrike, dnsexfiltrator, tcp-over-dns, dnsshell, dnscat, dnslivery, and ozymandns) [19] are deployed and executed on the local DNS server to generate DCC traffic. These tools establish covert channels with the target and enable the exfiltration of sensitive data. This configuration produces diverse covert channel traffic, simulating a variety of attack methods that an adversary might employ in a real-world context. Furthermore, Wireshark 4.2.3 is installed on the target device to capture the traffic generated during the attack.
(2) Capturing Mixed Traffic: This study captured a total of 100,000 samples of the nine types of DCC attack traffic in the aforementioned attack environment and grouped them into a DCC traffic set. These samples represent a wide range of attack methods, ensuring diversity in the data. In addition, a further 100,000 benign DNS flows were captured from daily network activity over a seven-day period and filtered for use as benign training data. The resulting traffic was combined to create a comprehensive, balanced dataset, as shown in Table 1.
(3) Data Processing: This module first cleans and filters the captured mixed traffic to extract valid DNS traffic, and then performs feature extraction on the cleansed DNS traffic to form the dataset used for training.
For feature extraction, the present study examined the characteristics of DNS traffic and identified eight main feature categories: string complexity, character structure, character combinations, vocabulary and labels, message length, resource record type, request-response pattern, and resource record content. These features were designed to sharpen the distinction between benign DNS traffic and DNS covert channel traffic.
String Complexity Features:
Standard DNS requests typically exhibit subdomains of reasonable length, whereas covert channels often use excessively long or short subdomains to carry encoded data [20]. Information entropy, a quantitative measure of the randomness and uncertainty of a string, is a valuable metric in this context. DCC tools frequently employ encoding techniques to obfuscate the transmitted data, which increases the randomness of letter combinations and raises the entropy of subdomains. Consequently, entropy can effectively differentiate covert channel traffic from standard DNS traffic. In general, normal subdomains consist of meaningful words, whereas encoded domain names are less likely to form coherent words; the presence of long, meaningful words in a domain name therefore suggests benign DNS traffic. The entropy of a subdomain is calculated as follows (Equation (1)):
H(X) = −Σ_{i=1}^{n} P(x_i) log₂ P(x_i)    (1)

where H(X) denotes the entropy of the random variable X, x_i denotes a character appearing in the subdomain string, P(x_i) denotes the probability of occurrence of x_i, and n denotes the number of possible character types in the subdomain string.
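As a concrete illustration of Equation (1), the following Python sketch computes the character-level Shannon entropy of a subdomain string; the example strings are hypothetical and only illustrate the contrast between a dictionary-like label and an encoded-looking one.

```python
import math
from collections import Counter

def subdomain_entropy(subdomain: str) -> float:
    """Shannon entropy of the character distribution in a subdomain (Equation (1))."""
    counts = Counter(subdomain)
    total = len(subdomain)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(subdomain_entropy("mail"))                 # low entropy: few, repeated character types
print(subdomain_entropy("nq7zx0kf2b9wy4tj8v"))   # higher entropy: encoded-looking label
```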
Character Structure Features:
Covert channels may employ unconventional character combinations to embed data [21]. For example, typical subdomains contain few digits, so an elevated digit frequency may indicate data transmission. Likewise, standard subdomains are dominated by lowercase letters, and a higher occurrence of uppercase letters may point to a covert channel. A large proportion of consecutive consonants within a word is also indicative of a randomly generated character sequence. Moreover, frequent alternation between letters and digits is a distinctive characteristic of covert channels, and such patterns can serve as reliable indicators of covert channel traffic.
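The statistics described above can be computed directly from the subdomain string. The following sketch is illustrative only; the exact feature definitions used in the paper (e.g., how consonant runs and alternations are normalized) may differ.

```python
import re

def char_structure_features(subdomain: str) -> dict:
    """Illustrative character-structure statistics for a subdomain label."""
    n = max(len(subdomain), 1)
    digits = sum(ch.isdigit() for ch in subdomain)
    uppers = sum(ch.isupper() for ch in subdomain)
    # Longest run of consecutive consonants (random strings tend to have long runs).
    consonant_runs = re.findall(r"[bcdfghjklmnpqrstvwxz]+", subdomain.lower())
    longest_run = max((len(run) for run in consonant_runs), default=0)
    # Number of switches between letters and digits along the string.
    kinds = ["d" if ch.isdigit() else "a" for ch in subdomain if ch.isalnum()]
    alternations = sum(a != b for a, b in zip(kinds, kinds[1:]))
    return {
        "digit_ratio": digits / n,
        "uppercase_ratio": uppers / n,
        "longest_consonant_run": longest_run,
        "letter_digit_alternations": alternations,
    }

print(char_structure_features("NQ7zx0KF2b"))   # encoded-looking label scores high on all four
```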
Character Combination Features:
It is noteworthy that subdomains generated through covert channels may exhibit unusual character combinations that diverge from the patterns typically observed in standard traffic. The Jaccard index serves as a metric for quantifying the similarity between two sets. In normal subdomains, adjacent double characters generally adhere to specific linguistic patterns, whereas double-character combinations in covert channels tend to be more random. Additionally, the combination of three characters can also reflect the distribution pattern of characters, thereby aiding in the detection of anomalous behavior. This paper focuses exclusively on calculating the Jaccard index for a single domain name, and the simplified formula is presented as follows:
where |N| denotes the number of distinct n-grams in the string and L represents the length of the string.
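A minimal sketch of this idea is given below. It assumes that, for a single domain name, the simplified index reduces to the ratio of distinct character bigrams |N| to the number of bigram positions (L − 1); the paper's exact simplification may differ.

```python
def bigram_index(subdomain: str) -> float:
    """Ratio of distinct character bigrams to bigram positions in a single string."""
    if len(subdomain) < 2:
        return 0.0
    bigrams = {subdomain[i:i + 2] for i in range(len(subdomain) - 1)}
    return len(bigrams) / (len(subdomain) - 1)

print(bigram_index("mailserver"))        # repeated linguistic bigrams lower the ratio
print(bigram_index("nq7zx0kf2b9wy4tj"))  # near-random encoded payload pushes it towards 1
```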
Lexical and Label Features:
Examination of covert channels shows that they create subdomains consisting of a large number of nonsensical words or labels; the features in this category aim to detect anomalies that diverge from typical traffic patterns. Conventional subdomains typically include one or more semantically meaningful words, whereas covert channels may use arbitrary or meaningless character sequences, leading to a reduced word count. Additionally, standard DNS requests contain a reasonable number of labels (subdomain components); an excessive number of labels may suggest that data is being partitioned and transmitted through a covert channel.
Message Length Features:
Covert channels often produce DNS messages that diverge from standard lengths, either excessively long or short, in an effort to obscure data or evade detection. For example, standard DNS messages are constrained by specific payload length limitations, whereas covert channels may use larger UDP payloads to carry covert data.
Resource Record Type Features:
Covert channels can embed data within resource records, leading to anomalies in both the size and entropy of a record. For example, the resource records in standard DNS responses are typically small, so unusually large records may indicate data injection. Additionally, high entropy in the content of a resource record implies a random distribution of characters, which may likewise indicate data injection. Furthermore, irregularities in the types of resource records (e.g., A, AAAA, CNAME) may also suggest data transfer.
Request and Response Features:
Covert channels have been shown to produce responses with multiple IP addresses or anomalous Time to Live (TTL) values, increasing their stealth. Response codes may also behave atypically, such as a high frequency of error codes, which can signify abnormal activity. Additionally, the use of specific request types (e.g., TXT record requests) by covert channels for data transmission provides critical insights for detection, as the distribution of request types can serve as a significant indicator of such activity.
Resource Record Content Features:
Covert channels that inject large volumes of concealed data into resource record content tend to increase its entropy, and these anomalies can be used to detect steganographic information. Moreover, additional data appended to UDP messages by covert channels helps to identify anomalous message structures.
The selected features capture the fundamental characteristics of DNS traffic and enable effective discrimination between benign and malicious behaviors. In this study, a 23-dimensional feature set was systematically constructed through comparative analysis and iterative refinement, with a particular emphasis on the detection of DCC traffic. To ensure the uniqueness and relevance of each feature, Pearson correlation analysis was performed to identify potential redundancy, as illustrated in Figure 3. Feature pairs exhibiting high correlation (|r| > 0.85) were examined, and only the most informative feature within each correlated group was retained. This process resulted in a compact, non-redundant, and informative feature set. The effectiveness of the selected features has been empirically validated, and the full list is presented in Table 2.
(4) Generating Grayscale Image: To mitigate the model’s dependence on features with differing numerical scales, all selected feature values are first normalized and subsequently linearly scaled to integer values ranging from 0 to 255. Each DNS traffic instance is thereby represented as a 23-dimensional row vector, capturing key lexical, statistical, and protocol-level attributes of the DNS request.
To enhance both the interpretability and spatial learning capacity of the model, especially in architectures such as convolutional neural networks (CNNs) and Deep Boltzmann Machines (DBMs), this feature vector is transformed into a fixed-size grayscale image. The dimensions of the image are determined by the total number of features; in this case, a 5 × 5 image is constructed to represent each instance. Since the number of features (23) is slightly less than the number of image pixels (25), two vacant positions are filled using zero-padding, maintaining shape compatibility while introducing minimal noise. The 23 features are placed into the 5 × 5 matrix following a row-major (left-to-right, top-to-bottom) order based on the predefined feature sequence listed in Table 2. This fixed and reproducible layout ensures that every image maintains structural consistency, enabling the model to learn position-aware feature representations.
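A minimal sketch of this mapping is shown below. Min-max normalization over the training set is assumed here; the paper only states that feature values are normalized and then linearly scaled to 0–255.

```python
import numpy as np

def to_grayscale_image(features, f_min, f_max):
    """Map a 23-dimensional feature vector to a 5x5 grayscale image.

    f_min / f_max are per-feature minima and maxima taken from the training set
    (min-max normalization is an assumption made for this sketch)."""
    features = np.asarray(features, dtype=float)
    scaled = (features - f_min) / np.maximum(f_max - f_min, 1e-9)   # normalize to [0, 1]
    pixels = np.round(scaled * 255).astype(np.uint8)                # linearly scale to 0..255
    padded = np.pad(pixels, (0, 25 - len(pixels)))                  # two zero-padded cells
    return padded.reshape(5, 5)                                     # row-major 5x5 layout

img = to_grayscale_image(np.random.rand(23), np.zeros(23), np.ones(23))
print(img.shape)   # (5, 5)
```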
Additionally, to understand the contribution of each feature to the classification output, a SHAP (SHapley Additive exPlanations) analysis was conducted (Figure 4). An example of the generated grayscale image, derived from a representative DNS traffic instance, is shown in Figure 5.
As shown in Figure 4, the top features influencing the prediction include Jaccard_Index_Bi, Subdomain_Uppercase_Ratio, and Additional_Data_in_UDP, each contributing a mean SHAP value greater than 0.07. These features are consistent with domain knowledge in DNS covert channel detection, as they reflect payload similarity, encoding behavior, and protocol anomalies. Importantly, the SHAP values provide an additive explanation model, meaning the final prediction can be decomposed as a sum of individual feature contributions. This ensures that highly weighted features directly push the model towards classifying the sample as DCC or not. For example, a high Jaccard_Index_Bi indicates frequent similarity between consecutive queries, which is typical in tunneling behavior and thus increases the DCC probability. Similarly, a high Subdomain_Uppercase_Ratio or Digit_Ratio often points to base32 or base64 encoding schemes used in covert channels.
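The SHAP analysis can be reproduced in outline with a tree-based explainer. The snippet below is a sketch using a stand-in XGBoost model and placeholder data; it is not the exact configuration used to produce Figure 4.

```python
import numpy as np
import shap
import xgboost as xgb

# Placeholder data standing in for the 23-dimensional DNS feature matrix and labels.
X = np.random.rand(1000, 23)
y = np.random.randint(0, 2, size=1000)         # 1 = DCC, 0 = benign

model = xgb.XGBClassifier(n_estimators=200, max_depth=6).fit(X, y)
explainer = shap.TreeExplainer(model)          # additive explanations for tree ensembles
shap_values = explainer.shap_values(X)         # per-sample, per-feature contributions
mean_abs = np.abs(shap_values).mean(axis=0)    # mean |SHAP| value per feature
top_features = np.argsort(mean_abs)[::-1][:5]  # indices of the five most influential features
print(top_features, mean_abs[top_features])
```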
3.2. DNS Covert Channel Detection Module
This paper presents a hybrid approach within the DCC detection module, which integrates an improved Deep Boltzmann Machine with a composite classifier to form the DBM-ENSec model. The overall structure of the model is shown in Figure 6.
As illustrated in Figure 6, the proposed model adopts a multi-stage architecture that combines unsupervised pre-training, spatial feature modeling, and ensemble-based decision making. First, the normalized DNS feature vector is mapped into a 5 × 5 grayscale image. This image is processed by a Deep Boltzmann Machine (DBM), which serves as an unsupervised feature abstraction module. The DBM captures high-level latent structures inherent in DNS traffic by learning the joint probability distribution of feature pixels, thus initializing the network in a way that reflects intrinsic data characteristics. The high-level representations from the DBM are fed into a convolutional neural network (CNN) that refines spatial-local patterns through a convolutional layer. The resulting feature maps are then passed through a self-attention mechanism, which identifies critical long-range dependencies across spatial dimensions, enhancing the model’s ability to detect sophisticated encoding behaviors.

The resulting one-dimensional feature vector is then passed to an ensemble decision module composed of four independent classifiers: XGBoost, Random Forest, LightGBM, and CatBoost. Each model is trained separately on the extracted vector representations, and their individual outputs are combined using a customized weighted ensemble strategy, in which prediction scores are aggregated based on each model’s F1-score on the validation set. This ensemble approach improves robustness and reduces model variance without relying on any single algorithm.

This end-to-end architecture ensures that both low-level traffic patterns and high-order semantic features are captured. The combined use of unsupervised representation learning (DBM), spatial modeling (CNN + attention), and ensemble decision strategies offers a strong defense against overfitting while also improving generalization in detecting DNS covert channels. A comprehensive overview of the techniques employed in this model is provided below.
(1) Deep Boltzmann Machine (DBM):
In 2009, Salakhutdinov and Hinton introduced the concept of the Deep Boltzmann Machine and proposed a learning algorithm that utilizes variational approximation and Markov chains to estimate the expected values of the model [22].
In this paper, we utilize the layer-by-layer pre-training capabilities of the Deep Boltzmann Machine to identify underlying patterns within the input data. These patterns include statistical distributions of characters, domain name lengths, and hierarchical structures, as well as critical information such as request types, response codes, and complex nonlinear relationships between domain names and query behaviors. The application of high-level feature modeling allows DBMs to optimize the input data by eliminating irrelevant or redundant information, including meaningless request fields and repetitive content. Simultaneously, this approach enhances the expressiveness of statistical attributes related to domain names (e.g., length and character distribution) and message characteristics (e.g., packet size and Time to Live (TTL) values). The result of this process is an initial feature representation that effectively highlights anomalous query behavior patterns, such as frequent requests for specific record types, anomalies in TTL values, and irregular distributions of error codes. Moreover, DBMs are capable of detecting potential deviations or traces of artifacts within the content of requests and responses. During the deeper mining process, DBMs extract covert communication patterns, including domain names with atypical lengths or malicious query features in specific formats, which facilitate the transmission of additional data through DNS fields. These deep features provide accurate and reliable inputs for subsequent detection models. The structural configuration of the DBM model is illustrated in Figure 7.
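As a rough illustration of the layer-by-layer pre-training idea, the sketch below greedily trains two stacked restricted Boltzmann machines with scikit-learn's BernoulliRBM on the flattened grayscale images. This greedy stack is only an approximation: full DBM training additionally uses mean-field variational inference and joint fine-tuning, and the layer sizes here are illustrative.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Placeholder input: flattened 5x5 grayscale images rescaled to [0, 1].
x = np.random.rand(1000, 25)

# Greedy layer-wise pre-training of two stacked RBMs.
rbm1 = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
h1 = rbm1.fit_transform(x)        # first-layer hidden activations
rbm2 = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
h2 = rbm2.fit_transform(h1)       # high-level latent representation passed downstream
```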
(2) Residual networks:
To address the problems of vanishing and exploding gradients, this paper integrates a residual network (ResNet) structure into the Deep Boltzmann Machine (DBM) framework to improve the stability of feature transfer. In a conventional feedforward network, the stacked layers transform the input x directly into the desired mapping, so the output is H(x) = F(x). When the desired mapping is the identity, H(x) = x, it is difficult for stacked nonlinear layers to fit this identity mapping directly. The residual formulation instead expresses the output as H(x) = F(x) + x, so the network only needs to learn the residual F(x); when an identity mapping is optimal, the stacked layers simply learn F(x) = 0, which is far easier to optimize. The residual structure is depicted in Figure 8.
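The residual idea can be expressed compactly in code. The following PyTorch sketch shows a generic residual block computing H(x) = F(x) + x; the layer widths and activation are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Output H(x) = F(x) + x: the stacked layers only learn the residual F(x)."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + x)   # skip connection adds the input back

block = ResidualBlock(dim=32)
print(block(torch.rand(4, 32)).shape)          # torch.Size([4, 32])
```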
(3) One-dimensional convolutional neural network:
The present study introduces a two-layer one-dimensional convolutional neural network (1D-CNN), applied after the Deep Boltzmann Machine, to extract finer-grained features. Inspired by biological visual processing, this network effectively captures local dependencies and salient patterns through convolutional operations. Convolutional neural networks (CNNs) have become fundamental in the field of computer vision due to their demonstrated effectiveness in feature extraction and pattern recognition. In this research, a CNN is employed to conduct a comprehensive analysis of DNS traffic features, thereby enhancing the accuracy and richness of feature representations and providing robust support for subsequent classification tasks. The formulation of the two-layer one-dimensional convolutional neural network is presented in Equation (3).
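Equation (3) is not reproduced here; as an illustration of the structure it describes, the sketch below builds a two-layer 1D-CNN in PyTorch. The channel counts and kernel sizes are assumptions, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn

# A minimal two-layer 1D-CNN over a 25-element feature sequence.
cnn = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=1),
    nn.ReLU(),
)

x = torch.rand(8, 1, 25)    # batch of 8 DBM outputs, one input channel
features = cnn(x)           # shape: (8, 32, 25)
print(features.shape)
```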
(4) Self-attention mechanism:
To address the limitations of traditional convolutional networks in modeling long-distance dependencies and to enhance the detection accuracy and adaptability of the model, this paper introduces the self-attention mechanism. Also referred to as the internal attention mechanism, self-attention was proposed by Vaswani et al. [23] in 2017. This technique enables the model to dynamically allocate attention to various elements across the sequence while processing a specific element. In this study, the self-attention mechanism is integrated into the one-dimensional convolutional neural network (1D-CNN), allowing the model to draw on relevant information from other positions in the sequence while processing a particular element. This approach not only addresses the shortcomings of traditional convolutional networks in managing long-range dependencies but also significantly enhances the model’s capacity to focus on critical information. As a result, the overall feature extraction process is improved, leading to a more precise and comprehensive feature representation for subsequent classification and detection tasks.
The self-attention mechanism operates by assessing the correlation between each element in the input sequence and all other elements within that sequence. The process comprises the following key steps:
1. Input Vector Transformation: Each element of the input sequence is transformed through a linear transformation into three distinct vector spaces: Query, Key, and Value.
2. Calculate Attention Score: The dot product between the query vectors and the key vectors is computed, subsequently scaled, and normalized, typically employing the softmax function, to derive the attention score. The precise formula is presented in Equation (4).
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V    (4)

where d_k represents the dimension of the key vectors and is used to scale the result of the dot product.
3. Weighted Summing: The attention scores are employed to weight and aggregate the value vectors, resulting in the creation of a new representation.
The computational process of the self-attention mechanism is illustrated in Figure 9.
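The three steps above correspond directly to the scaled dot-product attention of Equation (4). The following sketch implements them for a single sequence; the projection matrices and dimensions are illustrative.

```python
import torch
import torch.nn.functional as F

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ wq, x @ wk, x @ wv                 # 1. project into Query/Key/Value spaces
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # 2. scaled dot-product attention scores
    weights = F.softmax(scores, dim=-1)              #    normalized with softmax
    return weights @ v                               # 3. weighted sum of the value vectors

seq_len, d_model = 25, 32
x = torch.rand(seq_len, d_model)
wq, wk, wv = (torch.rand(d_model, d_model) for _ in range(3))
out = self_attention(x, wq, wk, wv)                  # shape: (25, 32)
print(out.shape)
```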
(5) Flatten Layer:
The Flatten layer is specifically designed to transform a multi-dimensional feature map into a one-dimensional vector, which is commonly employed in the final stage of a convolutional neural network. The feature maps generated by convolutional or other feature extraction layers are often multidimensional; however, classifiers require one-dimensional vectors as input. The Flatten layer facilitates this transformation, thereby enabling subsequent classifiers to effectively process the extracted features.
(6) ENSec:
The classifier utilized in this study is an ensemble learning model developed by integrating four distinct machine learning algorithms through a combinatorial strategy: Random Forest [24], XGBoost (eXtreme Gradient Boosting) [25], LightGBM (Light Gradient Boosting Machine) [26], and CatBoost (Categorical Boosting). The ensemble operates as follows: each sub-model first makes a prediction on the DNS traffic, yielding four prediction values (0 or 1). These values are then combined into a decision score, with each model’s F1 score serving as its weight. The decision score is compared against a predetermined threshold to determine whether the traffic is classified as a DCC. This approach combines the models’ prediction outcomes with their performance weights to enhance the accuracy and reliability of the final decision. The formula for calculating the decision score is presented in Equation (5).
where F_i represents the F1 score of each model and d_i denotes the prediction made by each model for the given DNS data, with a prediction of 1 indicating the presence of a DCC and 0 otherwise.
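A minimal sketch of the F1-weighted decision rule of Equation (5) is given below. Normalizing by the sum of F1 scores and the 0.5 threshold are assumptions made for illustration; the paper specifies only that the F1 scores serve as weights and that the score is compared against a predetermined threshold.

```python
import numpy as np

def decision_score(predictions, f1_scores):
    """F1-weighted vote over the four sub-model predictions d_i (0 or 1).

    Normalizing by the sum of F1 scores is an assumption of this sketch."""
    predictions = np.asarray(predictions, dtype=float)
    f1_scores = np.asarray(f1_scores, dtype=float)
    return float((f1_scores * predictions).sum() / f1_scores.sum())

# Example: predictions from XGBoost, Random Forest, LightGBM, CatBoost with their F1 weights.
score = decision_score([1, 1, 0, 1], [0.97, 0.96, 0.95, 0.96])
is_dcc = score > 0.5   # hypothetical threshold value
print(score, is_dcc)
```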
The specific DCC detection algorithm is outlined in Algorithm 1:
Algorithm 1: DCC Detection Algorithm