While the previous sections have demonstrated the model’s strong detection performance, understanding why the model makes specific classification decisions is equally important for building trust and improving system reliability. Most existing research in network intrusion detection focuses primarily on achieving high accuracy metrics, often overlooking post-hoc interpretability analysis that can reveal the underlying decision-making mechanisms. This gap limits practitioners’ ability to understand model behavior and gain insights into attack characteristics.
To address this, we employ SHAP (SHapley Additive exPlanations) for a two-stage interpretability analysis. Beyond merely listing important features, we aim to answer a more fundamental question: Does the model develop distinct, semantically meaningful “strategies” for recognizing different types of network threats? We first identify global decision drivers, then dissect class-specific feature dependencies. The results reveal a coherent taxonomy of detection strategies, demonstrating that the model’s internal logic aligns with the technical nature of various attacks, thereby enhancing trust in its outputs.
4.7.1. Global Feature Importance Analysis
To understand which features the model relies on most across all decisions, we first examine global feature importance. The SHAP analysis follows a standard setup: a fixed background dataset of 500 samples is drawn from the balanced training set by stratified sampling and serves as the common baseline for all SHAP calculations. For the global analysis, a balanced explanation dataset is constructed by randomly selecting 50 samples per class from the test set, so that every class is represented equally when observing overall patterns. We then average the feature contributions across all classes and samples to identify the global drivers of the model’s decisions.
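The sampling setup described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the toy label lists, and the seed are assumptions, and capping the per-class draw at the number of available samples is an added safeguard rather than something stated in the text.

```python
import random
from collections import Counter, defaultdict


def group_by_class(labels):
    """Map each class label to the list of sample indices carrying it."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    return by_class


def stratified_sample(labels, n_total, seed=0):
    """Draw n_total indices whose class proportions mirror those of `labels`."""
    rng = random.Random(seed)
    by_class = group_by_class(labels)
    # Proportional quota per class, rounded with the largest-remainder method.
    quotas = {c: n_total * len(ix) / len(labels) for c, ix in by_class.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    chosen = []
    for c, ix in by_class.items():
        chosen.extend(rng.sample(ix, alloc[c]))
    return sorted(chosen)


def per_class_sample(labels, n_per_class, seed=0):
    """Draw up to n_per_class indices from every class (balanced explanation set)."""
    rng = random.Random(seed)
    chosen = []
    for c, ix in group_by_class(labels).items():
        chosen.extend(rng.sample(ix, min(n_per_class, len(ix))))
    return sorted(chosen)


# Hypothetical label lists standing in for the balanced training set and the test set.
classes = ["normal", "injection", "xss", "password", "dos",
           "scanning", "backdoor", "ddos", "mitm", "ransomware"]
train_labels = [c for c in classes for _ in range(200)]
test_labels = [c for c in classes for _ in range(80)]

background_idx = stratified_sample(train_labels, 500)   # fixed SHAP baseline
explain_idx = per_class_sample(test_labels, 50)         # 50 samples per class

background_counts = Counter(train_labels[i] for i in background_idx)
explain_counts = Counter(test_labels[i] for i in explain_idx)
```

Fixing the seed keeps the background set identical across the global and class-specific analyses, which is what makes the resulting SHAP values comparable.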
Figure 9 presents a SHAP summary plot illustrating how feature values influence the model’s predictions. Each point represents a single sample. Its horizontal position indicates the SHAP value (contribution to the output), while its color corresponds to the original feature value—red for values above the median and blue for values below the median in the explanation dataset.
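To make the notion of a SHAP value as a "contribution to the output" concrete, the sketch below computes exact Shapley values for a tiny toy function by enumerating all feature subsets, with absent features filled in from a background set. It is purely illustrative: the toy model, the background rows, and the explained sample are hypothetical, and brute-force enumeration is not the algorithm a SHAP library would apply to a tree ensemble.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy scoring function with one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


def value(x, subset, background):
    """Mean model output when features in `subset` take their values from x
    and all remaining features take their values from each background row."""
    total = 0.0
    for b in background:
        z = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(z)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                with_i = value(x, set(s) | {i}, background)
                without_i = value(x, set(s), background)
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background rows and one sample to explain.
background = [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
x = [3.0, 2.0, 4.0]

base_value = value(x, set(), background)   # expected output over the background
phi = shapley_values(x, background)

# Additivity: the baseline plus all contributions reproduces the model output.
reconstructed = base_value + sum(phi)
```

The additivity check is the property that lets each point in a summary plot be read as a signed push away from the baseline defined by the background dataset.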
The plot highlights nonlinear relationships between features and their SHAP contributions. For key quantitative features such as duration, src_pkts, and src_bytes, SHAP values are widely distributed across both positive and negative ranges. This spread indicates that the influence of these features depends on context rather than following a simple linear trend. For example, src_pkts is negatively correlated with its SHAP values: lower feature values tend to receive higher positive SHAP values, pushing predictions toward attack classes, while higher values are associated with lower or negative SHAP values. The duration feature, by contrast, exhibits only a weak positive correlation, and both high and low values can correspond to either positive or negative SHAP values, reflecting its context-dependent impact.
Categorical features show clearer directional patterns. The presence of states such as conn_state_REJ generally contributes positively to attack predictions, with red dots clustered on the positive SHAP side, whereas their absence clusters near zero or slightly negative impact. Among the byte-count features, src_ip_bytes exhibits an inverse tendency: higher feature values are associated with negative SHAP values and lower values with positive SHAP values, suggesting that low src_ip_bytes values are more indicative of attack patterns. dst_ip_bytes follows a more conventional pattern, with high values contributing positively to attack predictions and low values contributing negatively.
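The directional readings above (for instance, low src_pkts values pushing toward attack classes) can be quantified with a rank correlation between each feature's values and its SHAP values. The sketch below is one possible way to do this, not a step reported in the analysis; the helper names and all numbers are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def spearman(feature_values, shap_values):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(feature_values), ranks(shap_values))


# Hypothetical values for eight explained samples.
src_pkts = [1, 2, 3, 5, 8, 13, 40, 120]
src_pkts_shap = [0.9, 0.7, 0.6, 0.2, -0.1, -0.3, -0.5, -0.8]

conn_state_rej = [0, 0, 0, 0, 1, 1, 1, 1]
conn_state_rej_shap = [-0.05, 0.0, -0.02, 0.01, 0.6, 0.8, 0.7, 0.9]

rho_src_pkts = spearman(src_pkts, src_pkts_shap)          # negative: inverse pattern
rho_rej = spearman(conn_state_rej, conn_state_rej_shap)   # positive: presence raises SHAP
```

A coefficient near zero with widely spread SHAP values would correspond to the context-dependent behavior described for duration.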
Figure 10 quantifies the global importance of the top 15 features through their mean absolute SHAP values. The results show a clear hierarchy of feature importance, with traffic-statistics features dominating the ranking. src_pkts ranks highest, followed closely by duration and src_bytes, and these three features show similar importance levels. They are followed by src_ip_bytes, dst_ip_bytes, and dst_bytes, forming a group of byte and packet count features that collectively drive most of the model’s global decision-making. Connection state features (conn_state_REJ) and protocol features (proto_tcp) appear in the lower ranks but still contribute meaningfully. The importance distribution is highly skewed: the top few features account for the majority of the model’s global impact, while lower-ranked features show progressively smaller contributions.
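The ranking in a bar chart of this kind is obtained by averaging absolute SHAP values over samples and classes. The following sketch shows that computation together with the cumulative share used to judge how skewed the distribution is. The nested-list layout (sample, feature, class), the function names, and the numbers are assumptions made for illustration.

```python
def global_importance(shap_values, feature_names):
    """Mean |SHAP| per feature, averaged over all samples and all classes.

    shap_values[s][f][c] is the contribution of feature f to the output
    for class c on sample s (hypothetical layout).
    """
    scores = []
    for f_idx, name in enumerate(feature_names):
        contributions = [abs(v) for sample in shap_values for v in sample[f_idx]]
        scores.append((name, sum(contributions) / len(contributions)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def cumulative_share(ranking):
    """Running fraction of total importance covered by the top-k features."""
    total = sum(score for _, score in ranking)
    shares, running = [], 0.0
    for name, score in ranking:
        running += score
        shares.append((name, running / total))
    return shares


# Hypothetical SHAP values: 2 samples x 3 features x 2 classes.
feature_names = ["src_pkts", "duration", "proto_tcp"]
shap_values = [
    [[0.4, -0.4], [0.1, -0.1], [0.0, 0.0]],
    [[-0.6, 0.6], [0.3, -0.3], [0.1, -0.1]],
]

ranking = global_importance(shap_values, feature_names)
shares = cumulative_share(ranking)
```

Taking absolute values before averaging matters: positive and negative contributions would otherwise cancel, and a feature with strong but opposing effects across classes would appear unimportant.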
The analysis reveals that basic network traffic statistics (packet counts, duration, byte counts) are the primary global driving factors, rather than application-layer content features. This finding aligns with network intrusion detection domain knowledge: attack traffic typically exhibits anomalies in traffic patterns (e.g., unusual packet transmission volumes, abnormal connection durations). The greater global weight of these statistical features supports the decision to preserve them during feature engineering and offers insight into how the model distinguishes normal from anomalous traffic.
4.7.2. Class-Specific Feature Importance
While global feature importance reveals the overall feature contributions across all classes, it may mask class-specific decision patterns. Different attack types often exhibit distinct network behaviors, leading the model to rely on different feature subsets for accurate classification. To uncover these class-specific patterns, we perform SHAP analysis for each attack type using all test samples of that class as the explanation dataset, while maintaining the same fixed background dataset (500 stratified samples from the balanced training set) as established in Section 4.7.1. This approach ensures consistent baseline comparisons while revealing how the model adapts its feature usage across different attack categories, highlighting the heterogeneity of network intrusion patterns.
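The class-specific procedure can be sketched as a per-class slice of the SHAP values: for each class, only samples of that class are kept, and only the contributions toward that class's output are ranked. The data layout (sample, feature, class), the function name, and the numbers below are hypothetical and serve only to illustrate the idea.

```python
def class_top_features(shap_values, labels, class_names, feature_names, top_k=2):
    """For each class, rank features by mean |SHAP| toward that class's output,
    using only the samples whose true label is that class."""
    result = {}
    for c_idx, c_name in enumerate(class_names):
        rows = [sv for sv, y in zip(shap_values, labels) if y == c_name]
        if not rows:
            continue  # no samples of this class in the explanation set
        scores = []
        for f_idx, f_name in enumerate(feature_names):
            mean_abs = sum(abs(r[f_idx][c_idx]) for r in rows) / len(rows)
            scores.append((f_name, mean_abs))
        scores.sort(key=lambda item: item[1], reverse=True)
        result[c_name] = scores[:top_k]
    return result


# Hypothetical SHAP values: 3 samples x 3 features x 2 classes.
class_names = ["dos", "ransomware"]
feature_names = ["conn_state_REJ", "duration", "src_pkts"]
labels = ["dos", "dos", "ransomware"]
shap_values = [
    [[0.8, -0.1], [0.05, -0.2], [0.3, -0.1]],
    [[0.6, -0.2], [0.10, -0.1], [0.2, 0.0]],
    [[-0.1, 0.0], [-0.30, 0.9], [0.0, -0.1]],
]

top = class_top_features(shap_values, labels, class_names, feature_names)
```

Because every slice is computed against the same background baseline, differences between the per-class rankings reflect the model's feature usage rather than a shifting reference point.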
Injection, XSS, and Password attacks exhibit similar feature dependency patterns, with byte and packet count features dominating their decision processes.
As shown in Figure 11, the three attack types exhibit a consistent reliance on byte- and packet-related features. In each case, high values of src_bytes and dst_bytes (represented by red dots) produce strong positive SHAP values, driving predictions toward the corresponding attack class. This pattern aligns with the nature of these application-layer attacks: injection, XSS, and password attacks typically involve transmitting substantial payloads (SQL injection code, JavaScript scripts, or password dictionaries) within HTTP requests, resulting in high byte volumes. For dst_ip_bytes, high values are indicative of injection and password attacks, while XSS shows a different pattern. Packet count features (src_pkts, dst_pkts) exhibit more complex relationships: for all three classes, low src_pkts values are more indicative of these attacks, while dst_pkts shows varying patterns across classes. This inverse relationship between packet counts and byte volumes reflects the typical HTTP POST request pattern of fewer but larger packets carrying substantial payload data, which is characteristic of these application-layer attacks.
For injection attacks, src_bytes and dst_bytes are the most influential features, with elevated byte counts yielding particularly high positive SHAP contributions. This is consistent with SQL injection attacks, which often embed malicious SQL code within HTTP request bodies, resulting in high byte transmission. XSS attacks follow a similar trend, with src_bytes and duration as the top discriminators. The importance of duration is particularly consistent with stored XSS, where malicious scripts must remain active on a compromised page, leading to longer-lived connections during the attack session. Password attacks also depend heavily on byte-volume features, most notably src_bytes and dst_ip_bytes. Additionally, for the service_- feature (indicating unknown services), low values (representing known services) yield positive SHAP contributions for password attacks, reflecting that brute-force attempts predominantly target well-defined, specific services (e.g., SSH, FTP, HTTP).
The consistent positive relationship between high byte feature values (src_bytes, dst_bytes) and positive SHAP contributions across these three classes indicates that each attack type produces distinctive traffic volumes. The inverse relationship observed for src_pkts (where low values are more indicative) aligns with the application-layer attack pattern: these attacks typically manifest as HTTP POST requests containing large payloads in fewer packets, rather than high-frequency packet transmission. This pattern distinguishes them from network-layer attacks (e.g., DoS) that generate high packet counts. The model effectively recognizes these nuanced patterns through quantitative network features, demonstrating alignment between the learned feature dependencies and the underlying attack mechanisms.
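The contrast between "large payloads in fewer packets" and high-frequency packet transmission comes down to the ratio of bytes to packets. The short example below works through that arithmetic for two flows. The flow records and their numbers are hypothetical and are not drawn from the dataset.

```python
def bytes_per_packet(src_bytes, src_pkts):
    """Average source payload per packet; 0.0 for flows without packets."""
    return src_bytes / src_pkts if src_pkts else 0.0


# Hypothetical flows: one POST-like request and one flood-like burst.
flows = {
    "post_like": {"src_bytes": 6000, "src_pkts": 6},
    "flood_like": {"src_bytes": 24000, "src_pkts": 600},
}

ratios = {
    name: bytes_per_packet(f["src_bytes"], f["src_pkts"])
    for name, f in flows.items()
}
# post_like  -> 1000.0 bytes per packet (few, large packets)
# flood_like ->   40.0 bytes per packet (many, small packets)
```

The flood-like flow sends more bytes in total yet far less per packet, which illustrates why byte volume and packet count can point in opposite directions for the same class.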
DoS, Scanning, Backdoor, and DDoS attacks demonstrate a different pattern where connection state features play a prominent role alongside byte/packet features.
As shown in Figure 12, connection-state features play a prominent role in classifying these attack types.
DoS attacks are primarily identified by a high conn_state_REJ value, which produces a strong positive SHAP contribution—indicating that rejected connections are a key behavioral signature. This aligns with the nature of DoS attacks, which overwhelm target systems with excessive connection attempts, leading to connection rejections. Notably, src_pkts shows a positive relationship for DoS: high packet counts contribute positively to its detection, reflecting the high-volume packet flooding characteristic of DoS attacks. Additionally, src_ip_bytes exhibits an inverse relationship (low values are more indicative). This is consistent with the prevalence of small-packet floods (e.g., SYN/ACK floods) in many DoS attacks, which generate high packet counts (src_pkts) but low per-packet byte volume.
Scanning attacks also depend on conn_state_REJ and src_ip_bytes, but in contrast to DoS, high src_ip_bytes values are strongly indicative. This pattern aligns with scanning behavior: port scanning and network reconnaissance generate numerous connection attempts, many of which are rejected, while the scanning source typically generates substantial source IP byte volumes. Here, service_- displays a clear directional pattern: low values (known services) contribute positively to scanning detection, while high values (unknown services) contribute negatively. This suggests that scanning activities are more likely to target well-known, standard service ports for reconnaissance than obscure or custom ports.
Backdoor attacks produce the most distinct feature impact: high src_ip_bytes values yield exceptionally large positive SHAP values, making this the most discriminative indicator for this class. This reflects the operational pattern of backdoors, which often involve establishing persistent connections and transmitting command-and-control traffic, resulting in high source IP byte volumes.
DDoS attacks are characterized mainly by duration and conn_state_S1, with the presence of S1-state connections strongly suggesting DDoS activity. Assuming the conn_state labels follow Zeek’s connection-state codes, S1 denotes a connection that was established but not terminated (an unanswered SYN is recorded as S0), so this signal points to connections that are opened and then held rather than to classic half-open SYN floods. Such lingering connections are consistent with DDoS techniques that exhaust a target’s resources by keeping many sessions open, and the prolonged duration reflects the sustained nature of these attacks. Notably, dst_ip_bytes shows an inverse relationship (low values are more indicative), which may reflect the asymmetric nature of DDoS attacks, where attackers send many small packets to overwhelm targets.
The importance of connection-state features for several of these classes highlights that the model recognizes protocol-level behavioral patterns, rather than relying solely on traffic volume. The distinct connection state signatures (REJ for DoS and scanning, S1 for DDoS) demonstrate that the model captures the underlying attack mechanisms at the network protocol level, while backdoor traffic is separated primarily through src_ip_bytes.
MITM and Ransomware attacks, despite having limited test samples, exhibit unique feature patterns that distinguish them from other attack types.
As shown in Figure 13, the two minority attack classes exhibit distinct feature patterns that reflect their underlying attack mechanisms.
MITM attacks are primarily identified by src_pkts, service_ssl, and dst_bytes. The strong positive SHAP contribution of service_ssl (high values yield positive SHAP values) is consistent with the fact that MITM attacks frequently occur within encrypted communications or interact with SSL/TLS sessions. The model therefore appears to leverage SSL-related traffic as a contextual indicator for this attack type. For src_pkts, high values contribute positively to detection, which may reflect the additional packet forwarding or occasional packet injection associated with attacker-in-the-middle positioning. In contrast, dst_bytes exhibits a distinct inverse pattern: low values yield positive SHAP contributions, while high values are associated with negative contributions. This may be related to the traffic interception or disruption that can occur during MITM activities, potentially reducing the amount of data successfully reaching the intended destination compared to normal high-throughput encrypted sessions.
Ransomware attacks display the most distinctive signature, where duration is the dominant feature with exceptionally strong positive SHAP contributions. Notably, these ransomware samples typically exhibit duration values that fall below the median of the background dataset (represented by blue dots), yet they occupy a distinctive range that is prolonged without reaching the length of very long benign sessions. This pattern is consistent with the prolonged but finite encryption and command-and-control phases of a ransomware attack. Additionally, proto_tcp shows a strong positive impact when its value is high, indicating that ransomware traffic is predominantly TCP-based, consistent with the network communication required for data exfiltration or command-and-control activities.
The unique feature dependencies observed in these minority classes underscore the value of class-specific analysis. Such distinct decision patterns—particularly the SSL-focused pattern for MITM attacks and the duration-based signature for ransomware—would likely be obscured in global feature importance calculations, demonstrating that the model effectively captures attack-specific behavioral signatures.
As shown in Figure 14, normal network traffic is characterized by distinct feature patterns that differentiate it from attack classes. The most discriminative features for normal traffic are conn_state_S0 and proto_tcp. High values of conn_state_S0 yield positive SHAP contributions, indicating that S0-state connections are typical of normal traffic in this dataset. Assuming the conn_state labels follow Zeek’s connection-state codes, S0 marks a connection attempt that received no reply rather than an established connection; such records can arise from benign connectionless exchanges, which fits the protocol pattern. Notably, proto_tcp exhibits an inverse relationship: low values (non-TCP protocols) are more indicative of normal traffic, with high TCP values showing weaker positive contributions compared to non-TCP. This pattern demonstrates that the model distinguishes benign traffic through connection-state and protocol-level characteristics.
Class-specific SHAP analysis reveals distinct feature patterns across attack categories. Injection, XSS, and password attacks are primarily characterized by high byte volumes coupled with lower packet counts, with src_bytes consistently influential. In contrast, DoS, scanning, backdoor, and DDoS attacks show stronger dependence on connection states, notably conn_state_REJ for DoS and scanning and conn_state_S1 for DDoS. Minority classes exhibit unique signatures: ransomware is identified by a distinctive duration range and TCP-based traffic (proto_tcp), while MITM attacks are associated with service_ssl, high src_pkts, and an inverse pattern on dst_bytes (where low values are indicative), reflecting their focus on intercepting and potentially disrupting encrypted traffic.
However, this analysis also reveals an important limitation: individual features often exhibit conflicting patterns across different classes. For instance, high src_pkts values are associated with DoS, MITM, and scanning attacks, while low src_pkts values are associated with XSS, injection, password, and ransomware attacks. Similarly, features such as duration and src_bytes show opposing relationships across different attack categories. This demonstrates that single features cannot independently determine class membership; instead, the model relies on feature combinations and contextual relationships. The XGBoost classifier effectively leverages these multi-feature interactions to achieve accurate classification, as evidenced by the high overall performance metrics.
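The conflict described above can be made explicit by recording, for each feature, the direction in which high values push each class, and then flagging features that appear with both signs. The sketch below encodes only directions stated in this section; the dictionary layout and the function name are assumptions for illustration.

```python
# +1: high feature values are indicative of the class; -1: low values are.
# Entries are taken from the directions stated in the text of this section.
directions = {
    "src_pkts": {
        "dos": +1, "mitm": +1, "scanning": +1,
        "xss": -1, "injection": -1, "password": -1, "ransomware": -1,
    },
    "conn_state_REJ": {"dos": +1, "scanning": +1},
    "service_ssl": {"mitm": +1},
}


def conflicting_features(directions):
    """Return features whose high values support some classes and whose
    low values support others, with the classes on each side."""
    conflicts = {}
    for feature, per_class in directions.items():
        high = sorted(c for c, sign in per_class.items() if sign > 0)
        low = sorted(c for c, sign in per_class.items() if sign < 0)
        if high and low:
            conflicts[feature] = {"high": high, "low": low}
    return conflicts


conflicts = conflicting_features(directions)
```

In this encoding only src_pkts is flagged, which mirrors the point that a threshold on a single feature cannot separate the classes and that the classifier must combine features.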