Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation

Elfaki, Abdelrahman Osman; Albluwi, Abdulhadi; Aljaedi, Amer; Nerma, Mohamed Hussien Mohamed

doi:10.3390/electronics15081714


by Abdelrahman Osman Elfaki ¹,*, Abdulhadi Albluwi ², Amer Aljaedi ² and Mohamed Hussien Mohamed Nerma ³

¹ Department of Computer Science, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
² Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
³ Department of Computer Engineering, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
* Author to whom correspondence should be addressed.

Electronics 2026, 15(8), 1714; https://doi.org/10.3390/electronics15081714

Submission received: 5 March 2026 / Revised: 13 April 2026 / Accepted: 14 April 2026 / Published: 17 April 2026

(This article belongs to the Special Issue Digital Security and Privacy Protection: Trends and Applications, 3rd Edition)


Abstract

Firewall policy misconfigurations remain a major source of security vulnerabilities in modern networks, particularly as firewall rule sets grow in size and complexity. Such misconfigurations, commonly referred to as firewall anomalies, can lead to unintended access control behavior and undermine network security. In this paper, we propose a formal logic rule-based framework for the systematic detection and investigation of firewall anomalies, supported by knowledge graph-based visualization. First-order logic (FOL) is employed to precisely model firewall rules and to define major anomaly types, including shadowing, redundancy, correlation, generalization, and irrelevance, in both single and distributed firewall environments. The proposed framework introduces explicit and comprehensive logical definitions for each anomaly type, enabling deterministic, interpretable, and complete detection of rule conflicts and overlaps. Complex anomalies, particularly correlation and generalization, are systematically decomposed into well-defined logical cases to facilitate the accurate identification of subtle, order-dependent interactions among firewall rules. To enhance usability and analysis, firewall rules and detected anomalies are represented using Neo4j knowledge graphs, providing intuitive visual insights into rule relationships and anomaly causes. The effectiveness of the proposed approach is validated using a real operational backbone network dataset collected from Stanford University’s campus network. Experimental results demonstrate the framework’s ability to accurately detect both simple and complex firewall anomalies under realistic network conditions. To further validate the proposed logic rules, a machine learning-based evaluation was conducted. The findings confirm their effectiveness in accurately characterizing firewall anomalies. 
Unlike machine learning or heuristic-based methods, the proposed approach does not require training data and guarantees formal correctness and explainability. These features make it a robust and practical solution for firewall policy verification and network security management.

Keywords: firewall anomaly detection; knowledge graph; firewall policy verification; explainable network security; logistic regression validation

1. Introduction

The Internet has become an integral part of daily life, and many everyday tasks now depend on it. Protecting data during online transactions is crucial for maintaining privacy and security. A firewall is a network security device that examines incoming and outgoing network traffic, determining whether to permit or restrict specific traffic based on a predefined set of security rules [1]. Firewalls are essential security components used to protect computer networks from unauthorized access and potential threats. They act as a barrier between an internal network (such as a corporate network) and external networks (such as the Internet), controlling the traffic flow based on predefined rules. A firewall is considered one of the most well-known cybersecurity tools and plays a key role in many cybersecurity strategies [2].

Firewalls can accumulate a significant number of rules as network environments become more complex. Firewall configurations can consist of a substantial number of rules, often ranging from hundreds to even thousands of rules [3]. The number of rules in a firewall configuration can vary widely depending on the complexity of the network and the specific security requirements. In large organizations or networks with diverse services and applications, the firewall rule set can grow significantly. Networks with multiple subnets, different security zones, and complex traffic routing often require a larger number of rules to ensure proper network segmentation and protection [4,5]. Configuring and setting up a firewall is a complex and error-prone task for the following four reasons:

(i): The firewall system often contains a large number of rules.
(ii): Legacy firewall rules may have been designed and implemented by different administrators.
(iii): Rules can be added to or removed from the firewall system at different times.
(iv): Firewall policies might be maintained in the network by more than one administrator.

The aforementioned reasons illustrate the complex nature of managing firewall policies and their implementation. Therefore, automated tools are necessary to ensure the accuracy and efficiency of firewall rules, given the large number of rules in place. The primary function of these automated tools is to validate the set of filtering rules and assist in the optimization process [6].

In real-world scenarios, network security issues can arise as a result of firewall configuration errors. In the literature, these configuration errors within firewall rules are commonly referred to as firewall anomalies. The prominent firewall anomalies discussed in the literature include shadowing, redundancy, correlation, generalization, and irrelevance [7,8]. This paper analyzes firewall anomalies using first-order logic (FOL) supported by knowledge graphs. First, FOL is used to define the firewall anomalies; it was selected for its ability to describe the relationships between objects [9]. Second, the notation is described. Third, new definitions of each firewall anomaly are introduced in the form of FOL rules. Finally, experiments are conducted to demonstrate the correctness and applicability of the FOL rules. The dataset used in this paper consists of real backbone network configuration data collected from Stanford University campus routers.

The paper makes the following contributions. It proposes a formal first-order logic-based framework for firewall policy analysis that enables precise, deterministic, and explainable detection of firewall anomalies. The paper introduces complete logical definitions for all major anomaly types, including shadowing, redundancy, correlation, generalization, and irrelevance, across both single and distributed firewall environments. Complex anomalies are systematically decomposed into explicit logical cases, ensuring accurate detection of order-dependent and overlapping rules. The framework is further strengthened by knowledge graph-based visualization using Neo4j, and its effectiveness is validated on real backbone network configuration data, demonstrating practical applicability under realistic conditions. Finally, a machine learning approach is used to validate the correctness of the proposed logic rules.

The remainder of this paper is organized as follows. Section 2 reviews and analyzes the related literature. Section 3 presents the definitions and categories of firewall policy anomalies. Section 4 introduces the proposed validation rules and the knowledge graph-based visualization framework. Section 5 employs machine learning techniques to validate the correctness and practical applicability of the proposed logic rules. Finally, Section 6 concludes the paper and outlines future research directions.

2. Related Work

Arthur et al. [10] developed an automated tool to identify and resolve firewall anomalies. The tool resolves anomalies by selecting the best position for the rule index using a heuristic approach based on particle swarm optimization. The work in [11] developed a model for detecting firewall anomalies using a parallel feature fusion technique, with a stacked autoencoder and a deep belief network as feature learning methods. The work in [12] developed a formal system that uses a logic solver to detect firewall anomalies.

Togay et al. [13] suggested an anomaly detection framework based on the Java web application platform. This framework is based on linear logic and can detect traditional anomalies. Kulyadi et al. [14] designed a firewall anomaly detection model using a Recurrent Neural Network (RNN) that learns the normal behavior of the firewall and the complex spatio-temporal correlations in the data. The model detects anomalies that can potentially be malware. The work in [15] developed a model that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory Networks (LSTMs) to analyze log files and detect firewall anomalies. In this work, logic notation is used as a formalization tool. The work in [16] developed an approach based on Header Space Analysis (HSA) for enhancing the performance of firewall anomaly detection. Their HSA approach allows for statically checking network specifications and configurations. Toprak and Yavuz [17] developed a deep learning semi-supervised model to detect firewall anomalies. For training and testing the model, they utilized the PayloadAllTheThings dataset. The work in [18] developed a firewall anomaly detection algorithm based on an asymmetric double decision tree. They utilized the packet filter, the first matching rule for the practical decision. The work in [19] developed a prototype of a decision support system for optimizing firewall rules, aiming to detect firewall anomalies by using a probability approach. The work in [20] designed machine learning and deep learning multi-class models for analyzing firewall logs and classifying the actions. Their results show the applicability of the two models in discovering firewall anomalies. The work in [21] developed four classification algorithms (Naive Bayes, logistic regression, decision tree, and support vector machine) for discovering firewall anomalies using feature engineering. 
The work in [22] developed a machine learning classifier that can be used to provide the correct action in firewall records. They employed confusion matrix parameters as evaluation metrics.

The work in [23] developed a model that analyzes log files to identify firewall anomaly patterns. The work in [24] developed an optimized model that aids administrators in detecting firewall anomalies, which used rule-based logic as an optimization tool. The work in [25] conducted experiments by using injected anomalies into the existing firewall logs, with the aim of comparing the efficiency of supervised and unsupervised learning techniques. Their results proved that the unsupervised learning method had difficulty detecting the injected anomalies. The work in [26] proposed a new unsupervised anomaly detection model. They utilized natural language processing methods for feature extraction and PCA for dimension reduction.

A new model and visualization tool, developed by [27], helps information security managers identify and prioritize anomalies in firewall policies. This model is developed based on considering exceptional rules, which are often misclassified as anomalies in existing models. However, there is no proof that this model is able to detect all cases of firewall anomalies; completeness is an issue.

3. Anomalies in Firewall Policy

This section examines firewall policy anomalies at two levels. The first level focuses on anomalies occurring within a single firewall, whereas the second level considers anomalies arising in distributed firewall architectures. According to [8], the primary rule anomalies in a single firewall are shadowing, correlation, generalization, and irrelevance. Each of these anomalies is defined and discussed in the subsequent subsections. Additionally, distributed firewall environments may exhibit inter-firewall anomalies and spuriousness anomalies [27].

3.1. Anomalies in a Single Firewall

This subsection defines and illustrates the anomalies that can occur in a single firewall.

3.1.1. Shadowing Firewall Anomalies

A networking rule is considered a shadowing anomaly if it is placed after another rule that already covers all the same network traffic. Because the first rule will always be applied, the “shadowed” rule will never be activated or counted. Table 1 shows an instance of a shadowing anomaly. In Table 1, rule 2 (index:I₂) is shadowed by rule 1 (index:I₁). Rule 2 follows rule 1, and rule 2 is a subset match of rule 1 and the actions of rule 1 and rule 2 are different. Rule 2 is never activated because an earlier, more general rule captures the traffic it is meant to handle.

3.1.2. Correlation Firewall Anomaly

A correlation anomaly occurs when two firewall rules have different actions, and their matching criteria partially overlap. This means some packets will be matched by both rules, but neither rule completely covers the other. The outcome for a specific packet depends on which rule is evaluated first, making the policy order-sensitive and unpredictable. For example, one rule might allow traffic from a specific IP to a range of ports, while a second rule denies traffic from that same IP to a specific port within that range. The order of these rules determines the packet’s fate [28]. Depending on the specific security policy in place, this anomaly could be classified as either:

Conflicting: If the firewall processes rules in a way that allows the second rule to override the first, it creates a direct conflict in action.

Redundant: If the policy dictates a specific order of rule execution (e.g., the first rule to match wins), then one of the rules essentially becomes redundant, as its action will never be applied to that specific packet. Table 2 shows an instance of a conflicting correlation anomaly. Table 3 shows an instance of redundancy in correlation anomaly.
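The order sensitivity described above can be sketched with a small first-match evaluator. This is only an illustration under stated assumptions: the rule fields, addresses, and port ranges are hypothetical and not taken from Tables 2 or 3, and the deny rule is given an "any" source so that the two rules overlap only partially, as the definition of correlation requires.

```python
def first_match(rules, packet):
    """Return the action of the first rule that matches the packet, or None."""
    for rule in rules:
        src_ok = rule["src_ip"] in ("any", packet["src_ip"])
        port_ok = packet["dst_port"] in rule["dst_ports"]
        if src_ok and port_ok:
            return rule["action"]
    return None

# Hypothetical rules: neither one fully covers the other.
allow_range = {"src_ip": "10.0.0.5", "dst_ports": range(20, 101), "action": "allow"}
deny_port = {"src_ip": "any", "dst_ports": range(80, 81), "action": "deny"}

# A packet in the overlap of both rules.
packet = {"src_ip": "10.0.0.5", "dst_port": 80}

print(first_match([allow_range, deny_port], packet))  # allow
print(first_match([deny_port, allow_range], packet))  # deny
```

Swapping the two rules flips the decision for the packet in the overlap, which is the behavior that makes a correlated pair worth flagging.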

3.1.3. Generalization Firewall Anomaly

A generalization anomaly occurs when one rule’s matching criteria are a superset of a second rule’s criteria, and they have different actions. The more general rule (the superset) appears after the more specific rule. Because firewalls process rules sequentially, the specific rule will always be evaluated first for any packet that matches both. The more general rule will, therefore, never be applied to those specific packets, making it effectively useless for that subset of traffic. For example, one rule allows traffic from a single IP address, and a second, more general rule (that follows the first) denies all traffic from the entire network subnet that includes that IP. The second rule will never affect the traffic from that specific IP because the first rule will always be a match [29]. Table 4 shows this instance of a generalization anomaly.

3.1.4. Irrelevant Firewall Anomaly

An irrelevant firewall anomaly occurs when a rule in a firewall policy is never able to be matched by any incoming traffic. This happens because another rule that appears earlier in the policy is more general and already covers all the traffic that the later, more specific rule would have matched [30]. We can identify two main scenarios where a firewall rule is irrelevant:

Erroneous IP entries: The source and destination IP addresses are identical within the rule. This rule is ineffective because a network packet cannot originate and terminate at the same host.
Irrelevant addressing: The rule contains non-existent IP addresses or services that do not align with the network’s addressing scheme. For instance, a rule might specify a destination or port that does not exist on the network, making it impossible for any traffic to match the rule.

Because the source and destination IP addresses in the firewall rule are identical (as seen in Table 5), the rule is ineffective and will not match any incoming packets.

As an example of a case 2 irrelevant anomaly, the rule in Table 6 is ineffective. The packet with the tuple (UDP, 60.0.0.3, 70) would never match the rule because: the destination IP address (60.0.0.3) does not belong to any machine on the network, and no application is actively listening for traffic on port 70. Because the destination, service, or both do not exist, the firewall rule will never match any incoming packets. Table 6 shows an instance of an irrelevant anomaly (case 2).
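The two irrelevance scenarios can be expressed as a simple check. The sketch below is an assumption-laden illustration rather than the authors' implementation: the function name, the `known_hosts` set, and the `listening_ports` map are hypothetical inputs standing in for whatever inventory of the network is available, and wildcard fields such as "any" are not handled.

```python
def irrelevance_case(rule, known_hosts, listening_ports):
    """Return 1 or 2 for the irrelevance scenario that applies, else None."""
    # Case 1: identical source and destination addresses.
    if rule["src_ip"] == rule["dst_ip"]:
        return 1
    # Case 2: the destination host or the service behind the port does not exist.
    host_exists = rule["dst_ip"] in known_hosts
    service_exists = rule["dst_port"] in listening_ports.get(rule["dst_ip"], set())
    if not host_exists or not service_exists:
        return 2
    return None

# Hypothetical inventory of the protected network.
known_hosts = {"10.0.0.10"}
listening_ports = {"10.0.0.10": {80, 443}}

# Destination and port follow the (UDP, 60.0.0.3, 70) example; the source is made up.
rule = {"proto": "udp", "src_ip": "10.0.0.1", "dst_ip": "60.0.0.3", "dst_port": 70}
print(irrelevance_case(rule, known_hosts, listening_ports))  # 2
```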

3.2. Anomalies in Distributed Firewalls

Inter-firewall anomalies: This specific type of anomaly refers to conflicts that occur between different firewalls in a distributed network. A common scenario is when one firewall allows a packet, but a subsequent firewall on the same path blocks it, leading to inconsistent security policies and potential vulnerabilities. Table 7 shows an instance of inter-firewall anomalies. In Table 7, F1 explicitly allows HTTPS traffic to the internal server, but F2 blocks HTTPS traffic to the same server.

Spuriousness Anomaly

This anomaly, often discussed in the context of distributed firewalls, occurs when a packet is allowed by an upstream firewall but is not covered by any rule in a downstream firewall. Table 8 shows an instance of a spuriousness anomaly. In Table 8, F1 explicitly allows HTTP traffic to the internal server, but F2 has no rule for HTTP traffic to 10.0.0.10.
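Both distributed anomalies compare the decisions of two firewalls on the same path. A minimal sketch, assuming first-match semantics in each firewall and a simplified packet of (proto, dst_ip, dst_port), is given here. The rule contents are hypothetical and only mirror the HTTPS and HTTP situations described for Tables 7 and 8.

```python
FIELDS = ("proto", "dst_ip", "dst_port")

def decide(rules, packet):
    """First-match decision of one firewall; None if no rule covers the packet."""
    for rule in rules:
        if all(rule[f] in ("any", packet[f]) for f in FIELDS):
            return rule["action"]
    return None

def path_anomaly(upstream, downstream, packet):
    """Classify the pair of decisions made along an upstream -> downstream path."""
    up, down = decide(upstream, packet), decide(downstream, packet)
    if up == "allow" and down == "deny":
        return "inter-firewall"
    if up == "allow" and down is None:
        return "spuriousness"
    return None

# Hypothetical rule sets for F1 (upstream) and F2 (downstream).
f1 = [
    {"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443, "action": "allow"},
    {"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 80, "action": "allow"},
]
f2 = [{"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443, "action": "deny"}]

https = {"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443}
http = {"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 80}
print(path_anomaly(f1, f2, https))  # inter-firewall
print(path_anomaly(f1, f2, http))   # spuriousness
```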

4. Validation Rules and Knowledge Graph Illustration

This section illustrates the validation rules used for firewall anomaly detection, which are based on the definitions provided in the previous section. According to [31], the standard structure of a firewall rule is ⟨order, protocol, src_ip, src_port, dst_ip, dst_port, action⟩. In this section, firewall rules are represented using first-order logic [32] and follow the same concept introduced by [31]. The general syntax of the proposed validation rule is defined as

fr(index, proto, src_IP, src_port, dst_IP, dst_port, action). This representation is necessary for applying the proposed logic rules.

In this representation, fr refers to a firewall rule, index represents the unique identifier of the rule, proto indicates the network protocol in use, src_IP and src_port denote the source IP address and source port, respectively, while dst_IP and dst_port specify the destination IP address and destination port. Finally, action defines the decision applied to the traffic, which can be either allow or deny.
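For readers who prefer code to predicates, the fr(...) tuple can be mirrored by a small record type. This is a sketch only: the class name and the example values are hypothetical, and all address and port fields are kept as plain strings for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FirewallRule:
    """One fr(index, proto, src_IP, src_port, dst_IP, dst_port, action) fact."""
    index: int
    proto: str
    src_ip: str
    src_port: str
    dst_ip: str
    dst_port: str
    action: str

    def __post_init__(self):
        # The text restricts the action to allow or deny.
        if self.action not in ("allow", "deny"):
            raise ValueError(f"unknown action: {self.action!r}")

# Hypothetical values, chosen only to show the shape of a rule.
rule = FirewallRule(1, "tcp", "10.0.0.5", "any", "10.0.0.10", "443", "allow")
print(rule)
```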

In light of the above discussion, firewall anomalies can be summarized into four categories as follows:

\text{Shadowing} \Longrightarrow \left(\forall l,\; fr_{a}^{l} \supseteq fr_{b}^{l}\right) \land \left(fr_{a}^{\mathrm{action}} \neq fr_{b}^{\mathrm{action}}\right)

(1)

\text{Redundancy} \Longrightarrow \left(\forall l,\; fr_{a}^{l} \supseteq fr_{b}^{l}\right) \land \left(fr_{a}^{\mathrm{action}} = fr_{b}^{\mathrm{action}}\right)

(2)

\text{Correlation} \Longrightarrow \left(\exists l,\; fr_{a}^{l} \supseteq fr_{b}^{l}\right) \land \left(\exists l',\; fr_{a}^{l'} \subseteq fr_{b}^{l'}\right) \land \left(fr_{a}^{\mathrm{action}} \neq fr_{b}^{\mathrm{action}}\right)

(3)

\text{Generalization} \Longrightarrow \left(\forall l,\; fr_{a}^{l} \subseteq fr_{b}^{l}\right) \land \left(fr_{a}^{\mathrm{action}} \neq fr_{b}^{\mathrm{action}}\right)

(4)

Here, fr denotes a firewall rule, while a and b represent the ordering of firewall rules, with a > b indicating that rule a has higher priority than rule b. The action specifies the firewall operation, which belongs to {allow, deny}, meaning that the packet is either forwarded or dropped. The symbol l refers to an individual field of a firewall rule, such as the source IP (src_IP), destination IP (dst_IP), source port (src_Port), destination port (dst_Port), or protocol type (Proto), where l ∈ {Proto, SrcIP, SrcPort, DstIP, DstPort} and l ≠ l′.
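The four field-wise definitions in Equations (1)-(4) can be read as set comparisons over each field l. The sketch below applies them literally, assuming every field is modeled as a finite Python set and that rule a has higher priority than rule b. The function name and the order in which the definitions are tested are assumptions, since a pair with identical fields satisfies more than one definition.

```python
FIELDS = ("proto", "src_ip", "src_port", "dst_ip", "dst_port")

def classify(a, b):
    """Classify the pair (a, b), with a the higher-priority rule, per Eqs. (1)-(4)."""
    sup = [f for f in FIELDS if a[f] >= b[f]]  # fields where a is a superset of b
    sub = [f for f in FIELDS if a[f] <= b[f]]  # fields where a is a subset of b
    differ = a["action"] != b["action"]
    if len(sup) == len(FIELDS):
        return "shadowing" if differ else "redundancy"
    if differ and len(sub) == len(FIELDS):
        return "generalization"
    # Correlation: some field l with a superset, another field l' with a subset.
    if differ and any(l != lp for l in sup for lp in sub):
        return "correlation"
    return None

# Hypothetical rules used only to exercise each branch.
a = {"proto": {"tcp"}, "src_ip": {"10.0.0.1", "10.0.0.2"}, "src_port": {"any"},
     "dst_ip": {"10.0.0.10"}, "dst_port": {80}, "action": "allow"}
b = dict(a, src_ip={"10.0.0.1"}, action="deny")
c = dict(b, dst_port={80, 443})

print(classify(a, b))  # shadowing
print(classify(b, a))  # generalization
print(classify(a, c))  # correlation
```

Note that Equation (3), taken literally, does not require the two rules to share any packet in the remaining fields; a stricter check would also test that every field pair intersects.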

4.1. Dataset Description and Utilization

The dataset used in this paper is derived from real operational backbone network data collected from Stanford University’s campus network infrastructure, commonly referred to in the literature as the Stanford backbone dataset (Stanford University, “Campus backbone router configuration and ARP table data,” Stanford, CA, USA, unpublished research dataset) [33]. The dataset consists of router-level information extracted from core network devices, including Address Resolution Protocol (ARP) tables, IP–MAC mappings, interface identifiers, and VLAN assignments.

The data reflects actual backbone-level network configurations, capturing multiple subnets, virtual LANs, and inter-router connections. IP address ranges such as 171.64.0.0/16, which are publicly assigned to Stanford University, confirm the provenance of the dataset. The dataset does not contain packet payloads or user-level information; instead, it represents control-plane and configuration-level data, making it suitable for firewall policy modeling and anomaly analysis.

In this work, the dataset is used to reconstruct realistic network scopes and policy domains, from which firewall rules and packet-matching conditions are inferred. This enables the evaluation of firewall anomalies such as shadowing, redundancy, correlation, and generalization under real network conditions rather than synthetic rule sets. Since the dataset originates from operational infrastructure, it provides a realistic and challenging benchmark for validating firewall policy verification techniques.

To prepare and normalize this dataset for use in Neo4j, we developed the software described in Table 9, which outlines the firewall policy graph construction and the port normalization algorithm.

In the following subsections, first-order logic (FOL) rules are provided to illustrate how each anomaly type is represented. The use of FOL for developing validation rules was previously explained by [6].

4.2. Shadowing Detection

The general form of a shadow anomaly is as below:

∀ fr, PC: (I₁ < I₂) ∧ (PC₁ ∈ fr₁) ∧ (PC₂ ∈ fr₂) ∧ (PC₂ ∈ fr₁) ∧ (PC₁ ≠ PC₂) ⟹ fr₂ is shadowed by fr₁

(5)

Equation (5) indicates that rule fr₂ is positioned after fr₁ in the firewall rule ordering; therefore, fr₁ is evaluated first. Rule fr₁ matches two packet classes (PC₁ and PC₂), whereas fr₂ matches only packet class PC₂. Since all packet fields matched by fr₂ are already covered by fr₁, Equation (5) represents a general shadowing anomaly, in which fr₂ is completely shadowed by fr₁. Moreover, when both rules specify the same action, this situation is additionally classified as a redundancy anomaly.

In the following, the shadowing anomaly is classified into five cases:

∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr₁(I₁, proto₁, src_IP₁, src_port₁, dst_IP₁, dst_port₁, action₁) ∧ fr₂(I₂, proto₂, src_IP₂, src_port₂, dst_IP₂, dst_port₂, action₂) ∧ (I₁ < I₂) ∧ (proto₁ = proto₂) ∧ (src_IP₁ = src_IP₂) ∧ (dst_IP₁ = dst_IP₂) ∧ (dst_port₁ = dst_port₂) ∧ (action₁ ≠ action₂) ⟹ Shadowing_1

(6)

Equation (6) represents the first case of a shadowing anomaly, where rules fr₁ and fr₂ are modeled using predicates. Both rules share identical protocol types, source IP addresses, destination IP addresses, and destination ports, while specifying different actions. Since fr₁ precedes fr₂ in the rule order, fr₂ is completely shadowed by fr₁. This scenario is referred to as the first case of shadowing. Figure 1 illustrates the detection of the first case of shadowing in the dataset after applying Equation (6).

In Figure 1, packet 2001 has a first case of shadowing with packet 139. Table 10 and Table 11 show the definitions of packets 2001 and 139 respectively. As is shown from the definitions of packets 2001 and 139, they have the same protocol types, source and destination IP addresses, and destination ports, but define different actions. Therefore, this case is considered the first case of shadowing.
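The first shadowing case amounts to a pairwise scan over the ordered rule list. The following sketch shows one way such a scan could look; it is not the implementation behind Figure 1, the helper names are hypothetical, and the sample rules are invented rather than taken from Tables 10 and 11. As in Equation (6), the source port is not compared.

```python
from itertools import combinations

def shadowing_1(fr1, fr2):
    """Equation (6): same proto, src_IP, dst_IP, dst_port; different actions."""
    return (fr1["index"] < fr2["index"]
            and fr1["proto"] == fr2["proto"]
            and fr1["src_ip"] == fr2["src_ip"]
            and fr1["dst_ip"] == fr2["dst_ip"]
            and fr1["dst_port"] == fr2["dst_port"]
            and fr1["action"] != fr2["action"])

def find_shadowing_1(rules):
    """Return (I1, I2) index pairs where the later rule is shadowed by the earlier."""
    ordered = sorted(rules, key=lambda r: r["index"])
    return [(a["index"], b["index"])
            for a, b in combinations(ordered, 2) if shadowing_1(a, b)]

# Hypothetical rule set.
rules = [
    {"index": 1, "proto": "tcp", "src_ip": "10.0.0.5", "dst_ip": "10.0.0.10",
     "dst_port": "80", "action": "allow"},
    {"index": 2, "proto": "tcp", "src_ip": "10.0.0.5", "dst_ip": "10.0.0.10",
     "dst_port": "80", "action": "deny"},
    {"index": 3, "proto": "udp", "src_ip": "10.0.0.5", "dst_ip": "10.0.0.10",
     "dst_port": "53", "action": "allow"},
]
print(find_shadowing_1(rules))  # [(1, 2)]
```

The scan is quadratic in the number of rules, which is acceptable for a sketch; a graph database query would typically express the same join declaratively.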

The same pair of rules can meet more than one shadowing definition at the same time. This does not mean the detection is redundant; rather, it shows that the pair satisfies multiple independent shadowing criteria (e.g., an exact match as well as broader field coverage), and each criterion is treated as a separate anomaly case in our model.

The subsequent sections present the remaining possible forms of shadowing anomalies.

∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr₁(I₁, proto₁, src_IP₁, src_port₁, dst_IP₁, dst_port₁, action₁) ∧ fr₂(I₂, proto₂, src_IP₂, src_port₂, dst_IP₂, dst_port₂, action₂) ∧ (I₁ < I₂) ∧ ((proto₁ = any) ∨ (proto₂ ∈ proto₁)) ∧ (src_IP₂ = src_IP₁) ∧ (dst_IP₂ = dst_IP₁) ∧ (dst_port₂ = dst_port₁) ∧ (action₂ ≠ action₁) ⟹ Shadowing_2

(7)

Equation (7) indicates that fr₂ is shadowed by fr₁. Two situations are covered. In the first, the protocol of fr₁ is set to “any”, meaning it matches all protocol types (e.g., TCP or UDP); therefore, regardless of the protocol value in fr₂, it is always included within the protocol scope of fr₁. In the second, where proto₂ ∈ proto₁, the protocol specified in fr₂ is a member of the protocol set defined in fr₁. For example, if proto₁ = {TCP, UDP}, then proto₂ must be one of these protocols. All remaining packet fields are identical in fr₁ and fr₂, but the two rules apply different actions, which leads to the shadowing relationship. Figure 2 illustrates the detection of the second case of shadowing in the dataset after applying Equation (7).

Figure 2. Illustration of the detection of the second case of shadowing.

∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr₁(I₁, proto₁, src_IP₁, src_port₁, dst_IP₁, dst_port₁, action₁) ∧ fr₂(I₂, proto₂, src_IP₂, src_port₂, dst_IP₂, dst_port₂, action₂) ∧ (I₁ < I₂) ∧ (proto₂ = proto₁) ∧ ((src_IP₁ = any) ∨ (src_IP₂ ∈ src_IP₁)) ∧ (dst_IP₂ = dst_IP₁) ∧ (dst_port₂ = dst_port₁) ∧ (action₂ ≠ action₁) ⟹ Shadowing_3

(8)

Equation (8) indicates that fr₂ is shadowed by fr₁. In this equation, the source IP of fr₁ can appear in two forms: it may be set to “any”, or it may be an aggregate source IP. An aggregate source IP means that a single rule (fr₁) specifies multiple source IP addresses or a range of addresses. In contrast, the source IP of fr₂ is a single IP that is included within the source IP set defined in fr₁. For example, if the source IP of fr₁ is {10.0.0.1–10.0.0.10} and the source IP of fr₂ is {10.0.0.5}, then the source IP of fr₂ is contained within fr₁. In Equation (8), since the source IP of fr₂ is included in fr₁, and the remaining packet fields are equivalent while the actions differ, fr₂ becomes a shadowed rule of fr₁. Figure 3 illustrates the detection of the third case of shadowing in the dataset after applying Equation (8).
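The membership test src_IP₂ ∈ src_IP₁ is the only part of the third case that goes beyond equality. A minimal sketch using Python's standard ipaddress module is given here, assuming an aggregate is written as a "first-last" string and that fr₂ carries a single address, as the text states. The helper names and the sample rules are hypothetical.

```python
from ipaddress import ip_address

def ip_in(single, spec):
    """True if a single address falls inside spec: 'any', one address, or 'first-last'."""
    if spec == "any":
        return True
    if single == "any":
        return False  # a wildcard is never contained in a narrower spec
    addr = ip_address(single)
    if "-" in spec:
        first, last = (ip_address(part.strip()) for part in spec.split("-"))
        return first <= addr <= last
    # A single address is treated as a one-element set.
    return addr == ip_address(spec)

def shadowing_3(fr1, fr2):
    """Equation (8): fr2's source IP lies within fr1's; other fields equal."""
    return (fr1["index"] < fr2["index"]
            and fr2["proto"] == fr1["proto"]
            and ip_in(fr2["src_ip"], fr1["src_ip"])
            and fr2["dst_ip"] == fr1["dst_ip"]
            and fr2["dst_port"] == fr1["dst_port"]
            and fr2["action"] != fr1["action"])

# Source addresses follow the example in the text; the other fields are made up.
fr1 = {"index": 1, "proto": "tcp", "src_ip": "10.0.0.1-10.0.0.10",
       "dst_ip": "10.0.0.20", "dst_port": "80", "action": "allow"}
fr2 = {"index": 2, "proto": "tcp", "src_ip": "10.0.0.5",
       "dst_ip": "10.0.0.20", "dst_port": "80", "action": "deny"}
print(shadowing_3(fr1, fr2))  # True
```

The same helper could be reused for the destination IP test of the fourth case by swapping the compared field.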

Figure 3. Illustration of the detection of the third case of shadowing.

∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr₁(I₁, proto₁, src_IP₁, src_port₁, dst_IP₁, dst_port₁, action₁) ∧ fr₂(I₂, proto₂, src_IP₂, src_port₂, dst_IP₂, dst_port₂, action₂) ∧ (I₁ < I₂) ∧ (src_IP₂ = src_IP₁) ∧ ((dst_IP₁ = any) ∨ (dst_IP₂ ∈ dst_IP₁)) ∧ (action₂ ≠ action₁) ⟹ Shadowing_4

(9)

Equation (9) shows that fr₂ is shadowed by fr₁. In this case, the destination IP of fr₁ can take one of two forms: it may be specified as “any”, or it may be an aggregate destination IP. An aggregate destination IP means that a single rule (fr₁) contains multiple destination IP addresses or a destination IP range. In contrast, fr₂ includes a single destination IP that is contained within the destination IP set of fr₁. For example, if the destination IP in fr₁ is {10.0.0.1–10.0.0.10} and the destination IP in fr₂ is {10.0.0.5}, then the destination IP of fr₂ belongs to the destination IP range defined in fr₁. Since Equation (9) indicates that the destination IP of fr₂ is included in fr₁, while the remaining packet fields are equivalent, but the actions are different, fr₂ is considered a shadowed rule of fr₁. Equation (9) denotes the fourth case of shadowing.

∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr₁(I₁, proto₁, src_IP₁, src_port₁, dst_IP₁, dst_port₁, action₁) ∧ fr₂(I₂, proto₂, src_IP₂, src_port₂, dst_IP₂, dst_port₂, action₂) ∧ (I₁ < I₂) ∧ (src_IP₂ = src_IP₁) ∧ (dst_IP₂ = dst_IP₁) ∧ ((dst_port₁ = any) ∨ (dst_port₂ ∈ dst_port₁)) ∧ (action₂ ≠ action₁) ⟹ Shadowing_5

(10)

Equation (10) denotes that fr₂ is shadowed by fr₁. In this equation, the destination port of fr₁ takes one of two forms: it is either set to “any”, or it is an aggregate port, meaning that a single rule (fr₁) lists multiple destination ports. The destination port of fr₂ is a single port that belongs to the destination port set of fr₁. For instance, if the destination port of fr₁ is {40, 60, 80} and the destination port of fr₂ is {40}, then the destination port of fr₂ belongs to fr₁. Following the general form of the shadowing anomaly in Equation (5), the destination port of fr₂ belongs to fr₁, while the remaining packet fields are equivalent and the actions differ. Therefore, fr₂ is shadowed by fr₁. Equation (10) denotes the fifth case of shadowing. Figure 4 shows a snapshot of the fifth case of shadowing.

The following discusses the rules used to detect correlation anomalies. Shadowing occurs when the entire set of packets matched by fr₂ is included within fr₁, yet the two rules specify different actions. However, a correlation anomaly appears when fr₁ and fr₂ partially overlap, meaning some packets of fr₂ are covered by fr₁, and some packets of fr₁ are also matched by fr₂, with different actions applied.

The visual structure of the knowledge graph does not explicitly encode the causal reason for shadowing. Consequently, all shadowing cases exhibit the same graphical pattern. Differentiation among these cases is achieved through semantic annotations, including the relationship type, the underlying logical conditions, and the specific rule attributes involved.
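One way to carry such semantic annotations is to store the case number and the triggering condition as properties on the relationship itself. The sketch below only builds a parameterized Cypher statement as a string; the node label `Rule`, the relationship type `SHADOWS`, and the property names are hypothetical and are not taken from the paper's graph schema.

```python
def shadowing_cypher(fr1_index, fr2_index, case, condition):
    """Build a Cypher statement and parameters annotating one shadowing edge."""
    query = (
        "MERGE (a:Rule {index: $fr1}) "
        "MERGE (b:Rule {index: $fr2}) "
        "MERGE (a)-[s:SHADOWS {case: $case}]->(b) "
        "SET s.condition = $condition"
    )
    params = {"fr1": fr1_index, "fr2": fr2_index,
              "case": case, "condition": condition}
    return query, params

# Hypothetical annotation for a pair detected by the third shadowing case.
query, params = shadowing_cypher(1, 2, 3, "src_IP2 in src_IP1")
print(query)
print(params)
```

Because the case number is part of the MERGE pattern, a pair that satisfies several shadowing cases would receive one edge per case, which matches the treatment of each criterion as a separate anomaly.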

4.3. Correlation Detection

The general form of a correlation anomaly is illustrated by Equation (11) as follows:

∃PC,∀fr:(I₁<I₂)∧(PC₁ ∈ fr₁)∧(PC₂ ∈ fr₂)∧((PC₂∈ fr₁)∨(PC₁∈ fr₂))⟹ fr₂ is correlated with fr₁

(11)

Equation (11) represents the general form of a correlation anomaly, where PC refers to packets and fr denotes a firewall rule. In this equation, PC₁ belongs to fr₁, PC₂ belongs to fr₂, and fr₁ precedes fr₂. A correlation anomaly occurs when there is a partial overlap between the two rules, meaning that some packets matched by fr₂ are also matched by fr₁, or some packets matched by fr₁ are also matched by fr₂. Therefore, fr₂ is correlated with fr₁.

By using the distributive property, Equation (11) can be decomposed into two parts: the first part is (I₁ < I₂) ∧ (PC₁ ∈ fr₁) ∧ (PC₂ ∈ fr₂) ∧ (PC₂ ∈ fr₁), and the second part is (I₁ < I₂) ∧ (PC₁ ∈ fr₁) ∧ (PC₂ ∈ fr₂) ∧ (PC₁ ∈ fr₂).

With respect to the first part, when packets that belong to fr₂ are also matched by fr₁, then fr₂ is treated as being shadowed by fr₁. Accordingly, the shadowing analysis can be used to represent the first component of the correlation anomaly. The second part of the general correlation equation, (I₁ < I₂) ∧ (PC₁ ∈ fr₁) ∧ (PC₂ ∈ fr₂) ∧ (PC₁ ∈ fr₂), is explained below:

∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr₁(I₁, proto₁, src_IP₁, src_port₁, dst_IP₁, dst_port₁, action₁) ∧ fr₂(I₂, proto₂, src_IP₂, src_port₂, dst_IP₂, dst_port₂, action₂) ∧ (I₁ < I₂) ∧ ((proto₂ = any) ∨ (proto₁ ∈ proto₂)) ∧ (src_IP₂ = src_IP₁) ∧ (dst_IP₂ = dst_IP₁) ∧ (dst_port₂ = dst_port₁) ∧ (action₂ ≠ action₁) ⟹ Correlation1

(12)

Equation (12) indicates that fr₂ is correlated with fr₁, where the rule with index I₁ is a subset of the rule with index I₂. In fr₂, the protocol may be set to “any”, meaning it covers all protocol types; therefore, fr₁ will always fall within fr₂ regardless of the protocol specified in fr₁. Another situation occurs when proto₁ ∈ proto₂, which means the protocol defined in fr₁ is included within the protocol set of fr₂ (e.g., if proto₂ = {TCP, UDP}, then proto₁ belongs to this set). All other packet-matching fields remain identical in fr₁ and fr₂, but the two rules apply different actions.

∀:I,proto,src_IP,src_port,dst_IP,dst_port,action: fr₁(I₁,proto₁,src_IP₁,src_port₁,dst_IP₁,dst_port₁,action₁)∧ fr₂(I₂,proto₂,src_IP₂,src_port₂,dst_IP₂,dst_port₂,action₂)∧(I₁<I₂)∧(proto₂ = proto₁)∧((src_IP₂ = any)∨(src_IP₁ ∈ src_IP₂))∧(dst_IP₂ = dst_IP₁)∧(dst_port₂ = dst_port₁)∧(action₂~=action₁)⟹ Correlation2

(13)

Figure 5 shows a snapshot of the second case of correlation.

The two diagrams in Figure 1, Figure 3 and Figure 5 illustrate representative snapshots of the detected anomalies. Each diagram highlights one or more packets and their anomaly relationships with other packets. For example, in Figure 1, the first diagram presents packet No. 2001 and its associated relationships, whereas the second diagram presents packets No. 88 and 99 with their corresponding anomaly relationships.

Equation (13) shows that fr₂ is correlated with fr₁. In this case, the source IP in fr₂ can appear in two forms: it may be set to “any”, or it may represent an aggregate source IP range. Meanwhile, the source IP of fr₁ is a single address or a narrower IP range that falls within the broader source IP range specified by fr₂.

∀:I,proto,src_IP,src_port,dst_IP,dst_port,action: fr₁(I₁,proto₁,src_IP₁,src_port₁,dst_IP₁,dst_port₁,action₁)∧ fr₂(I₂,proto₂,src_IP₂,src_port₂,dst_IP₂,dst_port₂,action₂)∧(I₁<I₂)∧(src_IP₂ = src_IP₁)∧((dst_IP₂=any)∨(dst_IP₁∈ dst_IP₂))∧(action₂ ~= action₁) ⟹ Correlation3

(14)

Equation (14) indicates that fr₂ is correlated with fr₁. In this equation, the destination IP in fr₂ has two possible cases: it can be set to “any”, or it can represent an aggregate range of destination IP addresses. In contrast, the destination IP of fr₁ is either a single address or a narrower destination IP range that falls within the destination IP range of fr₂. Thus, the destination IP of fr₁ is contained within that of fr₂, while all other packet-matching fields remain identical between fr₁ and fr₂, but the two rules specify different actions.

∀:I,proto,src_IP,src_port,dst_IP,dst_port,action: fr₁(I₁,proto₁,src_IP₁,src_port₁,dst_IP₁,dst_port₁,action₁)∧fr₂(I₂,proto₂,src_IP₂,src_port₂,dst_IP₂,dst_port₂,action₂)∧(I₁<I₂)∧(src_IP₂=src_IP₁)∧(dst_IP₂ = dst_IP₁)∧((dst_port₂ = any)∨(dst_port₁ ∈ dst_port₂))∧(action₂~=action₁) ⟹ Correlation4

(15)

Equation (15) indicates that fr₂ is correlated with fr₁. In this equation, the destination port of fr₂ can take one of two forms: it may be set to “any”, or it may represent an aggregate set of destination ports. An aggregate destination port implies that multiple destination ports are specified within a single rule (fr₂). The destination port of fr₁ is either a single port or a narrower range of ports that falls within the destination port set of fr₂. Thus, in Equation (15), the destination port of fr₁ is contained within that of fr₂, while all other packet-matching fields in fr₁ and fr₂ are identical but the actions differ. Consequently, fr₂ is correlated with fr₁.
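The four cases in Equations (12)–(15) share one pattern: a single field of fr₂ is widened (set to “any” or to an aggregate that contains the value in fr₁), the remaining fields are equal, and the actions differ. The following is a minimal Python sketch of that pattern, not the authors' implementation. The `Rule` class, the helper names, and the comma-separated encoding of protocol sets are assumptions made for illustration; requiring equality on all remaining fields in cases 3 and 4 follows the prose descriptions rather than the literal equations.

```python
from dataclasses import dataclass
from ipaddress import ip_network

@dataclass(frozen=True)
class Rule:  # hypothetical container for one firewall rule
    index: int
    proto: str     # "tcp", "udp", "tcp,udp" (a set), or "any"
    src_ip: str    # address, CIDR block, or "any"
    dst_ip: str
    dst_port: str  # "22", "140-65535", or "any"
    action: str    # "permit" or "deny"

# Field order matches Correlation1..Correlation4 (Equations (12)-(15)).
FIELDS = ("proto", "src_ip", "dst_ip", "dst_port")

def port_range(spec):
    """Turn a port specification into an inclusive (low, high) interval."""
    if spec == "any":
        return (0, 65535)
    low, _, high = spec.replace("–", "-").partition("-")
    return (int(low), int(high or low))

def covers(field, narrow, wide):
    """True when `wide` is "any" or contains `narrow` for the given field."""
    if wide == "any":
        return True
    if field == "proto":
        return narrow in wide.split(",")
    if field in ("src_ip", "dst_ip"):
        if narrow == "any":
            return False
        return ip_network(narrow, strict=False).subnet_of(
            ip_network(wide, strict=False))
    low1, high1 = port_range(narrow)
    low2, high2 = port_range(wide)
    return low2 <= low1 and high1 <= high2

def correlation_cases(fr1, fr2):
    """Return the case numbers (1-4) that the ordered pair (fr1, fr2) satisfies."""
    if not (fr1.index < fr2.index and fr1.action != fr2.action):
        return []
    hits = []
    for case, widened in enumerate(FIELDS, start=1):
        others_equal = all(getattr(fr1, f) == getattr(fr2, f)
                           for f in FIELDS if f != widened)
        if others_equal and covers(widened, getattr(fr1, widened),
                                   getattr(fr2, widened)):
            hits.append(case)
    return hits

fr1 = Rule(1, "tcp", "10.0.0.5", "20.0.0.3", "22", "permit")
fr2 = Rule(2, "tcp", "10.0.0.0/28", "20.0.0.3", "22", "deny")
print(correlation_cases(fr1, fr2))  # -> [2]
```

Because containment includes equality in this sketch, a pair that is identical in every field but the action satisfies all four cases; a stricter reading would require the widened field to differ.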

Taken together, Equations (12)–(15) constitute the rules used to detect correlation anomalies.

4.4. Generalization Detection

A generalization anomaly arises when two rules produce conflicting actions and the packets matched by the earlier rule are a subset of the packets matched by the later rule. The distinction between shadowing and generalization anomalies can be expressed by the following equation:

∀ fr, PC: (I₁<I₂) ∧ (PC₁ ∈ fr₁) ∧ (PC₂ ∈ fr₂) ∧ (PC₁ ∈ fr₂)⟹ fr₂ is a generalization of fr₁

(16)

This generalized form of the generalization anomaly is analogous to the second case of the correlation anomaly. Therefore, the same equations used to characterize the second part of the correlation anomaly, namely, Equations (12)–(15), are also applicable for identifying the generalization anomaly. Figure 6 shows a snapshot of a generalization.
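The difference between shadowing, generalization, and correlation comes down to the direction of containment between the two rules. The sketch below illustrates this with the example rules from Tables 1, 2, and 4. It is a simplification under stated assumptions: rules are plain dictionaries with hypothetical keys `src`, `dst`, and `action`, the protocol and ports are taken to be identical (as they are in those tables), and only pairs with conflicting actions are classified.

```python
from ipaddress import ip_network

def net(spec):
    """Parse an address or CIDR block into a network object."""
    return ip_network(spec, strict=False)

def subset(a, b):
    """True when every (source, destination) matched by rule a is matched by rule b."""
    return all(net(a[f]).subnet_of(net(b[f])) for f in ("src", "dst"))

def overlap(a, b):
    """True when the two rules share at least one (source, destination) pair."""
    return all(net(a[f]).overlaps(net(b[f])) for f in ("src", "dst"))

def relation(fr1, fr2):
    """Classify an ordered pair in which fr1 precedes fr2."""
    if fr1["action"] == fr2["action"]:
        return "no conflict"
    if subset(fr2, fr1):
        return "shadowing"        # the later rule is fully covered by the earlier one
    if subset(fr1, fr2):
        return "generalization"   # the earlier rule is fully covered by the later one
    if overlap(fr1, fr2):
        return "correlation"      # partial overlap only
    return "disjoint"

# Rows of Tables 1, 2, and 4 (protocol TCP and destination port 22 throughout).
table1 = ({"src": "10.0.0.0/28", "dst": "20.0.0.0/28", "action": "deny"},
          {"src": "10.0.0.5", "dst": "20.0.0.3", "action": "allow"})
table2 = ({"src": "10.0.0.5", "dst": "20.0.0.0/28", "action": "allow"},
          {"src": "10.0.0.0/28", "dst": "20.0.0.3", "action": "deny"})
table4 = ({"src": "10.0.0.5", "dst": "20.0.0.3", "action": "allow"},
          {"src": "10.0.0.0/28", "dst": "20.0.0.0/28", "action": "deny"})

for name, pair in (("Table 1", table1), ("Table 2", table2), ("Table 4", table4)):
    print(name, relation(*pair))
```

Checking `subset(fr2, fr1)` before `subset(fr1, fr2)` means that two rules with identical match fields and conflicting actions are reported as shadowing, which is a design choice of this sketch.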

4.5. Irrelevant and Redundancy Detection

An irrelevant firewall rule is a rule that does not match any packets transmitted within the network. This type of anomaly may arise when firewall rules are not updated to reflect changes in network topology or configuration (e.g., the addition or removal of devices, or changes in the addressing scheme), or because of rule misconfigurations. Equation (17) represents the irrelevant anomaly.

∀:I,proto,src_IP,src_port,dst_IP,dst_port,action: fr(I,proto,src_IP,src_port,dst_IP, dst_port, action)∧ (src_IP = dst_IP)⟹ Irrelevant1

(17)

In Equation (17), the source and destination IP addresses are identical, indicating a misconfiguration.

A redundant firewall rule is a rule that matches all packets of a preceding firewall rule and applies the same action. Equation (18) denotes the redundancy case.

∀:I,proto,src_IP,src_port,dst_IP,dst_port,action: fr₁(I₁,proto₁,src_IP₁,src_port₁,dst_IP₁,dst_port₁,action₁)∧ fr₂(I₂,proto₂,src_IP₂,src_port₂,dst_IP₂,dst_port₂,action₂)∧(I₁<I₂)∧(proto₁=proto₂)∧(src_IP₁=src_IP₂)∧(dst_IP₁= dst_IP₂)∧(dst_port₁=dst_port₂)∧(action₁=action₂)⟹redundancy

(18)
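Equations (17) and (18) translate almost directly into predicates over rule fields. The following Python sketch shows one way to write them; the `Rule` tuple and the function names are assumptions introduced for illustration. As in Equation (18), the source port is not compared.

```python
from typing import NamedTuple

class Rule(NamedTuple):  # hypothetical record mirroring fr(I, proto, ..., action)
    index: int
    proto: str
    src_ip: str
    src_port: str
    dst_ip: str
    dst_port: str
    action: str

def is_irrelevant(fr):
    """Equation (17): the source and destination IP addresses are identical."""
    return fr.src_ip == fr.dst_ip

def is_redundant(fr1, fr2):
    """Equation (18): fr2 follows fr1 and repeats its match fields and action."""
    return (
        fr1.index < fr2.index
        and fr1.proto == fr2.proto
        and fr1.src_ip == fr2.src_ip
        and fr1.dst_ip == fr2.dst_ip
        and fr1.dst_port == fr2.dst_port
        and fr1.action == fr2.action
    )

# The rule from Table 5, followed by a hypothetical duplicate of it.
r1 = Rule(1, "TCP", "10.0.0.5", "*", "10.0.0.5", "22", "allow")
r2 = Rule(2, "TCP", "10.0.0.5", "*", "10.0.0.5", "22", "allow")
print(is_irrelevant(r1), is_redundant(r1, r2))  # -> True True
```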

5. Prove the Correctness of the Proposed Rules Using Machine Learning

In this section, machine learning—specifically logistic regression—is employed to validate the correctness and applicability of the proposed logic rules by training a logistic regression classifier to perform anomaly classification.

Firewall anomalies (shadowing, redundancy, correlation, and generalization) are not directly observable labels. Instead, they represent logical relationships between pairs of firewall rules. Therefore, logic-based rules are first applied to detect anomalies, and the resulting outputs are then used as labels to train a machine learning classifier to learn these anomaly patterns.

The general methodology for applying machine learning to firewall anomaly detection is as follows:

Read the firewall rules dataset.
Perform logic-based anomaly detection, which serves as the ground truth (labels).
Extract relevant features.
Train a machine learning algorithm as a classifier.
Perform anomaly classification.
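The five steps above can be sketched end to end in plain Python. This is an illustrative toy, not the authors' pipeline: the rule values are hypothetical, the three binary features are an arbitrary choice, the label uses the condition of Equation (17), and logistic regression is fitted with a hand-written full-batch gradient descent instead of a library. Because the labeling condition is itself one of the features, the classifier can only recover the logic rule, which is the behavior the validation relies on.

```python
import math

# Step 1: read the rules (a hypothetical toy set, not the Stanford dataset).
# Fields: (index, proto, src_ip, dst_ip, dst_port, action)
RULES = [
    (1, "tcp", "10.0.0.5", "10.0.0.5", "22", "permit"),
    (2, "udp", "10.0.0.5", "60.0.0.3", "70", "permit"),
    (3, "any", "any", "any", "any", "deny"),
    (4, "tcp", "10.0.0.0/28", "20.0.0.3", "22", "deny"),
    (5, "any", "192.168.1.0/24", "192.168.1.0/24", "any", "permit"),
    (6, "any", "10.0.0.7", "20.0.0.9", "80", "deny"),
]

def logic_label(rule):
    """Step 2: deterministic label from the logic condition src_IP = dst_IP."""
    _, _, src_ip, dst_ip, _, _ = rule
    return 1 if src_ip == dst_ip else 0

def extract_features(rule):
    """Step 3: binary features (an illustrative choice for this sketch)."""
    _, proto, src_ip, dst_ip, _, action = rule
    return [float(src_ip == dst_ip), float(proto == "any"),
            float(action == "permit")]

def score(w, b, x):
    """Estimated probability that the rule carries the anomaly label."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, steps=5000):
    """Step 4: logistic regression fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        errors = [score(w, b, x) - t for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

def classify(w, b, x, threshold=0.5):
    """Step 5: flag the rule when the estimated probability reaches the threshold."""
    return int(score(w, b, x) >= threshold)

X = [extract_features(r) for r in RULES]
y = [logic_label(r) for r in RULES]
w, b = train(X, y)
predictions = [classify(w, b, x) for x in X]
print(list(zip(y, predictions)))
```

In practice a library implementation would replace `train`; the hand-written loop is used here only to keep the sketch free of external dependencies.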

In the following, logistic regression is used to find shadowing firewall rules using logic Equation (5), redundant firewall rules using logic Equation (17), and correlated firewall rules using logic Equation (11).

Our proposed logic rules can provide ground truth for logistic regression by serving as a formal, deterministic labeling mechanism when explicit annotated datasets are unavailable. The process can be explained as follows:

Firewall anomalies (shadowing, redundancy, correlation) are not directly observable attributes but are defined through logical relations between rule fields. Therefore, each firewall rule (or rule pair) is first evaluated against a set of formally defined logic rules (Equations (5), (17) and (11)). If a rule satisfies a given logical condition, it is assigned a corresponding anomaly label (e.g., redundant = 1, non-redundant = 0). These logic-derived labels constitute the ground truth for supervised learning.

Once labeled, the firewall rules are transformed into numerical feature vectors and used to train a logistic regression classifier. The model learns to approximate the decision boundaries induced by the logic rules and can generalize these patterns to unseen configurations or noisy data.

5.1. Find Shadowing Firewall Rules Using Logistic Regression

In this section, a machine learning methodology is used to detect shadowing anomalies in the dataset used in this study. Since shadowing anomalies are not explicitly labeled in real-world firewall configurations, Equation (5), which denotes the general form of the shadowing anomaly, is first employed to generate ground-truth labels. A supervised machine learning model, namely logistic regression, is then trained to learn and generalize these shadowing patterns. For each rule pair, a set of binary features is extracted to represent the logical relationships between rules. These features include protocol matching, source and destination IP containment, source and destination port containment, and action conflict. This representation enables the machine learning model to capture the structural characteristics of shadowing anomalies without relying on raw rule syntax.

Given a feature vector $x = (x_{1}, x_{2}, \dots, x_{n})$ extracted from a rule pair $(fr_{1}, fr_{2})$, logistic regression estimates the conditional probability that the pair corresponds to a shadowing anomaly. Each firewall rule pair is encoded using a set of binary features derived from the underlying logic-based conditions. These features capture whether the protocol of the later rule is subsumed by the earlier rule, whether the source and destination IP addresses of the later rule fall within the corresponding address ranges of the earlier rule, whether the source and destination ports are contained within the earlier rule’s port specifications, and whether the two rules enforce conflicting actions. This abstraction enables the model to learn structural patterns of rule dominance rather than relying on raw textual representations of firewall rules. Table 12 shows a snapshot of detected shadowing rules.
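For reference, the conditional probability estimated by logistic regression has the standard form below. The weight vector $w$ and the bias $b$ are symbols introduced here for illustration; the text above does not name them.

```latex
P(y = 1 \mid x) \;=\; \sigma\!\left(w^{\top} x + b\right)
\;=\; \frac{1}{1 + e^{-\left(w^{\top} x + b\right)}},
\qquad x = (x_{1}, x_{2}, \dots, x_{n})
```

A rule pair would then be flagged as shadowing when this probability exceeds a chosen threshold; 0.5 is a common default and is assumed here rather than taken from the source.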

5.2. Find Redundant Firewall Rules

Redundancy anomalies are not directly observable labels but are logical relations derived from firewall rule attributes. Therefore, Equation (17) was first applied to generate ground-truth labels, where a rule is considered redundant if its source and destination IP addresses are identical. Subsequently, a logistic regression model was trained using structural features extracted from firewall fields to learn redundancy patterns. This hybrid logic–machine learning approach enables scalable detection while preserving interpretability and consistency with formal firewall anomaly definitions. Table 13 shows a snapshot of detected redundant firewall rules. In the following, the mathematical representation of feature extraction is presented:

In this paper, a firewall rule is defined as:

$$fr_{i} = (I_{i}, \mathit{proto}_{i}, \mathit{srcIP}_{i}, \mathit{srcPort}_{i}, \mathit{dstIP}_{i}, \mathit{dstPort}_{i}, \mathit{action}_{i})$$

From each rule $fr_{i}$, a feature vector $x_{i} \in \mathbb{R}^{d}$ is constructed as follows:

$$x_{i} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \\ x_{i4} \\ x_{i5} \end{bmatrix}$$

Then, using the provided logical equation, the redundancy label $y_{i}$ is defined as:

$$y_{i} = \begin{cases} 1, & \text{if } \mathit{srcIP}_{i} = \mathit{dstIP}_{i} \\ 0, & \text{otherwise} \end{cases}$$

Table 13. Snapshot of detected redundant firewall rules.

Rule Index	Protocol	SrcIP	DstIP	SrcPort	DstPort	Action	Redundancy Description
88	any	any	any	any	any	permit	Source and destination IP addresses are identical (any), resulting in redundant IP specification
102	tcp	any	any	any	445	deny	Identical source and destination IP fields; filtering depends only on destination port
102	tcp	any	any	any	140–65535	permit	Redundant IP fields with overlapping port ranges
215	udp	10.0.0.5	10.0.0.5	any	53	permit	Rule matches traffic where source and destination are the same host
347	any	192.168.1.0/24	192.168.1.0/24	any	any	deny	Identical source and destination subnet ranges
411	tcp	any	any	1024–65535	80	permit	IP fields are redundant; behavior governed solely by port constraints

5.3. Find Correlated Firewall Rules Using Logistic Regression

As we have mentioned before, Equation (11) denotes the general form of a correlation between firewall rules. Following the methodology discussed in Section 5, Table 14 shows the mapping of Equation (11) to ML features. Algorithm 1 shows logic-based labeling using Equation (11) as ground truth. Table 15 shows the snapshot of detected correlated firewall rules.

Table 14. Mapping of Equation (11) to ML features.

Logic Condition	ML Feature
(I₁ < I₂)	enforced by rule pairing
(PC₁ ∈ fr₁)	rule exists
(PC₂ ∈ fr₂)	rule exists
(PC₂ ∈ fr₁)	partial field overlap
(PC₁ ∈ fr₂)	partial field overlap
Different actions	action_diff = 1

Algorithm 1. Logic-based labeling using Equation (11) as ground truth.

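import pandas as pd

# `pairs` is assumed to be a list of per-pair feature records built beforehand.
pairs_df = pd.DataFrame(pairs)

def correlation_label(row):
    """Return 1 when the ordered rule pair meets the correlation conditions, else 0."""
    if (
        row.proto_eq
        and row.action_diff
        and (row.src_overlap or row.dst_overlap)
        and not (row.sport_eq and row.dport_eq)
    ):
        return 1  # correlated
    return 0  # not correlated

# Apply the logic-based labeling row by row to obtain the ground-truth column.
pairs_df["label"] = pairs_df.apply(correlation_label, axis=1)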

Following Equation (11), firewall correlation anomalies were modeled as a supervised rule-pair classification problem. Each ordered rule pair $(fr_{1}, fr_{2})$ was transformed into a fixed-length feature vector capturing protocol equivalence, partial field overlap, and action inconsistency. Logic-based conditions derived from Equation (11) were used to generate labels, which were then employed to train a logistic regression classifier. This approach enables scalable and generalizable detection of correlation anomalies while preserving interpretability.
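The per-pair records that Algorithm 1 reads from `pairs` could be assembled along the following lines. This is a sketch under assumptions: the source does not show how the features are computed, so the dictionary keys for a rule, the treatment of “any” and “*” as the whole address space, and the use of network overlap for `src_overlap` and `dst_overlap` are all illustrative choices. The feature names match those used in Algorithm 1, and the sample rules are the rows of Table 2.

```python
from ipaddress import ip_network
from itertools import combinations

def to_net(spec):
    """Map an address, CIDR block, or wildcard to a network object."""
    if spec in ("any", "*"):
        return ip_network("0.0.0.0/0")
    return ip_network(spec, strict=False)

def pair_features(fr1, fr2):
    """Build the fixed-length feature record for one ordered rule pair."""
    return {
        "i1": fr1["index"],
        "i2": fr2["index"],
        "proto_eq": fr1["proto"] == fr2["proto"],
        "action_diff": fr1["action"] != fr2["action"],
        "src_overlap": to_net(fr1["src_ip"]).overlaps(to_net(fr2["src_ip"])),
        "dst_overlap": to_net(fr1["dst_ip"]).overlaps(to_net(fr2["dst_ip"])),
        "sport_eq": fr1["src_port"] == fr2["src_port"],
        "dport_eq": fr1["dst_port"] == fr2["dst_port"],
    }

rules = [
    {"index": 1, "proto": "TCP", "src_ip": "10.0.0.5", "src_port": "*",
     "dst_ip": "20.0.0.0/28", "dst_port": "22", "action": "Allow"},
    {"index": 2, "proto": "TCP", "src_ip": "10.0.0.0/28", "src_port": "*",
     "dst_ip": "20.0.0.3", "dst_port": "22", "action": "Deny"},
]

# Sorting by index and pairing in order enforces the (I1 < I2) condition.
ordered = sorted(rules, key=lambda r: r["index"])
pairs = [pair_features(a, b) for a, b in combinations(ordered, 2)]
print(pairs[0])
```

Pairing every rule with every later rule grows quadratically with the size of the rule set, so a large policy may need blocking or indexing before this step.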

6. Conclusions

This paper presented a formal, logic-driven framework for the systematic detection and analysis of firewall anomalies, supported by knowledge graph visualization using Neo4j. First-order logic (FOL) was employed to formally model firewall rules and precisely define major anomaly types, including shadowing, redundancy, correlation, generalization, and irrelevance, in both single and distributed firewall environments. By expressing these anomalies as formal logical rules, the proposed method provides a clear, unambiguous, and extensible framework for firewall policy validation.

The use of a real operational backbone dataset from Stanford University demonstrates the practical applicability of the proposed approach under realistic network conditions. Experimental results confirm that the developed FOL rules are capable of accurately identifying different anomaly cases, including complex overlapping scenarios that are often difficult to detect using traditional methods. In addition, the Neo4j-based knowledge graph representation offers an intuitive visualization mechanism that assists network engineers in understanding policy interactions, diagnosing configuration errors, and improving firewall rule management. Moreover, a machine learning-based approach was employed to validate the correctness of the proposed logic rules. The results presented in Table 12, Table 13 and Table 15 confirm the effectiveness and correctness of these logic rules.

The related work is categorized into two main streams. The first focuses on logic-based approaches for the formalization of firewall anomaly detection, while the second examines machine learning techniques for developing predictive models to address firewall anomalies. Compared with prior logic-based approaches, the proposed framework provides a more comprehensive analysis of firewall anomalies by defining five shadowing cases, four correlation cases, and corresponding first-order logic (FOL) detection rules for each scenario. Furthermore, the framework extends its coverage to generalization, redundancy, and irrelevant anomalies, enabling a unified logic-based detection methodology.

In contrast to existing machine learning-based and heuristic approaches, the proposed method offers superior interpretability, completeness, determinism, and formal correctness. Since it does not depend on training data or labeled samples, it is particularly well suited for security environments where explainability, reproducibility, and policy transparency are essential. Furthermore, its logical formulation provides a flexible foundation that can be readily reused and extended to support emerging anomaly definitions and continuously evolving network policies.

A key contribution of this work lies in the systematic decomposition of complex anomalies—particularly correlation and generalization—into well-defined logical cases, enabling comprehensive detection of overlapping and order-dependent rule interactions. The integration of Neo4j-based knowledge graphs further strengthens the framework by providing an intuitive and actionable visualization layer, allowing network engineers to trace anomaly causes, understand rule dependencies, and validate firewall policies with high confidence. As such, it represents a significant contribution to firewall policy verification and network security management. Figure 7 illustrates the overall distribution of firewall anomaly categories identified in the dataset by applying the proposed detection rules, highlighting the relative prevalence of each anomaly type.

In Figure 7, the bar chart summarizes the total number of anomaly instances detected per category, aggregating all shadowing subtypes into a single class. The results indicate that redundancy and shadowing dominate firewall misconfigurations, while generalization anomalies occur far less frequently.

Runtime and scalability are critical factors in validating the practicality of the proposed logical solution. Detecting firewall anomalies through logic-based reasoning can be formulated as an AI search problem. In general, runtime and scalability limitations are primarily associated with complete or blind search strategies, where the search space grows exponentially. In contrast, the proposed predefined FOL rules act as heuristic search operators, enabling direct, guided, and efficient anomaly identification without exhaustive exploration of the full rule space. This heuristic characteristic significantly enhances scalability while reducing runtime overhead. Moreover, our previous study [9] experimentally demonstrated that FOL rule-based heuristic reasoning provides scalable performance with reasonable execution time.

In Section 3, Anomalies in Firewall Policy, we systematically define and highlight all known firewall policy anomalies, supported by references [8,27,28,29,30]. To the best of our knowledge, and based on the established literature, Section 3 comprehensively covers all recognized anomaly cases in firewall policies. Accordingly, because the logical rules developed in Section 4 address every anomaly type discussed in Section 3, we argue that the proposed logical detection framework is complete with respect to the established anomaly space reported in prior studies.

This study is limited to static firewall policy analysis, where the rule set is assumed to be fully specified and unchanged during evaluation. Accordingly, the proposed FOL-based framework does not address incomplete, ambiguous, or dynamically changing firewall rules. Extending the framework to support dynamic policy updates and adaptive network environments remains an important direction for future work.

Future work will focus on scaling the proposed approach to large-scale enterprise networks, integrating automatic rule optimization and remediation mechanisms, and extending the model to support dynamic and software-defined networking (SDN) environments. Incorporating temporal analysis and real-time policy updates into the knowledge graph is another promising direction for enhancing proactive firewall management.

Author Contributions

Conceptualization, A.O.E.; methodology, A.O.E. and A.A. (Amer Aljaedi); software, A.A. (Abdulhadi Albluwi) and A.O.E.; validation, M.H.M.N.; writing (original draft preparation), A.O.E. and A.A. (Amer Aljaedi); writing (review and editing), M.H.M.N. and A.A. (Abdulhadi Albluwi); supervision, A.O.E. and A.A. (Amer Aljaedi). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Data Availability Statement

The data presented in this study are openly available in [Real Time Network Policy Checking Using Header Space Analysis]. https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/kazemian (accessed on 1 June 2025). The data used in this study consist of network configuration datasets used for evaluating the Header Space Analysis framework. Due to privacy and security considerations, these datasets are not publicly available but can be requested from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bringhenti, D.; Marchetto, G.; Sisto, R.; Valenza, F.; Yusupov, J. Automated firewall configuration in virtual networks. IEEE Trans. Dependable Secur. Comput. 2022, 20, 1559–1576.
2. Mijwil, M.; Unogwu, O.J.; Filali, Y.; Bala, I.; Al-Shahwani, H. Exploring the top five evolving threats in cybersecurity: An in-depth overview. Mesopotamian J. Cybersecur. 2023, 2023, 57–63.
3. Coscia, A.; Dentamaro, V.; Galantucci, S.; Maci, A.; Pirlo, G. An innovative two-stage algorithm to optimize Firewall rule ordering. Comput. Secur. 2023, 134, 103423.
4. Alicea, M.; Alsmadi, I. Misconfiguration in firewalls and network access controls: Literature review. Future Internet 2021, 13, 283.
5. Gupta, S.; Gosain, D.; Kwon, M.; Acharya, H.B. DeeP4R: Deep Packet Inspection in P4 using Packet Recirculation. In IEEE INFOCOM 2023-IEEE Conference on Computer Communications; IEEE: Piscataway, NJ, USA, 2023; pp. 1–10.
6. Elfaki, A.O.; Aljaedi, A. Deep Analysis and Detection of Firewall Anomalies Using Knowledge Graph. In 12th International Conference on Pattern Recognition Applications and Methods; Springer: Berlin/Heidelberg, Germany, 2023; pp. 411–417.
7. Kim, T.; Kwon, T.; Lee, J.; Song, J. F/Wvis: Hierarchical Visual Approach for Effective Optimization of Firewall Policy. IEEE Access 2021, 9, 105989–106004.
8. Dhrir, H.; Charfeddine, M.; Tarhouni, N.; Kammoun, H.M. Machine learning-and deep learning-based anomaly detection in firewalls: A survey. J. Supercomput. 2025, 81, 761.
9. Elfaki, A.O. A rule-based approach to detect and prevent inconsistency in the domain-engineering process. Expert Syst. 2016, 33, 3–13.
10. Arthur, J.K.; Kwadwo, E.; Doh, R.F.; Mantey, E.A. Firewall rule anomaly detection and resolution using particle swarm optimization algorithm. Int. J. Comput. Appl. 2019, 178, 975–8887.
11. Moradi Vartouni, A.; Teshnehlab, M.; Sedighian Kashi, S. Leveraging deep neural networks for anomaly-based web application firewall. IET Inf. Secur. 2019, 13, 352–361.
12. Valenza, F.; Cheminod, M. An Optimized Firewall Anomaly Resolution. J. Internet Serv. Inf. Secur. 2020, 10, 22–37.
13. Togay, C.; Kasif, A.; Catal, C.; Tekinerdogan, B. A firewall policy anomaly detection framework for reliable network security. IEEE Trans. Reliab. 2021, 71, 339–347.
14. Kulyadi, S.P.; Mohandas, P.; Kumar, S.K.S.; Raman, M.S.; Vasan, V.S. Anomaly detection using generative adversarial networks on firewall log message data. In 2021 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI); IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
15. Fotiadou, K.; Velivassaki, T.H.; Voulkidis, A.; Skias, D.; Tsekeridou, S.; Zahariadis, T. Network traffic anomaly detection via deep learning. Information 2021, 12, 215.
16. Lorenz, C.; Schnor, B. Firewall management: Rapid anomaly detection. In 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys); IEEE: Piscataway, NJ, USA, 2022; pp. 1465–1472.
17. Toprak, S.; Yavuz, A.G. Web application firewall based on anomaly detection using deep learning. Acta Infologica 2022, 6, 219–244.
18. Lin, Z.; Yao, Z. Firewall anomaly detection based on double decision tree. Symmetry 2022, 14, 2668.
19. Khummanee, S.; Chomphuwiset, P.; Pruksasri, P. DSSF: Decision Support System to Detect and Solve Firewall Rule Anomalies based on a Probability Approach. ECTI Trans. Comput. Inf. Technol. (ECTI-CIT) 2022, 16, 56–73.
20. Aljabri, M.; Alahmadi, A.A.; Mohammad, R.M.A.; Aboulnour, M.; Alomari, D.M.; Almotiri, S.H. Classification of firewall log data using multiclass machine learning models. Electronics 2022, 11, 1851.
21. Shaheed, A.; Kurdy, M.H.D. Web application firewall using machine learning and features engineering. Secur. Commun. Netw. 2022, 2022, 5280158.
22. Al-Haijaa, Q.A.; Ishtaiwia, A. Machine learning based model to identify firewall decisions to improve cyber-defense. Int. J. Adv. Sci. Eng. Inf. Technol. 2021, 11, 1688–1695.
23. Andalib, A.; Babamir, S.M. Anomaly detection of policies in distributed firewalls using data log analysis. J. Supercomput. 2023, 79, 19473–19514.
24. Bringhenti, D.; Seno, L.; Valenza, F. An Optimized Approach for Assisted Firewall Anomaly Resolution. IEEE Access 2023, 11, 119693–119710.
25. Komadina, A.; Kovačević, I.; Štengl, B.; Groš, S. Detecting anomalies in firewall logs using artificially generated attacks. In 2023 17th International Conference on Telecommunications (ConTEL); IEEE: Piscataway, NJ, USA, 2023; pp. 1–8.
26. Wang, Z.; AnilKumar, A. W2R: An Ensemble Anomaly Detection Model Inspired by Language Models for Web Application Firewalls Security. Master’s Thesis, Halmstad University, Halmstad, Sweden, 2023. Available online: https://www.diva-portal.org/smash/get/diva2:1772725/FULLTEXT02 (accessed on 1 June 2025).
27. Park, J.; Park, B.; Kim, T.S. Development of an Anomaly Classification Model and a Decision Support Tool for Firewall Policy Configuration. Appl. Sci. 2025, 15, 2979.
28. Karafili, E.; Valenza, F.; Chen, Y.; Lupu, E.C. Towards a Framework for Automatic Firewalls Configuration via Argumentation Reasoning. In NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4.
29. Ahmed, Z.; Askari, S.M.S. Firewall Rule Anomaly Detection: A Survey. Int. J. Comput. Intell. IoT 2018, 2.
30. Hakani, D.; Mann, P.S. Intra Firewall Anomaly Policies Detection in Cloud Environment Using Firewall Tree. Trans. Indian Natl. Acad. Eng. 2025, 10, 63–72.
31. Chao, C.S. A Feasible Anomaly Diagnosis Mechanism for Stateful Firewall Rules. In 2018 27th International Conference on Computer Communication and Networks (ICCCN); IEEE: Piscataway, NJ, USA, 2018; pp. 1–2.
32. Elfaki, A.O.; Fong, S.L.; Aik, K.L.T.; Johar, M.G.M. Towards detecting redundancy in domain engineering process using first order logic rules. Int. J. Knowl. Eng. Soft Data Paradig. 2013, 4, 1–20.
33. Kazemian, P.; Chang, M.; Zeng, H.; Varghese, G.; McKeown, N.; Whyte, S. Real time network policy checking using header space analysis. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13); USENIX Association: Berkeley, CA, USA, 2013; pp. 99–111.

Figure 1. Illustration of the detection of the first shadowing case.

Figure 4. Illustration of the detection of the fifth case of shadowing.

Figure 5. Snapshot of the second case of correlation.

Figure 6. Snapshot of a generalization.

Figure 7. Illustration of the overall distribution of firewall anomaly categories.

Table 1. An instance of a shadowing anomaly.

Index	Proto.	Src-IP	Src-Port	Dst-IP	Dst-Port	Action
I₁	TCP	10.0.0.0/28	*	20.0.0.0/28	22	Deny
I₂	TCP	10.0.0.5	*	20.0.0.3	22	Allow

Table 2. An instance of a correlation anomaly: Conflicting.

Index	Proto.	Src-IP	Src-Port	Dst-IP	Dst-Port	Action
I₁	TCP	10.0.0.5	*	20.0.0.0/28	22	Allow
I₂	TCP	10.0.0.0/28	*	20.0.0.3	22	Deny

Table 3. An instance of a correlation anomaly: Redundant.

Index	Proto.	Src-IP	Src-Port	Dst-IP	Dst-Port	Action
I₁	TCP	10.0.0.5	*	20.0.0.0/28	22	Allow
I₂	TCP	10.0.0.0/28	*	20.0.0.3	22	Allow

Table 4. An instance of a generalization anomaly.

Index	Proto.	Src-IP	Src-Port	Dst-IP	Dst-Port	Action
I₁	TCP	10.0.0.5	*	20.0.0.3	22	Allow
I₂	TCP	10.0.0.0/28	*	20.0.0.0/28	22	Deny

Table 5. An instance of an irrelevant anomaly (case 1).

Index	Proto.	Src-IP	Src-Port	Dst-IP	Dst-Port	Action
I₁	TCP	10.0.0.5	*	10.0.0.5	22	allow

Table 6. An instance of an irrelevant anomaly (case 2).

Index	Proto.	Src-IP	Src-Port	Dst-IP	Dst-Port	Action
I₁	UDP	10.0.0.5	*	60.0.0.3	70	allow

Table 7. An instance of inter-firewall anomalies.

Firewall F1 (Upstream)
Rule	Source	Destination	Protocol	Action
R₁	*	10.0.0.20	HTTP	allow
Firewall F2 (Downstream)
Rule	Source	Destination	Protocol	Action
R₁	*	10.0.0.20	HTTP	deny

Table 8. An instance of a spuriousness anomaly.

Firewall F1 (Upstream)
Rule	Source	Destination	Protocol	Action
R₁	*	10.0.0.10	HTTP	allow
Firewall F2 (Downstream)
Rule	Source	Destination	Protocol	Action
R₁	192.168.1.0/24	10.0.0.10	SSH	allow
R₂	*	*	*	deny

Table 9. Firewall policy graph construction and port normalization algorithm.

#	Step
1	Input: Firewall policy dataset in CSV format
2	Output: Neo4j knowledge graph with normalized destination ports
3	Load firewall policy CSV into Neo4j
4	For each record in the dataset, create or merge a node representing a firewall rule
5	Assign rule attributes: Protocol, source IP, destination IP, source port, destination port, and action
6	For each firewall rule node, inspect the destination port field
7	If the destination port is expressed as a range (e.g., x–y), convert it into a numeric interval ([x, y])
8	Ensure that the destination port attribute is represented as a list
9	For each element in the destination port list, perform normalization
10	Map symbolic port values (any, established) to predefined numeric codes
11	Convert all remaining port values from string format to integers
12	Update the firewall rule node with the normalized destination port list
13	End processing of all firewall rules
14	Return the constructed knowledge graph

Table 10. Definition of Packet 2001.

Action: “deny”

DstIP: “any”

DstPort: “any”

dstPortFrom: 0

dstPortTo: 65535

Index: 2001

Protocol: “ip”

SrcIP: “any”

SrcPort: “any”

Table 11. Definition of packet 139.

Action: “permit”

DstIP: “any”

DstPort: “any”

dstPortFrom: 0

dstPortTo: 65535

Index: 139

Protocol: “ip”

SrcIP: “any”

SrcPort: “any”

Table 12. Snapshot of detected shadowing rules.

fr₁ Index	fr₂ Index	Reason for Shadowing
12	25	fr₁ specifies source IP as any, fully covering the specific source IP defined in fr₂, with conflicting actions
47	63	fr₁ uses protocol any, thereby matching all packets of fr₂ that specify a concrete protocol
88	97	fr₁ defines a generalized destination IP range that completely includes the destination IP of fr₂, with different actions

Table 15. Snapshots of correlated rules.

fr₁ Index	fr₂ Index	Reason
15	27	Partial source IP overlap, different actions
42	58	Same protocol, overlapping destination range
103	118	Shared packets, conflicting actions

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Elfaki, A.O.; Albluwi, A.; Aljaedi, A.; Nerma, M.H.M. Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation. Electronics 2026, 15, 1714. https://doi.org/10.3390/electronics15081714

