Article

Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation

by
Abdelrahman Osman Elfaki 1,*, Abdulhadi Albluwi 2, Amer Aljaedi 2 and Mohamed Hussien Mohamed Nerma 3
1 Department of Computer Science, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
2 Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
3 Department of Computer Engineering, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
* Author to whom correspondence should be addressed.
Electronics 2026, 15(8), 1714; https://doi.org/10.3390/electronics15081714
Submission received: 5 March 2026 / Revised: 13 April 2026 / Accepted: 14 April 2026 / Published: 17 April 2026

Abstract

Firewall policy misconfigurations remain a major source of security vulnerabilities in modern networks, particularly as firewall rule sets grow in size and complexity. Such misconfigurations, commonly referred to as firewall anomalies, can lead to unintended access control behavior and undermine network security. In this paper, we propose a formal logic rule-based framework for the systematic detection and investigation of firewall anomalies, supported by knowledge graph-based visualization. First-order logic (FOL) is employed to precisely model firewall rules and to define major anomaly types, including shadowing, redundancy, correlation, generalization, and irrelevance, in both single and distributed firewall environments. The proposed framework introduces explicit and comprehensive logical definitions for each anomaly type, enabling deterministic, interpretable, and complete detection of rule conflicts and overlaps. Complex anomalies, particularly correlation and generalization, are systematically decomposed into well-defined logical cases to facilitate the accurate identification of subtle, order-dependent interactions among firewall rules. To enhance usability and analysis, firewall rules and detected anomalies are represented using Neo4j knowledge graphs, providing intuitive visual insights into rule relationships and anomaly causes. The effectiveness of the proposed approach is validated using a real operational backbone network dataset collected from Stanford University’s campus network. Experimental results demonstrate the framework’s ability to accurately detect both simple and complex firewall anomalies under realistic network conditions. To further validate the proposed logic rules, a machine learning-based evaluation was conducted. The findings confirm their effectiveness in accurately characterizing firewall anomalies. 
Unlike machine learning or heuristic-based methods, the proposed approach does not require training data and guarantees formal correctness and explainability. These features make it a robust and practical solution for firewall policy verification and network security management.

1. Introduction

The Internet has become an integral part of daily life, and many everyday tasks now depend on it. Protecting data during online transactions is therefore crucial for maintaining privacy and security. A firewall is a network security device that examines incoming and outgoing traffic and decides whether to permit or block it based on a predefined set of security rules [1]. Firewalls are essential components for protecting computer networks from unauthorized access and potential threats: they act as a barrier between an internal network (such as a corporate network) and external networks (such as the Internet) and control the traffic flow according to those rules. The firewall is one of the best-known cybersecurity tools and plays a key role in many cybersecurity strategies [2].
Firewalls tend to accumulate rules as network environments grow more complex, and configurations commonly contain hundreds or even thousands of rules [3]. The exact number depends on the complexity of the network and its specific security requirements. In large organizations, or in networks with diverse services and applications, the rule set can grow significantly: networks with multiple subnets, different security zones, and complex traffic routing typically require more rules to ensure proper network segmentation and protection [4,5]. Configuring a firewall is therefore a complex and error-prone task, for the following four reasons:
(i) The firewall system often contains a large number of rules.
(ii) Legacy firewall rules may have been designed and implemented by different administrators.
(iii) Rules can be added to or removed from the firewall system at different times.
(iv) Firewall policies might be maintained in the network by more than one administrator.
The aforementioned reasons illustrate the complex nature of managing firewall policies and their implementation. Therefore, automated tools are necessary to ensure the accuracy and efficiency of firewall rules, given the large number of rules in place. The primary function of these automated tools is to validate the set of filtering rules and assist in the optimization process [6].
In real-world scenarios, firewall configuration errors can cause network security issues. In the literature, such configuration errors within firewall rules are commonly referred to as firewall anomalies; the most prominent are shadowing, redundancy, correlation, generalization, and irrelevance [7,8]. This paper analyzes firewall anomalies using first-order logic (FOL) supported by knowledge graphs. FOL was selected because of its ability to describe the relationships between objects [9]. First, FOL is used to define the firewall anomalies. Second, the notation used throughout the paper is described. Third, new definitions of each firewall anomaly are introduced in the form of FOL rules. Finally, experiments are conducted to demonstrate the correctness and applicability of these FOL rules, using real backbone network configuration data collected from Stanford University campus routers.
The paper makes the following contributions. It proposes a formal first-order logic-based framework for firewall policy analysis that enables precise, deterministic, and explainable detection of firewall anomalies. It introduces complete logical definitions for all major anomaly types, including shadowing, redundancy, correlation, generalization, and irrelevance, across both single and distributed firewall environments. Complex anomalies are systematically decomposed into explicit logical cases, ensuring accurate detection of order-dependent and overlapping rules. The framework is further strengthened by knowledge graph-based visualization using Neo4j, and its effectiveness is validated on real backbone network configuration data, demonstrating practical applicability under realistic conditions. Finally, a machine learning approach is used to further validate the correctness of the proposed logic rules.
The remainder of this paper is organized as follows. Section 2 reviews and analyzes the related literature. Section 3 presents the definitions and categories of firewall policy anomalies. Section 4 introduces the proposed validation rules and the knowledge graph-based visualization framework. Section 5 employs machine learning techniques to validate the correctness and practical applicability of the proposed logic rules. Finally, Section 6 concludes the paper and outlines future research directions.

2. Related Work

Arthur et al. [10] developed an automated tool to identify and resolve firewall anomalies. The tool resolves anomalies by selecting the best position (index) for each rule using a heuristic approach based on particle swarm optimization. The work in [11] developed a firewall anomaly detection model based on a parallel feature fusion technique, using a stacked autoencoder and a deep belief network as feature learning methods. The work in [12] developed a formal system that uses a logic solver to detect firewall anomalies.
Togay et al. [13] proposed an anomaly detection framework built on the Java web application platform. The framework is based on linear logic and can detect the traditional anomaly types. Kulyadi et al. [14] designed a firewall anomaly detection model using a Recurrent Neural Network (RNN) that learns the normal behavior of the firewall and the complex spatio-temporal correlations in the data; the model detects anomalies that may indicate malware. The work in [15] combined Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) to analyze log files and detect firewall anomalies; in that work, logic notation is used as a formalization tool. The work in [16] developed an approach based on Header Space Analysis (HSA) to improve the performance of firewall anomaly detection; HSA allows network specifications and configurations to be checked statically. Toprak and Yavuz [17] developed a semi-supervised deep learning model to detect firewall anomalies, trained and tested on the PayloadAllTheThings dataset. The work in [18] developed a firewall anomaly detection algorithm based on an asymmetric double decision tree, relying on packet-filter semantics in which the first matching rule determines the actual decision. The work in [19] developed a prototype decision support system for optimizing firewall rules that detects firewall anomalies using a probabilistic approach. The work in [20] designed multi-class machine learning and deep learning models for analyzing firewall logs and classifying the resulting actions; their results show that both models can help discover firewall anomalies. The work in [21] applied four classification algorithms (Naive Bayes, logistic regression, decision tree, and support vector machine), combined with feature engineering, to discover firewall anomalies.
The work in [22] developed a machine learning classifier that predicts the correct action for firewall records, evaluated using metrics derived from the confusion matrix.
The work in [23] developed a model that analyzes log files to identify firewall anomaly patterns. The work in [24] developed an optimized model that helps administrators detect firewall anomalies, using rule-based logic as the optimization tool. The work in [25] injected anomalies into existing firewall logs to compare the efficiency of supervised and unsupervised learning techniques; their results showed that the unsupervised method had difficulty detecting the injected anomalies. The work in [26] proposed a new unsupervised anomaly detection model that uses natural language processing methods for feature extraction and principal component analysis (PCA) for dimensionality reduction.
The model and visualization tool developed in [27] help information security managers identify and prioritize anomalies in firewall policies. The model takes into account exceptional rules, which existing models often misclassify as anomalies. However, there is no proof that this model detects all cases of firewall anomalies, so its completeness remains an open issue.

3. Anomalies in Firewall Policy

This section examines firewall policy anomalies at two levels. The first level focuses on anomalies occurring within a single firewall, whereas the second level considers anomalies arising in distributed firewall architectures. According to [8], the primary rule anomalies in a single firewall are shadowing, correlation, generalization, and irrelevance. Each of these anomalies is defined and discussed in the subsequent subsections. Additionally, distributed firewall environments may exhibit inter-firewall anomalies and spuriousness anomalies [27].

3.1. Anomalies in a Single Firewall

The following subsections define and illustrate the anomalies that can occur within a single firewall.

3.1.1. Shadowing Firewall Anomalies

A rule exhibits a shadowing anomaly if it is placed after another rule that already matches all of the same network traffic. Because the earlier rule is always applied first, the "shadowed" rule is never activated or counted. Table 1 shows an instance of a shadowing anomaly, in which rule 2 (index: I2) is shadowed by rule 1 (index: I1): rule 2 follows rule 1, rule 2 matches a subset of the traffic matched by rule 1, and the two rules specify different actions. Rule 2 is never activated because the earlier, more general rule captures all the traffic it is meant to handle.
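The definition can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each match field of a rule is a finite set of concrete values (real rules use prefixes and port ranges), and it tests whether a later rule is fully covered by an earlier rule that specifies a different action. The class `Rule` and the functions `covers` and `is_shadowed` are hypothetical names, and the sample values are invented rather than taken from Table 1.

```python
from dataclasses import dataclass

# Match fields compared when checking coverage (illustrative selection).
FIELDS = ("proto", "src_ip", "dst_ip", "dst_port")


@dataclass(frozen=True)
class Rule:
    index: int          # position in the rule list (lower = evaluated first)
    proto: frozenset    # each match field is a finite set of concrete values
    src_ip: frozenset
    dst_ip: frozenset
    dst_port: frozenset
    action: str         # "allow" or "deny"


def covers(general: Rule, specific: Rule) -> bool:
    """True if every field of `specific` is a subset of the same field of `general`."""
    return all(getattr(specific, f) <= getattr(general, f) for f in FIELDS)


def is_shadowed(earlier: Rule, later: Rule) -> bool:
    """`later` is shadowed when it follows `earlier`, is fully covered by it,
    and specifies a different action."""
    return (earlier.index < later.index
            and covers(earlier, later)
            and earlier.action != later.action)


r1 = Rule(1, frozenset({"tcp"}), frozenset({"10.0.0.1", "10.0.0.2"}),
          frozenset({"10.0.0.10"}), frozenset({80, 443}), "deny")
r2 = Rule(2, frozenset({"tcp"}), frozenset({"10.0.0.1"}),
          frozenset({"10.0.0.10"}), frozenset({80}), "allow")
print(is_shadowed(r1, r2))  # True: every packet of r2 is caught first by r1
```

Modeling fields as sets keeps the coverage test to a single subset comparison per field; prefix or range arithmetic would replace `<=` in a fuller implementation.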

3.1.2. Correlation Firewall Anomaly

A correlation anomaly occurs when two firewall rules have different actions, and their matching criteria partially overlap. This means some packets will be matched by both rules, but neither rule completely covers the other. The outcome for a specific packet depends on which rule is evaluated first, making the policy order-sensitive and unpredictable. For example, one rule might allow traffic from a specific IP to a range of ports, while a second rule denies traffic from that same IP to a specific port within that range. The order of these rules determines the packet’s fate [28]. Depending on the specific security policy in place, this anomaly could be classified as either:
Conflicting: If the firewall processes rules in a way that allows the second rule to override the first, it creates a direct conflict in action.
Redundant: If the policy dictates a specific order of rule execution (e.g., the first rule to match wins), then one of the rules essentially becomes redundant, as its action will never be applied to that specific packet. Table 2 shows an instance of a conflicting correlation anomaly. Table 3 shows an instance of redundancy in correlation anomaly.
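The partial-overlap condition can be sketched in a few lines of Python. As an illustrative assumption, each rule is a dictionary whose match fields are finite sets; two rules overlap when every field intersects, and they are correlated when they overlap, neither rule's match set contains the other's, and the actions differ. The helper names `overlaps`, `subset`, and `is_correlated` are hypothetical, and the sample rules are invented.

```python
# Each rule maps a field name to a finite set of values (illustrative model).
FIELDS = ("proto", "src_ip", "dst_ip", "dst_port")


def overlaps(a: dict, b: dict) -> bool:
    """At least one packet matches both rules iff every field intersects."""
    return all(a[f] & b[f] for f in FIELDS)


def subset(a: dict, b: dict) -> bool:
    """Every packet matched by `a` is also matched by `b`."""
    return all(a[f] <= b[f] for f in FIELDS)


def is_correlated(r1: dict, r2: dict) -> bool:
    """Different actions and a partial overlap: some packets match both rules,
    but neither rule's match set contains the other's."""
    return (r1["action"] != r2["action"]
            and overlaps(r1, r2)
            and not subset(r1, r2)
            and not subset(r2, r1))


r1 = {"proto": {"tcp"}, "src_ip": {"10.0.0.1"}, "dst_ip": {"10.0.0.10"},
      "dst_port": {80, 81, 82}, "action": "allow"}
r2 = {"proto": {"tcp"}, "src_ip": {"10.0.0.1", "10.0.0.2"}, "dst_ip": {"10.0.0.10"},
      "dst_port": {82}, "action": "deny"}
print(is_correlated(r1, r2))  # True: only 10.0.0.1 -> port 82 is matched by both
```

For the overlapping packets, the first-match order decides whether `allow` or `deny` is applied, which is exactly the order sensitivity described above.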

3.1.3. Generalization Firewall Anomaly

A generalization anomaly occurs when one rule’s matching criteria are a superset of a second rule’s criteria, and they have different actions. The more general rule (the superset) appears after the more specific rule. Because firewalls process rules sequentially, the specific rule will always be evaluated first for any packet that matches both. The more general rule will, therefore, never be applied to those specific packets, making it effectively useless for that subset of traffic. For example, one rule allows traffic from a single IP address, and a second, more general rule (that follows the first) denies all traffic from the entire network subnet that includes that IP. The second rule will never affect the traffic from that specific IP because the first rule will always be a match [29]. Table 4 shows this instance of a generalization anomaly.
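The example above (a single allowed host followed by a denied subnet) can be checked with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions: only the protocol set and the source prefix are compared, and the names `rule_subset` and `is_generalization` as well as the addresses are hypothetical.

```python
from ipaddress import ip_network


def rule_subset(a: dict, b: dict) -> bool:
    """Every packet matched by rule `a` is also matched by rule `b`
    (only the protocol set and the source prefix are modeled here)."""
    return a["proto"] <= b["proto"] and ip_network(a["src"]).subnet_of(ip_network(b["src"]))


def is_generalization(earlier: dict, later: dict) -> bool:
    """The later rule strictly contains the earlier, more specific rule
    and specifies a different action."""
    return (earlier["index"] < later["index"]
            and earlier["action"] != later["action"]
            and rule_subset(earlier, later)
            and not rule_subset(later, earlier))


r1 = {"index": 1, "proto": {"tcp"}, "src": "192.168.1.5/32", "action": "allow"}
r2 = {"index": 2, "proto": {"tcp"}, "src": "192.168.1.0/24", "action": "deny"}
print(is_generalization(r1, r2))  # True
```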

3.1.4. Irrelevant Firewall Anomaly

An irrelevant firewall anomaly occurs when a rule in a firewall policy can never be matched by any incoming traffic [30]. Unlike shadowing, this is not caused by an earlier, more general rule; it arises because the rule's own matching criteria do not correspond to any traffic that can reach the firewall. Two main scenarios make a firewall rule irrelevant:
  • Erroneous IP entries: The source and destination IP addresses are identical within the rule. This rule is ineffective because a network packet cannot originate and terminate at the same host.
  • Irrelevant addressing: The rule contains non-existent IP addresses or services that do not align with the network’s addressing scheme. For instance, a rule might specify a destination or port that does not exist on the network, making it impossible for any traffic to match the rule.
Because the source and destination IP addresses in the firewall rule shown in Table 5 are identical, the rule is ineffective and will not match any incoming packets.
As an example of a case 2 irrelevant anomaly, consider the rule in Table 6, which targets the tuple (UDP, 60.0.0.3, 70). No incoming packet will ever match this rule, because the destination IP address (60.0.0.3) does not belong to any machine on the network and no application is listening for traffic on port 70. Because the destination, the service, or both do not exist, Table 6 shows an instance of an irrelevant anomaly (case 2).
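Both irrelevance scenarios can be sketched as simple checks against an inventory of the protected network. The inventory (`KNOWN_HOSTS`, `LISTENING_PORTS`) and the function name `irrelevance_case` are hypothetical assumptions for illustration; in practice such an inventory would have to be derived from network data such as ARP tables or service scans, and the source address in the second example is invented.

```python
from ipaddress import ip_address
from typing import Optional

# Hypothetical inventory of the protected network (illustrative values only).
KNOWN_HOSTS = {ip_address("10.0.0.10"), ip_address("10.0.0.11")}
LISTENING_PORTS = {22, 80, 443}


def irrelevance_case(rule: dict) -> Optional[int]:
    """Return 1 for identical source and destination IPs, 2 for a destination
    host or service that does not exist on the network, otherwise None."""
    src, dst = ip_address(rule["src_ip"]), ip_address(rule["dst_ip"])
    if src == dst:
        return 1
    if dst not in KNOWN_HOSTS or rule["dst_port"] not in LISTENING_PORTS:
        return 2
    return None


print(irrelevance_case({"src_ip": "10.0.0.10", "dst_ip": "10.0.0.10", "dst_port": 80}))  # 1
print(irrelevance_case({"src_ip": "10.0.0.5", "dst_ip": "60.0.0.3", "dst_port": 70}))    # 2
print(irrelevance_case({"src_ip": "10.0.0.5", "dst_ip": "10.0.0.10", "dst_port": 443}))  # None
```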

3.2. Anomalies in Distributed Firewalls

Inter-firewall anomalies: This specific type of anomaly refers to conflicts that occur between different firewalls in a distributed network. A common scenario is when one firewall allows a packet, but a subsequent firewall on the same path blocks it, leading to inconsistent security policies and potential vulnerabilities. Table 7 shows an instance of inter-firewall anomalies. In Table 7, F1 explicitly allows HTTPS traffic to the internal server, but F2 blocks HTTPS traffic to the same server.
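A minimal sketch of this check evaluates the same packet through each firewall on the path using first-match semantics and flags the case where the upstream firewall allows what a downstream firewall denies. The rule format, the wildcard `"any"`, the function names `first_match` and `inter_firewall_anomaly`, and the server address are illustrative assumptions, not the contents of Table 7.

```python
MATCH_FIELDS = ("proto", "dst_ip", "dst_port")


def first_match(rules: list, packet: dict):
    """Return the action of the first rule whose fields all match the packet
    ("any" acts as a wildcard), or None if no rule matches."""
    for rule in rules:
        if all(rule[f] == "any" or rule[f] == packet[f] for f in MATCH_FIELDS):
            return rule["action"]
    return None


def inter_firewall_anomaly(upstream: list, downstream: list, packet: dict) -> bool:
    """Flag a packet that the upstream firewall allows but a downstream
    firewall on the same path denies."""
    return first_match(upstream, packet) == "allow" and first_match(downstream, packet) == "deny"


F1 = [{"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443, "action": "allow"}]
F2 = [{"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443, "action": "deny"},
      {"proto": "any", "dst_ip": "any", "dst_port": "any", "action": "allow"}]
https_pkt = {"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443}
print(inter_firewall_anomaly(F1, F2, https_pkt))  # True
```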

Spuriousness Anomaly

This anomaly, often discussed in the context of distributed firewalls, occurs when a packet is allowed by an upstream firewall but is not covered by any rule in a downstream firewall. Table 8 shows an instance of a spuriousness anomaly. In Table 8, F1 explicitly allows HTTP traffic to the internal server, but F2 has no rule for HTTP traffic to 10.0.0.10.
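The spuriousness condition differs from the inter-firewall case only in what happens downstream: instead of an explicit deny, no downstream rule matches at all. The following sketch assumes first-match evaluation, the wildcard `"any"`, HTTP on port 80, and hypothetical names `matching_rule` and `is_spurious`; the rule lists are illustrative rather than copied from Table 8.

```python
MATCH_FIELDS = ("proto", "dst_ip", "dst_port")


def matching_rule(rules: list, packet: dict):
    """Return the first rule that matches the packet ("any" is a wildcard), or None."""
    for rule in rules:
        if all(rule[f] == "any" or rule[f] == packet[f] for f in MATCH_FIELDS):
            return rule
    return None


def is_spurious(upstream: list, downstream: list, packet: dict) -> bool:
    """Upstream explicitly allows the packet, but no downstream rule covers it."""
    up = matching_rule(upstream, packet)
    return up is not None and up["action"] == "allow" and matching_rule(downstream, packet) is None


F1 = [{"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 80, "action": "allow"}]
F2 = [{"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 443, "action": "allow"}]
http_pkt = {"proto": "tcp", "dst_ip": "10.0.0.10", "dst_port": 80}
print(is_spurious(F1, F2, http_pkt))  # True: F2 has no rule for HTTP to 10.0.0.10
```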

4. Validation Rules and Knowledge Graph Illustration

This section presents the validation rules used for firewall anomaly detection, based on the definitions provided in the previous section. According to [31], the standard structure of a firewall rule is ⟨order, protocol, src_ip, src_port, dst_ip, dst_port, action⟩. In this section, firewall rules are represented using first-order logic [32], following the same structure as [31]. Each firewall rule is encoded as the predicate
fr(index, proto, src_IP, src_port, dst_IP, dst_port, action). This representation is required for applying the proposed logic rules.
In this representation, fr refers to a firewall rule, index represents the unique identifier of the rule, proto indicates the network protocol in use, src_IP and src_port denote the source IP address and source port, respectively, while dst_IP and dst_port specify the destination IP address and destination port. Finally, action defines the decision applied to the traffic, which can be either allow or deny.
In light of the above discussion, firewall anomalies can be summarized into four categories as follows:
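Shadowing: $$\big(\forall l,\ fr_a[l] \supseteq fr_b[l]\big) \land \big(fr_a[\mathit{action}] \neq fr_b[\mathit{action}]\big) \tag{1}$$
Redundancy: $$\big(\forall l,\ fr_a[l] \supseteq fr_b[l]\big) \land \big(fr_a[\mathit{action}] = fr_b[\mathit{action}]\big) \tag{2}$$
Correlation: $$\big(\exists l,\ fr_a[l] \subset fr_b[l]\big) \land \big(\exists l',\ fr_a[l'] \supset fr_b[l']\big) \land \big(fr_a[\mathit{action}] \neq fr_b[\mathit{action}]\big) \tag{3}$$
Generalization: $$\big(\forall l,\ fr_a[l] \subseteq fr_b[l]\big) \land \big(fr_a[\mathit{action}] \neq fr_b[\mathit{action}]\big) \tag{4}$$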
Here, fr denotes a firewall rule, and the subscripts a and b denote the positions of two rules in the policy, with fr_a having higher priority than (i.e., preceding) fr_b; fr_a[l] denotes the set of values matched by field l of rule fr_a. The action specifies the firewall operation, which belongs to {allow, deny}, meaning that the packet is either forwarded or dropped. The symbol l refers to an individual field of a firewall rule, namely the source IP (src_IP), destination IP (dst_IP), source port (src_Port), destination port (dst_Port), or protocol type (Proto), where l, l′ ∈ {Proto, SrcIP, SrcPort, DstIP, DstPort} and l ≠ l′.
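The four categories can be read as a classification procedure over a pair of rules. The sketch below is an illustrative reading of Equations (1)–(4), assuming each field of fr_a and fr_b is a finite Python set and that fr_a has the higher priority. It adds one assumption of its own: an explicit guard that the two rules share at least one packet, which the compact notation leaves implicit. When all fields are equal and the actions differ, both (1) and (4) hold; the sketch reports shadowing in that case. The function name `classify` is hypothetical.

```python
FIELDS = ("proto", "src_ip", "src_port", "dst_ip", "dst_port")


def classify(fr_a: dict, fr_b: dict):
    """Classify a rule pair in which fr_a has higher priority than fr_b.
    Field values are Python sets; returns an anomaly name or None."""
    # Guard (assumption): the rules must overlap on every field.
    if any(not (fr_a[l] & fr_b[l]) for l in FIELDS):
        return None
    same_action = fr_a["action"] == fr_b["action"]
    if all(fr_a[l] >= fr_b[l] for l in FIELDS):              # for all l: fr_a[l] ⊇ fr_b[l]
        return "redundancy" if same_action else "shadowing"  # Eqs. (2) and (1)
    if same_action:
        return None                                          # Eqs. (3), (4) need different actions
    if all(fr_a[l] <= fr_b[l] for l in FIELDS):              # Eq. (4)
        return "generalization"
    if (any(fr_a[l] < fr_b[l] for l in FIELDS)               # exists l:  fr_a[l] ⊂ fr_b[l]
            and any(fr_a[l] > fr_b[l] for l in FIELDS)):     # exists l': fr_a[l'] ⊃ fr_b[l']
        return "correlation"                                 # Eq. (3)
    return None


base = {"proto": {"tcp"}, "src_port": {1024}, "dst_ip": {"10.0.0.10"}}
a = {**base, "src_ip": {"10.0.0.1", "10.0.0.2"}, "dst_port": {80}, "action": "deny"}
b = {**base, "src_ip": {"10.0.0.1"}, "dst_port": {80}, "action": "allow"}
c = {**base, "src_ip": {"10.0.0.1"}, "dst_port": {80, 443}, "action": "allow"}
print(classify(a, b))                        # shadowing
print(classify(a, {**b, "action": "deny"}))  # redundancy
print(classify(b, a))                        # generalization
print(classify(c, a))                        # correlation
```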

4.1. Dataset Description and Utilization

The dataset used in this paper is derived from real operational backbone network data collected from Stanford University’s campus network infrastructure, commonly referred to in the literature as the Stanford backbone dataset (Stanford University, “Campus backbone router configuration and ARP table data,” Stanford, CA, USA, unpublished research dataset) [33]. The dataset consists of router-level information extracted from core network devices, including Address Resolution Protocol (ARP) tables, IP–MAC mappings, interface identifiers, and VLAN assignments.
The data reflects actual backbone-level network configurations, capturing multiple subnets, virtual LANs, and inter-router connections. IP address ranges such as 171.64.0.0/16, which are publicly assigned to Stanford University, confirm the provenance of the dataset. The dataset does not contain packet payloads or user-level information; instead, it represents control-plane and configuration-level data, making it suitable for firewall policy modeling and anomaly analysis.
In this work, the dataset is used to reconstruct realistic network scopes and policy domains, from which firewall rules and packet-matching conditions are inferred. This enables the evaluation of firewall anomalies such as shadowing, redundancy, correlation, and generalization under real network conditions rather than synthetic rule sets. Since the dataset originates from operational infrastructure, it provides a realistic and challenging benchmark for validating firewall policy verification techniques.
To prepare and normalize this dataset for use in Neo4j, we developed the software described in Table 9, which outlines the firewall policy graph construction and the port normalization algorithm.
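Table 9 is not reproduced here, so the following is only a hypothetical sketch of what a port normalization step might look like, not the authors' algorithm. It maps raw port fields ("any", a single port, a range such as "1000-2000", or a service name) to an inclusive numeric range, which makes containment checks such as those in the shadowing rules a simple comparison. The names `SERVICE_PORTS`, `normalize_port`, and `contains` are assumptions.

```python
from typing import Tuple

# Hypothetical service-name table; a real deployment would use a fuller mapping.
SERVICE_PORTS = {"http": 80, "https": 443, "ssh": 22, "dns": 53}


def normalize_port(spec: str) -> Tuple[int, int]:
    """Map a raw port field to an inclusive (low, high) range within 0-65535."""
    s = spec.strip().lower()
    if s in ("any", "*", ""):
        return (0, 65535)
    if s in SERVICE_PORTS:
        p = SERVICE_PORTS[s]
        return (p, p)
    if "-" in s:
        lo, hi = (int(x) for x in s.split("-", 1))
    else:
        lo = hi = int(s)
    if not (0 <= lo <= hi <= 65535):
        raise ValueError(f"invalid port specification: {spec!r}")
    return (lo, hi)


def contains(outer: Tuple[int, int], inner: Tuple[int, int]) -> bool:
    """True if the inner port range lies entirely within the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print(normalize_port("1000-2000"))                            # (1000, 2000)
print(contains(normalize_port("any"), normalize_port("HTTP")))  # True
```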
In the following subsections, FOL rules are provided to illustrate how each anomaly type is represented. The use of FOL for developing validation rules was previously explained in [6].

4.2. Shadowing Detection

The general form of a shadowing anomaly is given by Equation (5):
$$\forall fr, PC:\ (I_1 < I_2) \land (PC_1 \in fr_1) \land (PC_2 \in fr_2) \land (PC_2 \in fr_1) \land (PC_1 \neq PC_2) \Longrightarrow fr_2 \text{ is shadowed by } fr_1 \tag{5}$$
Equation (5) indicates that rule fr2 is positioned after fr1 in the firewall rule ordering; therefore, fr1 is evaluated first. Rule fr1 matches two packet classes (PC1 and PC2), whereas fr2 matches only packet class PC2. Since all packet fields matched by fr2 are already covered by fr1, Equation (5) represents a general shadowing anomaly, in which fr2 is completely shadowed by fr1. Moreover, when both rules specify the same action, this situation is additionally classified as a redundancy anomaly.
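Equation (5) can be read as packet-set containment: enumerate the packet classes each rule matches and check that every class of fr2 is also matched by the earlier rule fr1. The sketch below assumes small, finite field domains so that enumeration with `itertools.product` is feasible. It requires only containment by default, so exact duplicates are also caught; the optional `strict` flag additionally requires a distinct class PC1 that fr1 matches and fr2 does not. Following the text, the same-action case is also labeled as redundancy. The names `packet_classes` and `classify_eq5` are hypothetical.

```python
from itertools import product


def packet_classes(rule: dict) -> set:
    """Enumerate every (proto, src_ip, dst_ip, dst_port) tuple matched by a rule.
    Only practical for the small, finite field domains used here."""
    return set(product(rule["proto"], rule["src_ip"], rule["dst_ip"], rule["dst_port"]))


def classify_eq5(fr1: dict, fr2: dict, strict: bool = False) -> list:
    """Equation (5) read as set containment: fr1 precedes fr2 and every packet
    class of fr2 is also matched by fr1. With strict=True, fr1 must also match
    at least one packet class (PC1) that fr2 does not."""
    pc1, pc2 = packet_classes(fr1), packet_classes(fr2)
    if fr1["index"] >= fr2["index"] or not (pc2 <= pc1):
        return []
    if strict and not (pc1 - pc2):
        return []
    labels = ["shadowing"]
    if fr1["action"] == fr2["action"]:
        labels.append("redundancy")  # same action: additionally a redundancy anomaly
    return labels


fr1 = {"index": 1, "proto": {"tcp"}, "src_ip": {"10.0.0.1"}, "dst_ip": {"10.0.0.10"},
       "dst_port": {80, 443}, "action": "deny"}
fr2 = {**fr1, "index": 2, "dst_port": {443}}
print(classify_eq5(fr1, fr2))                          # ['shadowing', 'redundancy']
print(classify_eq5(fr1, {**fr2, "action": "allow"}))   # ['shadowing']
```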
In the following, the shadowing anomaly is classified into five cases, given by Equations (6)–(10):
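$$\begin{aligned}
&\forall I, proto, src\_IP, src\_port, dst\_IP, dst\_port, action:\\
&\quad fr_1(I_1, proto_1, src\_IP_1, src\_port_1, dst\_IP_1, dst\_port_1, action_1) \land fr_2(I_2, proto_2, src\_IP_2, src\_port_2, dst\_IP_2, dst\_port_2, action_2)\\
&\quad \land (I_1 < I_2) \land (proto_1 = proto_2) \land (src\_IP_1 = src\_IP_2) \land (dst\_IP_1 = dst\_IP_2) \land (dst\_port_1 = dst\_port_2) \land (action_1 \neq action_2)\\
&\quad \Longrightarrow \mathit{Shadowing}_1
\end{aligned} \tag{6}$$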
Equation (6) represents the first case of a shadowing anomaly, where rules fr1 and fr2 are modeled as predicates. Both rules share identical protocol types, source IP addresses, destination IP addresses, and destination ports, while specifying different actions. Since fr1 precedes fr2 in the rule order, fr2 is completely shadowed by fr1; this scenario is referred to as the first case of shadowing. Figure 1 illustrates the detection of the first case of shadowing in the dataset after applying Equation (6).
In Figure 1, packet 2001 exhibits the first case of shadowing with packet 139. Tables 10 and 11 give the definitions of packets 2001 and 139, respectively. As these definitions show, the two packets have the same protocol type, source and destination IP addresses, and destination port, but define different actions; this is therefore the first case of shadowing.
The same pair of rules can meet more than one shadowing definition at the same time. This does not mean the detection is redundant; rather, it shows that the pair satisfies multiple independent shadowing criteria (e.g., an exact match as well as broader field coverage), and each criterion is treated as a separate anomaly case in our model.
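Equation (6) translates directly into a pairwise scan over an ordered rule list. In this minimal sketch, `FR` mirrors the predicate fr(index, proto, src_IP, src_port, dst_IP, dst_port, action); `FR`, `shadowing_1`, and `find_shadowing_1` are hypothetical names, and the rules are illustrative, not entries from the Stanford dataset. As in the equation, the source port is not compared.

```python
from itertools import combinations
from typing import NamedTuple


class FR(NamedTuple):
    """fr(index, proto, src_IP, src_port, dst_IP, dst_port, action) as a tuple."""
    index: int
    proto: str
    src_ip: str
    src_port: str
    dst_ip: str
    dst_port: str
    action: str


def shadowing_1(fr1: FR, fr2: FR) -> bool:
    """Equation (6): fr1 precedes fr2, proto/src_IP/dst_IP/dst_port are identical,
    and the actions differ. src_port is not compared, mirroring the equation."""
    return (fr1.index < fr2.index
            and fr1.proto == fr2.proto
            and fr1.src_ip == fr2.src_ip
            and fr1.dst_ip == fr2.dst_ip
            and fr1.dst_port == fr2.dst_port
            and fr1.action != fr2.action)


def find_shadowing_1(rules):
    """Return (I1, I2) index pairs satisfying Equation (6), checking each pair once."""
    ordered = sorted(rules, key=lambda r: r.index)
    return [(a.index, b.index) for a, b in combinations(ordered, 2) if shadowing_1(a, b)]


rules = [
    FR(1, "tcp", "10.0.1.5", "any", "10.0.2.9", "80", "allow"),
    FR(2, "tcp", "10.0.1.5", "1024", "10.0.2.9", "80", "deny"),
    FR(3, "udp", "10.0.1.5", "any", "10.0.2.9", "53", "allow"),
]
print(find_shadowing_1(rules))  # [(1, 2)]
```

Because `combinations` preserves the sorted order, each candidate pair is visited once with the earlier rule first, matching the I1 < I2 condition.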
The remaining forms of shadowing anomalies are presented next.
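$$\begin{aligned}
&\forall I, proto, src\_IP, src\_port, dst\_IP, dst\_port, action:\\
&\quad fr_1(I_1, proto_1, src\_IP_1, src\_port_1, dst\_IP_1, dst\_port_1, action_1) \land fr_2(I_2, proto_2, src\_IP_2, src\_port_2, dst\_IP_2, dst\_port_2, action_2)\\
&\quad \land (I_1 < I_2) \land \big((proto_1 = \text{any}) \lor (proto_2 \in proto_1)\big) \land (src\_IP_2 = src\_IP_1) \land (dst\_IP_2 = dst\_IP_1) \land (dst\_port_2 = dst\_port_1) \land (action_2 \neq action_1)\\
&\quad \Longrightarrow \mathit{Shadowing}_2
\end{aligned} \tag{7}$$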
Equation (7) indicates that fr2 is shadowed by fr1. When the protocol of fr1 is set to "any", fr1 matches all protocol types (e.g., TCP or UDP), so the protocol of fr2, whatever its value, always falls within the protocol scope of fr1. Alternatively, when proto2 ∈ proto1, the protocol specified in fr2 is a member of the protocol set defined in fr1; for example, if proto1 = {TCP, UDP}, then proto2 must be one of these protocols. All remaining packet fields are identical in fr1 and fr2, but the two rules apply different actions, which leads to the shadowing relationship. Figure 2 illustrates the detection of the second case of shadowing in the dataset after applying Equation (7).
Figure 2. Illustration of the detection of the second case of shadowing.
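A minimal sketch of Equation (7), assuming protocols are given as "any", a single name, or a set of names. The single-name branch uses equality on purpose, since Python's `in` on a string would perform a substring search. The names `proto_covers` and `shadowing_2` and the sample rules are hypothetical.

```python
def proto_covers(proto1, proto2: str) -> bool:
    """proto1 is "any", a single protocol name, or a set of names; proto2 is one name."""
    if proto1 == "any":
        return True
    if isinstance(proto1, str):      # single protocol: plain equality, not substring search
        return proto1 == proto2
    return proto2 in proto1


def shadowing_2(fr1: dict, fr2: dict) -> bool:
    """Equation (7): fr1 precedes fr2, fr1's protocol is "any" or contains proto2,
    the remaining compared fields are identical, and the actions differ."""
    return (fr1["index"] < fr2["index"]
            and proto_covers(fr1["proto"], fr2["proto"])
            and fr2["src_ip"] == fr1["src_ip"]
            and fr2["dst_ip"] == fr1["dst_ip"]
            and fr2["dst_port"] == fr1["dst_port"]
            and fr2["action"] != fr1["action"])


common = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.10", "dst_port": 53}
fr1 = {"index": 1, "proto": {"tcp", "udp"}, "action": "deny", **common}
fr2 = {"index": 2, "proto": "udp", "action": "allow", **common}
print(shadowing_2(fr1, fr2))                      # True
print(shadowing_2({**fr1, "proto": "any"}, fr2))  # True
print(shadowing_2({**fr1, "proto": "tcp"}, fr2))  # False
```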
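$$\begin{aligned}
&\forall I, proto, src\_IP, src\_port, dst\_IP, dst\_port, action:\\
&\quad fr_1(I_1, proto_1, src\_IP_1, src\_port_1, dst\_IP_1, dst\_port_1, action_1) \land fr_2(I_2, proto_2, src\_IP_2, src\_port_2, dst\_IP_2, dst\_port_2, action_2)\\
&\quad \land (I_1 < I_2) \land (proto_2 = proto_1) \land \big((src\_IP_1 = \text{any}) \lor (src\_IP_2 \in src\_IP_1)\big) \land (dst\_IP_2 = dst\_IP_1) \land (dst\_port_2 = dst\_port_1) \land (action_2 \neq action_1)\\
&\quad \Longrightarrow \mathit{Shadowing}_3
\end{aligned} \tag{8}$$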
Equation (8) indicates that fr2 is shadowed by fr1. In this equation, the source IP of fr1 can take two forms: it may be set to "any", or it may be an aggregate source IP, meaning that a single rule (fr1) specifies multiple source IP addresses or a range of addresses. In contrast, the source IP of fr2 is a single IP address included within the source IP set of fr1. For example, if the source IP of fr1 is {10.0.0.1–10.0.0.10} and the source IP of fr2 is {10.0.0.5}, then the source IP of fr2 is contained within that of fr1. Since the source IP of fr2 is included in fr1, the remaining packet fields are equivalent, and the actions differ, fr2 is a shadowed rule of fr1 in Equation (8). Figure 3 illustrates the detection of the third case of shadowing in the dataset after applying Equation (8).
Figure 3. Illustration of the detection of the third case of shadowing.
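The aggregate source IP in Equation (8) can be handled with Python's `ipaddress` module, which supports ordering comparisons between addresses. This sketch assumes aggregate IPs are written as an inclusive range "first-last", as in the example {10.0.0.1–10.0.0.10}; the names `ip_in` and `shadowing_3` and the sample rules are hypothetical.

```python
from ipaddress import ip_address


def ip_in(spec: str, ip: str) -> bool:
    """spec is "any", a single address, or an inclusive range "first-last"."""
    if spec == "any":
        return True
    first, _, last = spec.partition("-")
    last = last or first                 # a single address is a one-element range
    return ip_address(first) <= ip_address(ip) <= ip_address(last)


def shadowing_3(fr1: dict, fr2: dict) -> bool:
    """Equation (8): same protocol, fr2's source IP lies in fr1's source set,
    identical destination IP and port, and different actions."""
    return (fr1["index"] < fr2["index"]
            and fr2["proto"] == fr1["proto"]
            and ip_in(fr1["src_ip"], fr2["src_ip"])
            and fr2["dst_ip"] == fr1["dst_ip"]
            and fr2["dst_port"] == fr1["dst_port"]
            and fr2["action"] != fr1["action"])


fr1 = {"index": 1, "proto": "tcp", "src_ip": "10.0.0.1-10.0.0.10",
       "dst_ip": "192.168.1.20", "dst_port": 22, "action": "deny"}
fr2 = {**fr1, "index": 2, "src_ip": "10.0.0.5", "action": "allow"}
print(shadowing_3(fr1, fr2))  # True
```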
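$$\begin{aligned}
&\forall I, proto, src\_IP, src\_port, dst\_IP, dst\_port, action:\\
&\quad fr_1(I_1, proto_1, src\_IP_1, src\_port_1, dst\_IP_1, dst\_port_1, action_1) \land fr_2(I_2, proto_2, src\_IP_2, src\_port_2, dst\_IP_2, dst\_port_2, action_2)\\
&\quad \land (I_1 < I_2) \land (src\_IP_2 = src\_IP_1) \land \big((dst\_IP_1 = \text{any}) \lor (dst\_IP_2 \in dst\_IP_1)\big) \land (action_2 \neq action_1)\\
&\quad \Longrightarrow \mathit{Shadowing}_4
\end{aligned} \tag{9}$$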
Equation (9) shows that fr2 is shadowed by fr1. In this case, the destination IP of fr1 can take one of two forms: it may be specified as "any", or it may be an aggregate destination IP, meaning that a single rule (fr1) contains multiple destination IP addresses or a destination IP range. In contrast, fr2 specifies a single destination IP address contained within the destination IP set of fr1. For example, if the destination IP in fr1 is {10.0.0.1–10.0.0.10} and the destination IP in fr2 is {10.0.0.5}, then the destination IP of fr2 belongs to the range defined in fr1. Because the destination IP of fr2 is included in fr1, the remaining packet fields are equivalent, and the actions differ, fr2 is considered a shadowed rule of fr1. Equation (9) denotes the fourth case of shadowing.
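Equation (9) as printed compares only the source IP, the destination IP containment, and the actions, while the prose states that the remaining fields are equivalent. The sketch below therefore mirrors the equation by default and offers an optional `strict` mode (an assumption, not part of the equation) that also requires equal protocol and destination port. It again assumes inclusive "first-last" ranges; `ip_in`, `shadowing_4`, and the sample rules are hypothetical.

```python
from ipaddress import ip_address


def ip_in(spec: str, ip: str) -> bool:
    """spec is "any", a single address, or an inclusive range "first-last"."""
    if spec == "any":
        return True
    first, _, last = spec.partition("-")
    return ip_address(first) <= ip_address(ip) <= ip_address(last or first)


def shadowing_4(fr1: dict, fr2: dict, strict: bool = False) -> bool:
    """Equation (9) as printed: same source IP, fr2's destination IP inside fr1's
    destination set, different actions. strict=True also requires equal proto and
    dst_port (the "remaining fields are equivalent" reading of the prose)."""
    ok = (fr1["index"] < fr2["index"]
          and fr2["src_ip"] == fr1["src_ip"]
          and ip_in(fr1["dst_ip"], fr2["dst_ip"])
          and fr2["action"] != fr1["action"])
    if strict:
        ok = ok and fr1["proto"] == fr2["proto"] and fr1["dst_port"] == fr2["dst_port"]
    return ok


fr1 = {"index": 1, "proto": "tcp", "src_ip": "172.16.0.4",
       "dst_ip": "10.0.0.1-10.0.0.10", "dst_port": 443, "action": "allow"}
fr2 = {**fr1, "index": 2, "dst_ip": "10.0.0.5", "action": "deny"}
print(shadowing_4(fr1, fr2))                                   # True
print(shadowing_4(fr1, {**fr2, "dst_port": 80}))               # True (port not in Eq. 9)
print(shadowing_4(fr1, {**fr2, "dst_port": 80}, strict=True))  # False
```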
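$$\begin{aligned}
&\forall I, proto, src\_IP, src\_port, dst\_IP, dst\_port, action:\\
&\quad fr_1(I_1, proto_1, src\_IP_1, src\_port_1, dst\_IP_1, dst\_port_1, action_1) \land fr_2(I_2, proto_2, src\_IP_2, src\_port_2, dst\_IP_2, dst\_port_2, action_2)\\
&\quad \land (I_1 < I_2) \land (src\_IP_2 = src\_IP_1) \land (dst\_IP_2 = dst\_IP_1) \land \big((dst\_port_1 = \text{any}) \lor (dst\_port_2 \in dst\_port_1)\big) \land (action_2 \neq action_1)\\
&\quad \Longrightarrow \mathit{Shadowing}_5
\end{aligned} \tag{10}$$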
Equation (10) denotes that fr2 is shadowed by fr1. In this equation, the destination port of fr1 is either "any" or an aggregate port, meaning that the single rule fr1 specifies multiple destination ports. The destination port of fr2 is a single port that belongs to the destination ports of fr1. For instance, suppose the destination ports of fr1 are {40, 60, 80} and the destination port of fr2 is {40}; the destination port of fr2 then belongs to fr1. Following the concept in Equation (5) (the general form of the shadowing anomaly), since the destination port of fr2 belongs to fr1 and the remaining packet details are equivalent while the actions differ, fr2 is shadowed by fr1. Equation (10) denotes the fifth case of shadowing, and Figure 4 shows a snapshot of the fifth case of shadowing.
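A minimal sketch of Equation (10), using the port set {40, 60, 80} from the example above. It assumes a destination port is given as "any", a single integer, or a collection of integers; the names `port_covers` and `shadowing_5` and the IP addresses are hypothetical.

```python
def port_covers(port1, port2: int) -> bool:
    """port1 is "any", a single port, or a collection of ports; port2 is one port."""
    if port1 == "any":
        return True
    if isinstance(port1, int):
        return port1 == port2
    return port2 in port1


def shadowing_5(fr1: dict, fr2: dict) -> bool:
    """Equation (10): same source and destination IPs, fr2's destination port
    belongs to fr1's ports (or fr1 uses "any"), and the actions differ."""
    return (fr1["index"] < fr2["index"]
            and fr2["src_ip"] == fr1["src_ip"]
            and fr2["dst_ip"] == fr1["dst_ip"]
            and port_covers(fr1["dst_port"], fr2["dst_port"])
            and fr2["action"] != fr1["action"])


fr1 = {"index": 1, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.10",
       "dst_port": {40, 60, 80}, "action": "deny"}
fr2 = {**fr1, "index": 2, "dst_port": 40, "action": "allow"}
print(shadowing_5(fr1, fr2))                        # True
print(shadowing_5(fr1, {**fr2, "dst_port": 8080}))  # False
```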
The following discusses the rules used to detect correlation anomalies. Shadowing occurs when the entire set of packets matched by fr2 is included within fr1, yet the two rules specify different actions. However, a correlation anomaly appears when fr1 and fr2 partially overlap, meaning some packets of fr2 are covered by fr1, and some packets of fr1 are also matched by fr2, with different actions applied.
The visual structure of the knowledge graph does not explicitly encode the causal reason for shadowing. Consequently, all shadowing cases exhibit the same graphical pattern. Differentiation among these cases is achieved through semantic annotations, including the relationship type, the underlying logical conditions, and the specific rule attributes involved.

4.3. Correlation Detection

The general form of a correlation anomaly is illustrated by Equation (11) as follows:
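$$\exists PC, \forall fr:\ (I_1 < I_2) \land (PC_1 \in fr_1) \land (PC_2 \in fr_2) \land \big((PC_2 \in fr_1) \lor (PC_1 \in fr_2)\big) \Longrightarrow fr_2 \text{ is correlated with } fr_1 \tag{11}$$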
Equation (11) represents the general form of a correlation anomaly, where PC refers to packets and fr denotes a firewall rule. In this equation, PC1 belongs to fr1, PC2 belongs to fr2, and fr1 precedes fr2. A correlation anomaly occurs when there is a partial overlap between the two rules, meaning that some packets matched by fr2 are also matched by fr1, or some packets matched by fr1 are also matched by fr2. Therefore, fr2 is correlated with fr1.
Using the distributive property, Equation (11) can be decomposed into two parts: the first part is (I1 < I2) ∧ (PC1 ∈ fr1) ∧ (PC2 ∈ fr2) ∧ (PC2 ∈ fr1), and the second part is (I1 < I2) ∧ (PC1 ∈ fr1) ∧ (PC2 ∈ fr2) ∧ (PC1 ∈ fr2).
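The decomposition is a purely propositional identity, so it can be checked exhaustively. The sketch below treats the five conditions as Boolean variables (o = I1 < I2, a = PC1 ∈ fr1, b = PC2 ∈ fr2, c = PC2 ∈ fr1, d = PC1 ∈ fr2) and compares Equation (11) with the disjunction of the two parts over all 32 truth assignments, where each part keeps the shared conjuncts o ∧ a ∧ b. It also shows that dropping the shared conjunct b from part one would break the equivalence. The function names are hypothetical.

```python
from itertools import product

# Propositions: o = (I1 < I2), a = (PC1 ∈ fr1), b = (PC2 ∈ fr2), c = (PC2 ∈ fr1), d = (PC1 ∈ fr2)


def general_form(o, a, b, c, d):
    """Equation (11): o ∧ a ∧ b ∧ (c ∨ d)."""
    return o and a and b and (c or d)


def decomposed(o, a, b, c, d):
    """Part one (o ∧ a ∧ b ∧ c) or part two (o ∧ a ∧ b ∧ d)."""
    return (o and a and b and c) or (o and a and b and d)


def without_b(o, a, b, c, d):
    """Part one with (PC2 ∈ fr2) omitted: not equivalent to Equation (11)."""
    return (o and a and c) or (o and a and b and d)


assignments = list(product([False, True], repeat=5))
print(all(general_form(*v) == decomposed(*v) for v in assignments))  # True
print(all(general_form(*v) == without_b(*v) for v in assignments))   # False
```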
With respect to the first part, when packets that belong to fr2 are also matched by fr1, then fr2 is treated as being shadowed by fr1. Accordingly, the shadowing analysis can be used to represent the first component of the correlation anomaly. The second part of the general correlation equation (I1 < I2)∧(PC1 ∈ fr1)∧(PC2 ∈ fr2)∧(PC1∈ fr2) is explained below:
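$$\begin{aligned}
&\forall I, proto, src\_IP, src\_port, dst\_IP, dst\_port, action:\\
&\quad fr_1(I_1, proto_1, src\_IP_1, src\_port_1, dst\_IP_1, dst\_port_1, action_1) \land fr_2(I_2, proto_2, src\_IP_2, src\_port_2, dst\_IP_2, dst\_port_2, action_2)\\
&\quad \land (I_1 < I_2) \land \big((proto_2 = \text{any}) \lor (proto_1 \in proto_2)\big) \land (src\_IP_2 = src\_IP_1) \land (dst\_IP_2 = dst\_IP_1) \land (dst\_port_2 = dst\_port_1) \land (action_2 \neq action_1)\\
&\quad \Longrightarrow \mathit{Correlation}_1
\end{aligned} \tag{12}$$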
Equation (12) indicates that fr2 is correlated with fr1, where the rule with index I1 is a subset of the rule with index I2. The protocol of fr2 may be set to "any", meaning that it covers all protocol types; in that case, fr1 always falls within fr2, regardless of the protocol specified in fr1. Alternatively, when proto1 ∈ proto2, the protocol defined in fr1 is included within the protocol set of fr2 (e.g., if proto2 = {TCP, UDP}, then proto1 belongs to this set). All other packet-matching fields remain identical in fr1 and fr2, but the two rules apply different actions.
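Equation (12) reverses the roles used in the second shadowing case: here the later rule fr2 carries the broader protocol scope. A minimal sketch, assuming protocols are "any", a single name, or a set of names; `proto_within` and `correlation_1` are hypothetical names and the rules are illustrative.

```python
def proto_within(proto1: str, proto2) -> bool:
    """proto2 is "any", one protocol name, or a set of names; proto1 is one name."""
    if proto2 == "any":
        return True
    if isinstance(proto2, str):      # single protocol: plain equality, not substring search
        return proto1 == proto2
    return proto1 in proto2


def correlation_1(fr1: dict, fr2: dict) -> bool:
    """Equation (12): fr1 precedes fr2, fr1's protocol lies within fr2's protocol
    set (or fr2 uses "any"), the other compared fields match, and actions differ."""
    return (fr1["index"] < fr2["index"]
            and proto_within(fr1["proto"], fr2["proto"])
            and fr2["src_ip"] == fr1["src_ip"]
            and fr2["dst_ip"] == fr1["dst_ip"]
            and fr2["dst_port"] == fr1["dst_port"]
            and fr2["action"] != fr1["action"])


fr1 = {"index": 1, "proto": "tcp", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.10",
       "dst_port": 80, "action": "allow"}
fr2 = {**fr1, "index": 2, "proto": "any", "action": "deny"}
print(correlation_1(fr1, fr2))                               # True
print(correlation_1(fr1, {**fr2, "proto": {"tcp", "udp"}}))  # True
print(correlation_1(fr1, {**fr2, "proto": "udp"}))           # False
```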
∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr1(I1, proto1, src_IP1, src_port1, dst_IP1, dst_port1, action1) ∧ fr2(I2, proto2, src_IP2, src_port2, dst_IP2, dst_port2, action2) ∧ (I1 < I2) ∧ (proto2 = proto1) ∧ ((src_IP2 = any) ∨ (src_IP1 ∈ src_IP2)) ∧ (dst_IP2 = dst_IP1) ∧ (dst_port2 = dst_port1) ∧ (action2 ≠ action1) ⟹ Correlation2    (13)
Figure 5 shows a snapshot of the second case of correlation.
The diagrams in Figures 1, 3, and 5 illustrate representative snapshots of the detected anomalies. Each diagram highlights one or more packets and their anomaly relationships with other packets. For example, in Figure 1, the first diagram presents packet No. 2001 and its associated relationships, whereas the second diagram presents packets No. 88 and 99 with their corresponding anomaly relationships.
Equation (13) shows that fr2 is correlated with fr1. In this case, the source IP in fr2 can take two forms: it may be set to “any”, or it may represent an aggregate source IP range. The source IP of fr1 is a single address or a narrower IP range that falls within the broader source IP range specified by fr2, while all other packet-matching fields are identical in both rules and the actions differ.
∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr1(I1, proto1, src_IP1, src_port1, dst_IP1, dst_port1, action1) ∧ fr2(I2, proto2, src_IP2, src_port2, dst_IP2, dst_port2, action2) ∧ (I1 < I2) ∧ (src_IP2 = src_IP1) ∧ ((dst_IP2 = any) ∨ (dst_IP1 ∈ dst_IP2)) ∧ (action2 ≠ action1) ⟹ Correlation3    (14)
Equation (14) indicates that fr2 is correlated with fr1. In this equation, the destination IP in fr2 has two possible forms: it can be set to “any”, or it can represent an aggregate range of destination IP addresses. In contrast, the destination IP of fr1 is either a single address or a narrower destination IP range that falls within the destination IP range of fr2. Thus, in Equation (14), the destination IP of fr1 is contained within that of fr2, while all other packet-matching fields remain identical between fr1 and fr2, but the two rules specify different actions.
∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr1(I1, proto1, src_IP1, src_port1, dst_IP1, dst_port1, action1) ∧ fr2(I2, proto2, src_IP2, src_port2, dst_IP2, dst_port2, action2) ∧ (I1 < I2) ∧ (src_IP2 = src_IP1) ∧ (dst_IP2 = dst_IP1) ∧ ((dst_port2 = any) ∨ (dst_port1 ∈ dst_port2)) ∧ (action2 ≠ action1) ⟹ Correlation4    (15)
Equation (15) indicates that fr2 is correlated with fr1. In this equation, the destination port of fr2 can take one of two forms: it may be set to “any”, or it may represent an aggregate set of destination ports, meaning that multiple destination ports are specified within the single rule fr2. The destination port of fr1 is either a single port or a narrower range of ports that falls within the destination port set of fr2. Thus, in Equation (15), the destination port of fr1 is contained within that of fr2, while all other packet-matching fields in fr1 and fr2 are identical but the actions differ. Consequently, fr2 is correlated with fr1.
Together, Equations (12)–(15) constitute the rules used to detect correlation anomalies.
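As an illustration only, Equations (12)–(15) can be sketched as Python predicates over a simplified rule record. The `Rule` type, the helper names, and the encodings are assumptions of this sketch rather than the authors' implementation: protocols are modelled as sets so that proto1 ∈ proto2 becomes a subset test, “any” is mapped to 0.0.0.0/0 so that each two-form IP condition reduces to a single `subnet_of` check, ports are inclusive (low, high) ranges, and src_port is omitted because none of the four equations constrains it.

```python
import ipaddress
from typing import NamedTuple

ANY = "any"


class Rule(NamedTuple):
    index: int          # I: position of the rule in the list
    proto: frozenset    # e.g. frozenset({"tcp"}); frozenset({"any"}) = all protocols
    src_ip: str         # host address, CIDR block, or "any"
    dst_ip: str
    dst_port: tuple     # inclusive (low, high); (0, 65535) stands for "any"
    action: str


def net(value):
    """'any' becomes 0.0.0.0/0, so '(x = any) or (y in x)' reduces to subnet_of."""
    return ipaddress.ip_network("0.0.0.0/0" if value == ANY else value, strict=False)


def same_ip(a, b):
    return net(a) == net(b)


def ip_within(inner, outer):
    return net(inner).subnet_of(net(outer))


def proto_within(inner, outer):
    return ANY in outer or inner <= outer


def port_within(inner, outer):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def correlation1(r1, r2):
    """Equation (12): the protocol of fr1 lies within that of fr2."""
    return (r1.index < r2.index and proto_within(r1.proto, r2.proto)
            and same_ip(r2.src_ip, r1.src_ip) and same_ip(r2.dst_ip, r1.dst_ip)
            and r2.dst_port == r1.dst_port and r2.action != r1.action)


def correlation2(r1, r2):
    """Equation (13): the source IP of fr1 lies within that of fr2."""
    return (r1.index < r2.index and r2.proto == r1.proto
            and ip_within(r1.src_ip, r2.src_ip) and same_ip(r2.dst_ip, r1.dst_ip)
            and r2.dst_port == r1.dst_port and r2.action != r1.action)


def correlation3(r1, r2):
    """Equation (14): the destination IP of fr1 lies within that of fr2."""
    return (r1.index < r2.index and same_ip(r2.src_ip, r1.src_ip)
            and ip_within(r1.dst_ip, r2.dst_ip) and r2.action != r1.action)


def correlation4(r1, r2):
    """Equation (15): the destination port of fr1 lies within that of fr2."""
    return (r1.index < r2.index and same_ip(r2.src_ip, r1.src_ip)
            and same_ip(r2.dst_ip, r1.dst_ip)
            and port_within(r1.dst_port, r2.dst_port) and r2.action != r1.action)


r1 = Rule(1, frozenset({"tcp"}), "10.0.0.5", "20.0.0.3", (22, 22), "allow")
r2 = Rule(2, frozenset({"tcp"}), "10.0.0.0/28", "20.0.0.3", (22, 22), "deny")
print(correlation2(r1, r2), correlation2(r2, r1))  # True False
```

Each predicate mirrors only the conjuncts written in its equation; for example, `correlation3` compares the source and destination IPs and the actions but not the protocol or port, exactly as Equation (14) is stated.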

4.4. Generalization Detection

A generalization anomaly arises when two rules specify conflicting actions and all packets matched by the earlier rule are also matched by the later rule, i.e., the earlier rule is a subset of the later one. The distinction between shadowing and generalization anomalies can be expressed by the following equation:
∀fr, PC: (I1 < I2) ∧ (PC1 ∈ fr1) ∧ (PC2 ∈ fr2) ∧ (PC1 ∈ fr2) ⟹ fr2 is a generalization of fr1    (16)
This general form of the generalization anomaly is analogous to the second part of the correlation anomaly. Therefore, the equations used to characterize the second part of the correlation anomaly, namely Equations (12)–(15), also apply to identifying the generalization anomaly. Figure 6 shows a snapshot of a generalization anomaly.
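To make the distinction between shadowing, generalization, and correlation concrete, the sketch below models each rule as the finite set of packets it matches and compares the sets of an earlier rule fr1 and a later rule fr2 with conflicting actions. This is a toy model, not the detection method of this paper: it follows the prose definitions (shadowing when fr1 matches every packet of fr2, generalization when fr2 strictly covers fr1, correlation for the remaining partial overlaps), and it encodes only the source and destination hosts of Tables 1, 2, and 4, whose protocol (TCP) and destination port (22) are identical anyway. The function and set names are hypothetical.

```python
def classify(p1, a1, p2, a2):
    """Relation of earlier rule fr1 (packet set p1, action a1) to later rule fr2."""
    if a1 == a2 or not (p1 & p2):
        return None                 # no conflicting overlap between the rules
    if p2 <= p1:
        return "shadowing"          # fr1 matches every packet of fr2
    if p1 < p2:
        return "generalization"     # fr2 strictly covers every packet of fr1
    return "correlation"            # the rules overlap only partially


# Packets are (source host, destination host) pairs inside 10.0.0.0/28 and
# 20.0.0.0/28; host numbers 0-15 stand for the last address octet.
HOSTS = range(16)
BOTH_BLOCKS = {(s, d) for s in HOSTS for d in HOSTS}   # 10.0.0.0/28 -> 20.0.0.0/28
SINGLE = {(5, 3)}                                       # 10.0.0.5 -> 20.0.0.3
FROM_5 = {(5, d) for d in HOSTS}                        # 10.0.0.5 -> 20.0.0.0/28
TO_3 = {(s, 3) for s in HOSTS}                          # 10.0.0.0/28 -> 20.0.0.3

print(classify(BOTH_BLOCKS, "deny", SINGLE, "allow"))   # shadowing (Table 1)
print(classify(SINGLE, "allow", BOTH_BLOCKS, "deny"))   # generalization (Table 4)
print(classify(FROM_5, "allow", TO_3, "deny"))          # correlation (Table 2)
```

The only difference between the shadowing and generalization branches is the direction of the subset test, which is exactly the difference between (PC2 ∈ fr1) and (PC1 ∈ fr2) in Equations (11) and (16).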

4.5. Irrelevant and Redundancy Detection

An irrelevant firewall rule is a rule that does not match any packets transmitted within the network. This type of anomaly may arise when firewall rules are not updated to reflect changes in network topology or configuration (e.g., the addition or removal of devices, or changes in the addressing scheme), or because of rule misconfigurations. Equation (17) represents the irrelevant anomaly.
∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr(I, proto, src_IP, src_port, dst_IP, dst_port, action) ∧ (src_IP = dst_IP) ⟹ Irrelevant1    (17)
In Equation (17), the source and destination IP addresses are identical, indicating a misconfiguration.
A redundant firewall rule is a rule that matches all packets of a preceding firewall rule and applies the same action. Equation (18) denotes the redundancy case.
∀ I, proto, src_IP, src_port, dst_IP, dst_port, action: fr1(I1, proto1, src_IP1, src_port1, dst_IP1, dst_port1, action1) ∧ fr2(I2, proto2, src_IP2, src_port2, dst_IP2, dst_port2, action2) ∧ (I1 < I2) ∧ (proto1 = proto2) ∧ (src_IP1 = src_IP2) ∧ (dst_IP1 = dst_IP2) ∧ (dst_port1 = dst_port2) ∧ (action1 = action2) ⟹ redundancy    (18)
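A minimal sketch of Equation (18) as a pairwise scan over an ordered rule list follows. The `Rule` tuple, the function name, and the three sample rules are hypothetical. Fields are compared by exact string equality with no address or port normalization, and, as in Equation (18), the source port is not compared.

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    index: int
    proto: str
    src_ip: str
    src_port: str
    dst_ip: str
    dst_port: str
    action: str


# Fields compared by Equation (18); src_port is not part of the equation.
EQ18_FIELDS = ("proto", "src_ip", "dst_ip", "dst_port", "action")


def redundant_pairs(rules):
    """Return (I1, I2) for every ordered pair with I1 < I2 satisfying Equation (18)."""
    ordered = sorted(rules, key=lambda r: r.index)
    return [
        (r1.index, r2.index)
        for r1, r2 in combinations(ordered, 2)
        if r1.index < r2.index
        and all(getattr(r1, f) == getattr(r2, f) for f in EQ18_FIELDS)
    ]


rules = [
    Rule(1, "tcp", "10.0.0.0/28", "any", "20.0.0.3", "22", "deny"),
    Rule(2, "tcp", "10.0.0.5", "any", "20.0.0.3", "22", "allow"),
    Rule(3, "tcp", "10.0.0.0/28", "any", "20.0.0.3", "22", "deny"),
]
print(redundant_pairs(rules))  # [(1, 3)]
```

Because `combinations` preserves the sorted order, every candidate pair already satisfies I1 < I2; the explicit check only guards against duplicate indices.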

5. Validating the Correctness of the Proposed Rules Using Machine Learning

In this section, logistic regression is used to validate the correctness and applicability of the proposed logic rules: a logistic regression classifier is trained on the labels produced by the logic rules and then used to perform anomaly classification.
Firewall anomalies (shadowing, redundancy, correlation, and generalization) are not directly observable labels. Instead, they represent logical relationships between pairs of firewall rules. Therefore, logic-based rules are first applied to detect anomalies, and the resulting outputs are then used as labels to train a machine learning classifier to learn these anomaly patterns.
The general methodology for applying machine learning to firewall anomaly detection is as follows:
  • Read the firewall rules dataset.
  • Perform logic-based anomaly detection, which serves as the ground truth (labels).
  • Extract relevant features.
  • Train a machine learning algorithm as a classifier.
  • Perform anomaly classification.
In the following, logistic regression is used to find shadowing firewall rules using logic Equation (5), redundant firewall rules using logic Equation (17), and correlated firewall rules using logic Equation (11).
The proposed logic rules can provide ground truth for logistic regression by serving as a formal, deterministic labeling mechanism when explicitly annotated datasets are unavailable. The process is as follows:
Because firewall anomalies (shadowing, redundancy, and correlation) are defined through logical relations between rule fields rather than observed directly, each firewall rule (or rule pair) is first evaluated against the formally defined logic rules (Equations (5), (17), and (11)). If a rule satisfies a given logical condition, it is assigned the corresponding anomaly label (e.g., redundant = 1, non-redundant = 0). These logic-derived labels constitute the ground truth for supervised learning.
Once labeled, the firewall rules are transformed into numerical feature vectors and used to train a logistic regression classifier. The model learns to approximate the decision boundaries induced by the logic rules and can generalize these patterns to unseen configurations or noisy data.
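The labeling-then-training loop can be illustrated with a dependency-free sketch. It is a toy under stated assumptions, not the authors' experimental setup: the three binary features and the conjunction used as the logic rule are hypothetical stand-ins for the features of Section 5.1 and Equation (5), and the model is plain logistic regression, P(y = 1 | x) = σ(wᵀx + b) with σ(z) = 1/(1 + e^(−z)), fitted by batch gradient descent on the mean log-loss.

```python
import math
from itertools import product


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=1.0, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 when the estimated P(anomaly | x) is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def logic_label(x):
    """Hypothetical stand-in for a logic rule: anomaly iff every containment holds."""
    return int(all(x))


# Every combination of three hypothetical binary features:
# (protocol_contained, ip_contained, port_contained).
X = [list(bits) for bits in product((0, 1), repeat=3)]
y = [logic_label(x) for x in X]          # logic-derived ground truth
w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X] == y)  # True
```

Because a conjunction of binary features is linearly separable, logistic regression can reproduce these labels exactly; a logic rule that is not linearly separable in the chosen features could not be matched perfectly by this model.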

5.1. Find Shadowing Firewall Rules Using Logistic Regression

In this subsection, machine learning is used to detect shadowing anomalies in our dataset. Since shadowing anomalies are not explicitly labeled in real-world firewall configurations, Equation (5), which denotes the general form of the shadowing anomaly, is first employed to generate ground-truth labels. A supervised model, namely logistic regression, is then trained to learn and generalize these shadowing patterns. For each rule pair, a set of binary features is extracted to represent the logical relationships between the rules: protocol matching, source and destination IP containment, source and destination port containment, and action conflict. This representation enables the model to capture the structural characteristics of shadowing anomalies without relying on raw rule syntax.
Given a feature vector
$$x = (x_1, x_2, \ldots, x_n)$$
extracted from a rule pair (fr1, fr2), logistic regression estimates the conditional probability that the pair corresponds to a shadowing anomaly. Each rule pair is encoded using binary features derived from the underlying logic-based conditions: whether the protocol of the later rule is subsumed by that of the earlier rule, whether the source and destination IP addresses of the later rule fall within the corresponding address ranges of the earlier rule, whether the source and destination ports are contained within the earlier rule’s port specifications, and whether the two rules enforce conflicting actions. This abstraction enables the model to learn structural patterns of rule dominance rather than relying on raw textual representations of firewall rules. Table 12 shows a snapshot of the detected shadowing rules.
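One possible encoding of these six binary features is sketched below using Python's standard `ipaddress` module. The dictionary layout, the helper names, and the conventions ('any' or '*' as a wildcard, 'ip' as a protocol that matches every IP protocol, port ranges written 'low-high') are assumptions of this sketch. The example pair is the shadowing instance of Table 1.

```python
import ipaddress

WILDCARDS = {"any", "*"}


def to_net(value):
    """Wildcard -> 0.0.0.0/0; otherwise a host address or CIDR block."""
    if value in WILDCARDS:
        return ipaddress.ip_network("0.0.0.0/0")
    return ipaddress.ip_network(value, strict=False)


def to_ports(value):
    """Wildcard -> (0, 65535); '22' -> (22, 22); '1024-65535' -> (1024, 65535)."""
    if value in WILDCARDS:
        return (0, 65535)
    low, _, high = str(value).replace("\u2013", "-").partition("-")
    return (int(low), int(high or low))


def ports_within(inner, outer):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def shadowing_features(fr1, fr2):
    """Six binary features: is each field of later rule fr2 covered by earlier rule fr1?"""
    p1, p2 = fr1["proto"].lower(), fr2["proto"].lower()
    return [
        int(p1 in ("any", "ip") or p1 == p2),  # protocol subsumed ('ip' assumed = all)
        int(to_net(fr2["src_ip"]).subnet_of(to_net(fr1["src_ip"]))),
        int(to_net(fr2["dst_ip"]).subnet_of(to_net(fr1["dst_ip"]))),
        int(ports_within(to_ports(fr2["src_port"]), to_ports(fr1["src_port"]))),
        int(ports_within(to_ports(fr2["dst_port"]), to_ports(fr1["dst_port"]))),
        int(fr1["action"].lower() != fr2["action"].lower()),  # conflicting actions
    ]


# The shadowing instance of Table 1: I1 (earlier, Deny) covers I2 (later, Allow).
I1 = {"proto": "TCP", "src_ip": "10.0.0.0/28", "src_port": "*",
      "dst_ip": "20.0.0.0/28", "dst_port": "22", "action": "Deny"}
I2 = {"proto": "TCP", "src_ip": "10.0.0.5", "src_port": "*",
      "dst_ip": "20.0.0.3", "dst_port": "22", "action": "Allow"}
print(shadowing_features(I1, I2))  # [1, 1, 1, 1, 1, 1]
print(shadowing_features(I2, I1))  # [1, 0, 0, 1, 1, 1]
```

Swapping the two rules breaks both IP containments, which is what distinguishes a shadowed later rule from a merely overlapping one.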

5.2. Find Redundant Firewall Rules

Redundancy anomalies are not directly observable labels but are logical relations derived from firewall rule attributes. Therefore, Equation (17) was first applied to generate ground-truth labels, where a rule is considered redundant if its source and destination IP addresses are identical. Subsequently, a logistic regression model was trained using structural features extracted from firewall fields to learn redundancy patterns. This hybrid logic–machine learning approach enables scalable detection while preserving interpretability and consistency with formal firewall anomaly definitions. Table 13 shows a snapshot of detected redundant firewall rules. In the following, the mathematical representation of feature extraction is presented:
In this paper, a firewall rule is defined as:
$$fr_i = (I_i, \mathit{proto}_i, \mathit{srcIP}_i, \mathit{srcPort}_i, \mathit{dstIP}_i, \mathit{dstPort}_i, \mathit{action}_i)$$
From each rule fr_i, a feature vector $x_i \in \mathbb{R}^d$ is constructed as follows:
$$x_i = [\, x_{i1} \;\; x_{i2} \;\; x_{i3} \;\; x_{i4} \;\; x_{i5} \,]$$
Then, using Equation (17), the redundancy label y_i is defined as:
$$y_i = \begin{cases} 1, & \text{if } \mathit{srcIP}_i = \mathit{dstIP}_i \\ 0, & \text{otherwise} \end{cases}$$
Table 13. Snapshot of detected redundant firewall rules.
Rule Index | Protocol | SrcIP | DstIP | SrcPort | DstPort | Action | Redundancy Description
88 | any | any | any | any | any | permit | Source and destination IP addresses are identical (any), resulting in redundant IP specification
102 | tcp | any | any | any | 445 | deny | Identical source and destination IP fields; filtering depends only on destination port
102 | tcp | any | any | any | 140–65535 | permit | Redundant IP fields with overlapping port ranges
215 | udp | 10.0.0.5 | 10.0.0.5 | any | 53 | permit | Rule matches traffic where source and destination are the same host
347 | any | 192.168.1.0/24 | 192.168.1.0/24 | any | any | deny | Identical source and destination subnet ranges
411 | tcp | any | any | 1024–65535 | 80 | permit | IP fields are redundant; behavior governed solely by port constraints
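The labeling rule for y_i can be sketched on CSV input with Python's standard `csv` module. The CSV layout is an assumption: the column names follow the property names of Tables 10 and 11, and the first three rows are rules 88, 215, and 347 from Table 13. The fourth row reuses the protocol, addresses, and destination port of Table 6 under a hypothetical index (900) to provide a negative example.

```python
import csv
import io

CSV_TEXT = """Index,Protocol,SrcIP,DstIP,SrcPort,DstPort,Action
88,any,any,any,any,any,permit
215,udp,10.0.0.5,10.0.0.5,any,53,permit
347,any,192.168.1.0/24,192.168.1.0/24,any,any,deny
900,udp,10.0.0.5,60.0.0.3,any,70,allow
"""


def redundancy_label(row):
    """y_i = 1 if srcIP_i = dstIP_i, otherwise 0 (exact string comparison)."""
    return int(row["SrcIP"].strip() == row["DstIP"].strip())


def label_rules(text):
    """Read rules from CSV text and return (Index, y_i) for each rule."""
    return [(int(row["Index"]), redundancy_label(row))
            for row in csv.DictReader(io.StringIO(text))]


print(label_rules(CSV_TEXT))  # [(88, 1), (215, 1), (347, 1), (900, 0)]
```

In a full pipeline these (index, label) pairs would be joined with the feature vectors x_i before fitting the classifier.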

5.3. Find Correlated Firewall Rules Using Logistic Regression

As mentioned above, Equation (11) denotes the general form of a correlation between firewall rules. Following the methodology described in Section 5, Table 14 shows the mapping of Equation (11) to ML features, Algorithm 1 shows the logic-based labeling that uses Equation (11) as ground truth, and Table 15 shows a snapshot of the detected correlated firewall rules.
Table 14. Mapping of Equation (11) to ML features.
Logic Condition | ML Feature
(I1 < I2) | enforced by rule pairing
(PC1 ∈ fr1) | rule exists
(PC2 ∈ fr2) | rule exists
(PC2 ∈ fr1) | partial field overlap
(PC1 ∈ fr2) | partial field overlap
Different actions | action_diff = 1
Algorithm 1. Logic-based labeling using Equation (11) as ground truth.
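```python
import pandas as pd


def correlation_label(row):
    """Label a rule pair as correlated (1) or not (0) following Equation (11)."""
    if (
        row.proto_eq                              # same protocol
        and row.action_diff                       # conflicting actions
        and (row.src_overlap or row.dst_overlap)  # partial address overlap
        and not (row.sport_eq and row.dport_eq)   # port fields not both identical
    ):
        return 1  # correlated
    return 0      # not correlated


# `pairs`: one feature record per ordered rule pair (fr1, fr2) with I1 < I2,
# produced by the earlier rule-pairing step (not shown here).
pairs_df = pd.DataFrame(pairs)
pairs_df["label"] = pairs_df.apply(correlation_label, axis=1)
```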
Following Equation (11), firewall correlation anomalies were modeled as a supervised rule-pair classification problem. Each ordered rule pair ( f r 1 , f r 2 ) was transformed into a fixed-length feature vector capturing protocol equivalence, partial field overlap, and action inconsistency. Logic-based conditions derived from Equation (11) were used to generate labels, which were then employed to train a logistic regression classifier. This approach enables scalable and generalizable detection of correlation anomalies while preserving interpretability.
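The `pairs` records that Algorithm 1 turns into a DataFrame could be produced along the lines of the following sketch. The function name, the dictionary keys of each rule, and the use of `ipaddress` `overlaps()` for the `src_overlap` and `dst_overlap` flags are assumptions; only the six feature names are taken from Algorithm 1. The example pair is the correlation instance of Table 2.

```python
import ipaddress
from itertools import combinations


def to_net(value):
    """Treat 'any' or '*' as 0.0.0.0/0; otherwise parse a host or CIDR block."""
    if value in ("any", "*"):
        return ipaddress.ip_network("0.0.0.0/0")
    return ipaddress.ip_network(value, strict=False)


def build_pair_features(rules):
    """One feature record per ordered rule pair (fr1, fr2) with I1 < I2."""
    ordered = sorted(rules, key=lambda r: r["index"])
    records = []
    for fr1, fr2 in combinations(ordered, 2):
        records.append({
            "i1": fr1["index"],
            "i2": fr2["index"],
            "proto_eq": fr1["proto"].lower() == fr2["proto"].lower(),
            "action_diff": fr1["action"].lower() != fr2["action"].lower(),
            "src_overlap": to_net(fr1["src_ip"]).overlaps(to_net(fr2["src_ip"])),
            "dst_overlap": to_net(fr1["dst_ip"]).overlaps(to_net(fr2["dst_ip"])),
            "sport_eq": fr1["src_port"] == fr2["src_port"],
            "dport_eq": fr1["dst_port"] == fr2["dst_port"],
        })
    return records


# The correlation instance of Table 2.
table2 = [
    {"index": 1, "proto": "TCP", "src_ip": "10.0.0.5", "src_port": "*",
     "dst_ip": "20.0.0.0/28", "dst_port": "22", "action": "Allow"},
    {"index": 2, "proto": "TCP", "src_ip": "10.0.0.0/28", "src_port": "*",
     "dst_ip": "20.0.0.3", "dst_port": "22", "action": "Deny"},
]
pairs = build_pair_features(table2)
print(pairs[0])
```

Note that `overlaps()` is also true when one block contains the other, so these flags mark any shared addresses rather than strictly partial overlap.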

6. Conclusions

This paper presented a formal, logic-driven framework for the systematic detection and analysis of firewall anomalies, supported by knowledge graph visualization using Neo4j. First-order logic (FOL) was employed to formally model firewall rules and precisely define major anomaly types, including shadowing, redundancy, correlation, generalization, and irrelevance, in both single and distributed firewall environments. By expressing these anomalies as formal logical rules, the proposed method provides a clear, unambiguous, and extensible framework for firewall policy validation.
The use of a real operational backbone dataset from Stanford University demonstrates the practical applicability of the proposed approach under realistic network conditions. Experimental results confirm that the developed FOL rules are capable of accurately identifying different anomaly cases, including complex overlapping scenarios that are often difficult to detect using traditional methods. In addition, the Neo4j-based knowledge graph representation offers an intuitive visualization mechanism that assists network engineers in understanding policy interactions, diagnosing configuration errors, and improving firewall rule management. Moreover, a machine learning-based approach was employed to validate the correctness of the proposed logic rules. The results presented in Table 12, Table 13 and Table 15 confirm the effectiveness and correctness of these logic rules.
The related work is categorized into two main streams. The first focuses on logic-based approaches for the formalization of firewall anomaly detection, while the second examines machine learning techniques for developing predictive models to address firewall anomalies. Compared with prior logic-based approaches, the proposed framework provides a more comprehensive analysis of firewall anomalies by defining five shadowing cases, four correlation cases, and corresponding first-order logic (FOL) detection rules for each scenario. Furthermore, the framework extends its coverage to generalization, redundancy, and irrelevant anomalies, enabling a unified logic-based detection methodology.
In contrast to existing machine learning-based and heuristic approaches, the proposed method offers superior interpretability, completeness, determinism, and formal correctness. Since it does not depend on training data or labeled samples, it is particularly well suited for security environments where explainability, reproducibility, and policy transparency are essential. Furthermore, its logical formulation provides a flexible foundation that can be readily reused and extended to support emerging anomaly definitions and continuously evolving network policies.
A key contribution of this work lies in the systematic decomposition of complex anomalies—particularly correlation and generalization—into well-defined logical cases, enabling comprehensive detection of overlapping and order-dependent rule interactions. The integration of Neo4j-based knowledge graphs further strengthens the framework by providing an intuitive and actionable visualization layer, allowing network engineers to trace anomaly causes, understand rule dependencies, and validate firewall policies with high confidence. As such, it represents a significant contribution to firewall policy verification and network security management. Figure 7 illustrates the overall distribution of firewall anomaly categories identified in the dataset by applying the proposed detection rules, highlighting the relative prevalence of each anomaly type.
In Figure 7, the bar chart summarizes the total number of anomaly instances detected per category, aggregating all shadowing subtypes into a single class. The results indicate that redundancy and shadowing dominate firewall misconfigurations, while generalization anomalies occur far less frequently.
Runtime and scalability are critical factors in validating the practicality of the proposed logical solution. Detecting firewall anomalies through logic-based reasoning can be formulated as an AI search problem. In general, runtime and scalability limitations are primarily associated with complete or blind search strategies, where the search space grows exponentially. In contrast, the proposed predefined FOL rules act as heuristic search operators, enabling direct, guided, and efficient anomaly identification without exhaustive exploration of the full rule space. This heuristic characteristic significantly enhances scalability while reducing runtime overhead. Moreover, our previous study [9] experimentally demonstrated that FOL rule-based heuristic reasoning provides scalable performance with reasonable execution time.
In Section 3, Anomalies in Firewall Policy, we systematically define and highlight all known firewall policy anomalies, supported by references [8,27,28,29,30]. To the best of our knowledge, and based on the established literature, Section 3 comprehensively covers all recognized anomaly cases in firewall policies. Accordingly, because the logical rules developed in Section 4 address every anomaly type discussed in Section 3, we argue that the proposed logical detection framework is complete with respect to the established anomaly space reported in prior studies.
This study is limited to static firewall policy analysis, where the rule set is assumed to be fully specified and unchanged during evaluation. Accordingly, the proposed FOL-based framework does not address incomplete, ambiguous, or dynamically changing firewall rules. Extending the framework to support dynamic policy updates and adaptive network environments remains an important direction for future work.
Future work will focus on scaling the proposed approach to large-scale enterprise networks, integrating automatic rule optimization and remediation mechanisms, and extending the model to support dynamic and software-defined networking (SDN) environments. Incorporating temporal analysis and real-time policy updates into the knowledge graph is another promising direction for enhancing proactive firewall management.

Author Contributions

Conceptualization, A.O.E.; methodology, A.O.E. and A.A. (Amer Aljaedi); software, A.A. (Abdulhadi Albluwi) and A.O.E.; validation, M.H.M.N.; writing—original draft preparation, A.O.E. and A.A. (Amer Aljaedi); writing—review and editing, M.H.M.N. and A.A. (Abdulhadi Albluwi) and supervision, A.O.E. and A.A. (Amer Aljaedi). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Data Availability Statement

The data presented in this study are openly available in [Real Time Network Policy Checking Using Header Space Analysis]. https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/kazemian (accessed on 1 June 2025). The data used in this study consist of network configuration datasets used for evaluating the Header Space Analysis framework. Due to privacy and security considerations, these datasets are not publicly available but can be requested from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bringhenti, D.; Marchetto, G.; Sisto, R.; Valenza, F.; Yusupov, J. Automated firewall configuration in virtual networks. IEEE Trans. Dependable Secur. Comput. 2022, 20, 1559–1576.
  2. Mijwil, M.; Unogwu, O.J.; Filali, Y.; Bala, I.; Al-Shahwani, H. Exploring the top five evolving threats in cybersecurity: An in-depth overview. Mesopotamian J. Cybersecur. 2023, 2023, 57–63.
  3. Coscia, A.; Dentamaro, V.; Galantucci, S.; Maci, A.; Pirlo, G. An innovative two-stage algorithm to optimize Firewall rule ordering. Comput. Secur. 2023, 134, 103423.
  4. Alicea, M.; Alsmadi, I. Misconfiguration in firewalls and network access controls: Literature review. Future Internet 2021, 13, 283.
  5. Gupta, S.; Gosain, D.; Kwon, M.; Acharya, H.B. DeeP4R: Deep Packet Inspection in P4 using Packet Recirculation. In IEEE INFOCOM 2023-IEEE Conference on Computer Communications; IEEE: Piscataway, NJ, USA, 2023; pp. 1–10.
  6. Elfaki, A.O.; Aljaedi, A. Deep Analysis and Detection of Firewall Anomalies Using Knowledge Graph. In 12th International Conference on Pattern Recognition Applications and Methods; Springer: Berlin/Heidelberg, Germany, 2023; pp. 411–417.
  7. Kim, T.; Kwon, T.; Lee, J.; Song, J. F/Wvis: Hierarchical Visual Approach for Effective Optimization of Firewall Policy. IEEE Access 2021, 9, 105989–106004.
  8. Dhrir, H.; Charfeddine, M.; Tarhouni, N.; Kammoun, H.M. Machine learning-and deep learning-based anomaly detection in firewalls: A survey. J. Supercomput. 2025, 81, 761.
  9. Elfaki, A.O. A rule-based approach to detect and prevent inconsistency in the domain-engineering process. Expert Syst. 2016, 33, 3–13.
  10. Arthur, J.K.; Kwadwo, E.; Doh, R.F.; Mantey, E.A. Firewall rule anomaly detection and resolution using particle swarm optimization algorithm. Int. J. Comput. Appl. 2019, 178, 975–8887.
  11. Moradi Vartouni, A.; Teshnehlab, M.; Sedighian Kashi, S. Leveraging deep neural networks for anomaly-based web application firewall. IET Inf. Secur. 2019, 13, 352–361.
  12. Valenza, F.; Cheminod, M. An Optimized Firewall Anomaly Resolution. J. Internet Serv. Inf. Secur. 2020, 10, 22–37.
  13. Togay, C.; Kasif, A.; Catal, C.; Tekinerdogan, B. A firewall policy anomaly detection framework for reliable network security. IEEE Trans. Reliab. 2021, 71, 339–347.
  14. Kulyadi, S.P.; Mohandas, P.; Kumar, S.K.S.; Raman, M.S.; Vasan, V.S. Anomaly detection using generative adversarial networks on firewall log message data. In 2021 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI); IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
  15. Fotiadou, K.; Velivassaki, T.H.; Voulkidis, A.; Skias, D.; Tsekeridou, S.; Zahariadis, T. Network traffic anomaly detection via deep learning. Information 2021, 12, 215.
  16. Lorenz, C.; Schnor, B. Firewall management: Rapid anomaly detection. In 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys); IEEE: Piscataway, NJ, USA, 2022; pp. 1465–1472.
  17. Toprak, S.; Yavuz, A.G. Web application firewall based on anomaly detection using deep learning. Acta Infologica 2022, 6, 219–244.
  18. Lin, Z.; Yao, Z. Firewall anomaly detection based on double decision tree. Symmetry 2022, 14, 2668.
  19. Khummanee, S.; Chomphuwiset, P.; Pruksasri, P. DSSF: Decision Support System to Detect and Solve Firewall Rule Anomalies based on a Probability Approach. ECTI Trans. Comput. Inf. Technol. (ECTI-CIT) 2022, 16, 56–73.
  20. Aljabri, M.; Alahmadi, A.A.; Mohammad, R.M.A.; Aboulnour, M.; Alomari, D.M.; Almotiri, S.H. Classification of firewall log data using multiclass machine learning models. Electronics 2022, 11, 1851.
  21. Shaheed, A.; Kurdy, M.H.D. Web application firewall using machine learning and features engineering. Secur. Commun. Netw. 2022, 2022, 5280158.
  22. Al-Haijaa, Q.A.; Ishtaiwia, A. Machine learning based model to identify firewall decisions to improve cyber-defense. Int. J. Adv. Sci. Eng. Inf. Technol. 2021, 11, 1688–1695.
  23. Andalib, A.; Babamir, S.M. Anomaly detection of policies in distributed firewalls using data log analysis. J. Supercomput. 2023, 79, 19473–19514.
  24. Bringhenti, D.; Seno, L.; Valenza, F. An Optimized Approach for Assisted Firewall Anomaly Resolution. IEEE Access 2023, 11, 119693–119710.
  25. Komadina, A.; Kovačević, I.; Štengl, B.; Groš, S. Detecting anomalies in firewall logs using artificially generated attacks. In 2023 17th International Conference on Telecommunications (ConTEL); IEEE: Piscataway, NJ, USA, 2023; pp. 1–8.
  26. Wang, Z.; AnilKumar, A. W2R: An Ensemble Anomaly Detection Model Inspired by Language Models for Web Application Firewalls Security. Master’s Thesis, Halmstad University, Halmstad, Sweden, 2023. Available online: https://www.diva-portal.org/smash/get/diva2:1772725/FULLTEXT02 (accessed on 1 June 2025).
  27. Park, J.; Park, B.; Kim, T.S. Development of an Anomaly Classification Model and a Decision Support Tool for Firewall Policy Configuration. Appl. Sci. 2025, 15, 2979.
  28. Karafili, E.; Valenza, F.; Chen, Y.; Lupu, E.C. Towards a Framework for Automatic Firewalls Configuration via Argumentation Reasoning. In NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4.
  29. Ahmed, Z.; Askari, S.M.S. Firewall Rule Anomaly Detection: A Survey. Int. J. Comput. Intell. IoT 2018, 2.
  30. Hakani, D.; Mann, P.S. Intra Firewall Anomaly Policies Detection in Cloud Environment Using Firewall Tree. Trans. Indian Natl. Acad. Eng. 2025, 10, 63–72.
  31. Chao, C.S. A Feasible Anomaly Diagnosis Mechanism for Stateful Firewall Rules. In 2018 27th International Conference on Computer Communication and Networks (ICCCN); IEEE: Piscataway, NJ, USA, 2018; pp. 1–2.
  32. Elfaki, A.O.; Fong, S.L.; Aik, K.L.T.; Johar, M.G.M. Towards detecting redundancy in domain engineering process using first order logic rules. Int. J. Knowl. Eng. Soft Data Paradig. 2013, 4, 1–20.
  33. Kazemian, P.; Chang, M.; Zeng, H.; Varghese, G.; McKeown, N.; Whyte, S. Real time network policy checking using header space analysis. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13); USENIX Association: Berkeley, CA, USA, 2013; pp. 99–111.
Figure 1. Illustration of the detection of the first shadowing case.
Figure 4. Illustration of the detection of the fifth case of shadowing.
Figure 5. Snapshot of the second case of correlation.
Figure 6. Snapshot of a generalization.
Figure 7. Illustration of the overall distribution of firewall anomaly categories.
Table 1. An instance of a shadowing anomaly.
Index | Proto. | Src-IP | Src-Port | Dst-IP | Dst-Port | Action
I1 | TCP | 10.0.0.0/28 | * | 20.0.0.0/28 | 22 | Deny
I2 | TCP | 10.0.0.5 | * | 20.0.0.3 | 22 | Allow
Table 2. An instance of a correlation anomaly: Conflicting.
Index | Proto. | Src-IP | Src-Port | Dst-IP | Dst-Port | Action
I1 | TCP | 10.0.0.5 | * | 20.0.0.0/28 | 22 | Allow
I2 | TCP | 10.0.0.0/28 | * | 20.0.0.3 | 22 | Deny
Table 3. An instance of a correlation anomaly: Redundant.
Index | Proto. | Src-IP | Src-Port | Dst-IP | Dst-Port | Action
I1 | TCP | 10.0.0.5 | * | 20.0.0.0/28 | 22 | Allow
I2 | TCP | 10.0.0.0/28 | * | 20.0.0.3 | 22 | Allow
Table 4. An instance of a generalization anomaly.
Index | Proto. | Src-IP | Src-Port | Dst-IP | Dst-Port | Action
I1 | TCP | 10.0.0.5 | * | 20.0.0.3 | 22 | Allow
I2 | TCP | 10.0.0.0/28 | * | 20.0.0.0/28 | 22 | Deny
Table 5. An instance of an irrelevant anomaly (case 1).
Index | Proto. | Src-IP | Src-Port | Dst-IP | Dst-Port | Action
I1 | TCP | 10.0.0.5 | * | 10.0.0.5 | 22 | allow
Table 6. An instance of an irrelevant anomaly (case 2).
Index | Proto. | Src-IP | Src-Port | Dst-IP | Dst-Port | Action
I1 | UDP | 10.0.0.5 | * | 60.0.0.3 | 70 | allow
Table 7. An instance of inter-firewall anomalies.
Firewall F1 (Upstream)
Rule | Source | Destination | Protocol | Action
R1 | * | 10.0.0.20 | HTTP | allow
Firewall F2 (Downstream)
Rule | Source | Destination | Protocol | Action
R1 | * | 10.0.0.20 | HTTP | deny
Table 8. An instance of a spuriousness anomaly.
Firewall F1 (Upstream)
Rule | Source | Destination | Protocol | Action
R1 | * | 10.0.0.10 | HTTP | allow
Firewall F2 (Downstream)
Rule | Source | Destination | Protocol | Action
R1 | 192.168.1.0/24 | 10.0.0.10 | SSH | allow
R2 | * | * | * | Deny
Table 9. Firewall policy graph construction and port normalization algorithm.
# | Step
1 | Input: Firewall policy dataset in CSV format
2 | Output: Neo4j knowledge graph with normalized destination ports
3 | Load firewall policy CSV into Neo4j
4 | For each record in the dataset, create or merge a node representing a firewall rule
5 | Assign rule attributes: Protocol, source IP, destination IP, source port, destination port, and action
6 | For each firewall rule node, inspect the destination port field
7 | If the destination port is expressed as a range (e.g., x–y), convert it into a numeric interval ([x, y])
8 | Ensure that the destination port attribute is represented as a list
9 | For each element in the destination port list, perform normalization
10 | Map symbolic port values (any, established) to predefined numeric codes
11 | Convert all remaining port values from string format to integers
12 | Update the firewall rule node with the normalized destination port list
13 | End processing of all firewall rules
14 | Return the constructed knowledge graph
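Steps 6–12 of Table 9 can be sketched in Python as follows, independently of the Neo4j loading steps. The numeric codes for the symbolic values `any` and `established` are hypothetical placeholders, since the table does not state the predefined codes; ranges may be written with a hyphen or an en dash.

```python
# Hypothetical placeholder codes for symbolic port values (Table 9, step 10).
SYMBOLIC_PORT_CODES = {"any": -1, "established": -2}


def normalize_dst_port(value):
    """Return a destination-port field as a list of integers (Table 9, steps 6-12)."""
    text = str(value).strip().lower().replace("\u2013", "-")
    # Step 7: a range "x-y" becomes the interval [x, y]; step 8: otherwise wrap in a list.
    items = text.split("-", 1) if "-" in text else [text]
    normalized = []
    for item in (part.strip() for part in items):
        if item in SYMBOLIC_PORT_CODES:       # step 10: symbolic value -> code
            normalized.append(SYMBOLIC_PORT_CODES[item])
        else:                                 # step 11: string -> integer
            normalized.append(int(item))
    return normalized


for raw in ["22", "1024\u201365535", "any", "established"]:
    print(raw, "->", normalize_dst_port(raw))
```

The returned list would then be written back to the rule node (step 12); how the sketch's placeholder codes should be chosen depends on the codes the graph actually uses.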
Table 10. Definition of Packet 2001.
Action: “deny”
DstIP: “any”
DstPort: “any”
dstPortFrom: 0
dstPortTo: 65535
Index: 2001
Protocol: “ip”
SrcIP: “any”
SrcPort: “any”
Table 11. Definition of packet 139.
Action: “permit”
DstIP: “any”
DstPort: “any”
dstPortFrom: 0
dstPortTo: 65535
Index: 139
Protocol: “ip”
SrcIP: “any”
SrcPort: “any”
Table 12. Snapshot of detected shadowing rules.
fr1 Index | fr2 Index | Reason for Shadowing
12 | 25 | fr1 specifies source IP as any, fully covering the specific source IP defined in fr2, with conflicting actions
47 | 63 | fr1 uses protocol any, thereby matching all packets of fr2 that specify a concrete protocol
88 | 97 | fr1 defines a generalized destination IP range that completely includes the destination IP of fr2, with different actions
Table 15. Snapshots of correlated rules.
fr1 Index | fr2 Index | Reason
15 | 27 | Partial source IP overlap, different actions
42 | 58 | Same protocol, overlapping destination range
103 | 118 | Shared packets, conflicting actions
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Elfaki, A.O.; Albluwi, A.; Aljaedi, A.; Nerma, M.H.M. Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation. Electronics 2026, 15, 1714. https://doi.org/10.3390/electronics15081714

AMA Style

Elfaki AO, Albluwi A, Aljaedi A, Nerma MHM. Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation. Electronics. 2026; 15(8):1714. https://doi.org/10.3390/electronics15081714

Chicago/Turabian Style

Elfaki, Abdelrahman Osman, Abdulhadi Albluwi, Amer Aljaedi, and Mohamed Hussien Mohamed Nerma. 2026. "Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation" Electronics 15, no. 8: 1714. https://doi.org/10.3390/electronics15081714

APA Style

Elfaki, A. O., Albluwi, A., Aljaedi, A., & Nerma, M. H. M. (2026). Explainable Logic-Driven Firewall Anomaly Detection with Knowledge Graph Visualization and Machine Learning Validation. Electronics, 15(8), 1714. https://doi.org/10.3390/electronics15081714

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
