This section illustrates the validation rules used for firewall anomaly detection, which are based on the definition provided in the previous section. According to [31], the standard structure of a firewall rule is ⟨order, protocol, src_ip, src_port, dst_ip, dst_port, action⟩. In this section, firewall rules are represented using first-order logic [32] and follow the same concept introduced by [31]. The general syntax of the proposed validation rule is defined as
4.1. Dataset Description and Utilization
The dataset used in this paper is derived from real operational backbone network data collected from Stanford University’s campus network infrastructure, commonly referred to in the literature as the Stanford backbone dataset (Stanford University, “Campus backbone router configuration and ARP table data,” Stanford, CA, USA, unpublished research dataset) [33]. The dataset consists of router-level information extracted from core network devices, including Address Resolution Protocol (ARP) tables, IP–MAC mappings, interface identifiers, and VLAN assignments.
The data reflects actual backbone-level network configurations, capturing multiple subnets, virtual LANs, and inter-router connections. IP address ranges such as 171.64.0.0/16, which are publicly assigned to Stanford University, confirm the provenance of the dataset. The dataset does not contain packet payloads or user-level information; instead, it represents control-plane and configuration-level data, making it suitable for firewall policy modeling and anomaly analysis.
In this work, the dataset is used to reconstruct realistic network scopes and policy domains, from which firewall rules and packet-matching conditions are inferred. This enables the evaluation of firewall anomalies such as shadowing, redundancy, correlation, and generalization under real network conditions rather than synthetic rule sets. Since the dataset originates from operational infrastructure, it provides a realistic and challenging benchmark for validating firewall policy verification techniques.
To prepare and normalize this dataset for use in Neo4j, we developed the software described in Table 9, which outlines the firewall policy graph construction and the port normalization algorithm.
The following subsections provide first-order logic (FOL) rules that illustrate how each anomaly type is represented. The use of FOL for developing validation rules was previously explained by [6].
4.2. Shadowing Detection
The general form of a shadowing anomaly is as follows:
Equation (5) indicates that rule fr2 is positioned after fr1 in the firewall rule ordering; therefore, fr1 is evaluated first. Rule fr1 matches two packet classes (PC1 and PC2), whereas fr2 matches only packet class PC2. Since all packet fields matched by fr2 are already covered by fr1, Equation (5) represents a general shadowing anomaly, in which fr2 is completely shadowed by fr1. Moreover, when both rules specify the same action, this situation is additionally classified as a redundancy anomaly.
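The condition described for Equation (5) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Neo4j implementation used in this work: it assumes that the packets matched by each rule are available as a finite set of packet-class labels, and the names `Rule` and `classify_pair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    order: int                 # position in the rule list (I1, I2, ...)
    packet_classes: frozenset  # packet classes matched by the rule (assumed finite)
    action: str                # e.g. "allow" or "deny"


def classify_pair(fr1: Rule, fr2: Rule) -> set:
    """Return the anomaly labels of fr2 relative to fr1, following the text."""
    labels = set()
    # fr1 is evaluated first, and every packet class of fr2 is covered by fr1.
    if fr1.order < fr2.order and fr2.packet_classes <= fr1.packet_classes:
        labels.add("shadowing")
        # Same action on a fully covered rule is additionally a redundancy.
        if fr1.action == fr2.action:
            labels.add("redundancy")
    return labels


fr1 = Rule(1, frozenset({"PC1", "PC2"}), "deny")
fr2 = Rule(2, frozenset({"PC2"}), "allow")
print(classify_pair(fr1, fr2))
```

Returning a set of labels rather than a single label reflects the statement that one pair may be classified as both a shadowing and a redundancy anomaly.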
The shadowing anomaly is further classified into five cases, as follows:
Equation (6) represents the first case of a shadowing anomaly, where rules fr1 and fr2 are modeled using predicates. Both rules share identical protocol types, source IP addresses, destination IP addresses, and destination ports, while specifying different actions. Since fr1 precedes fr2 in the rule order, fr2 is completely shadowed by fr1. This scenario is referred to as the first case of shadowing.
Figure 1 illustrates the detection of the first case of shadowing in the dataset after applying Equation (6).
In Figure 1, packet 2001 exhibits the first case of shadowing with packet 139. Table 10 and Table 11 show the definitions of packets 2001 and 139, respectively. As these definitions show, the two packets have the same protocol type, source and destination IP addresses, and destination port, but define different actions. Therefore, this pair is classified as the first case of shadowing.
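The first case can be expressed as a simple predicate over the seven-field rule structure. The sketch below is a hedged illustration, not the Cypher query used in this work. The class `FirewallRule` and the function `is_shadowing_case1` are hypothetical names. The order values echo the identifiers mentioned for Figure 1, but the field values are placeholders (documentation addresses) and do not reproduce Table 10 or Table 11. Only the four fields named in the text are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    order: int
    protocol: str
    src_ip: str
    src_port: str
    dst_ip: str
    dst_port: str
    action: str


def is_shadowing_case1(fr1: FirewallRule, fr2: FirewallRule) -> bool:
    """First case: identical matching fields, fr1 first, different actions."""
    same_fields = (
        fr1.protocol == fr2.protocol
        and fr1.src_ip == fr2.src_ip
        and fr1.dst_ip == fr2.dst_ip
        and fr1.dst_port == fr2.dst_port
    )
    return fr1.order < fr2.order and same_fields and fr1.action != fr2.action


# Placeholder values only; they are not taken from the dataset.
fr1 = FirewallRule(139, "tcp", "192.0.2.10", "any", "198.51.100.20", "80", "deny")
fr2 = FirewallRule(2001, "tcp", "192.0.2.10", "any", "198.51.100.20", "80", "allow")
print(is_shadowing_case1(fr1, fr2))
```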
The same pair of rules can meet more than one shadowing definition at the same time. This does not mean the detection is redundant; rather, it shows that the pair satisfies multiple independent shadowing criteria (e.g., an exact match as well as broader field coverage), and each criterion is treated as a separate anomaly case in our model.
The remainder of this subsection presents the remaining forms of shadowing anomalies.
Equation (7) indicates that fr2 is shadowed by fr1. In the first scenario, the protocol of fr1 is set to “any”, meaning it matches all protocol types (e.g., TCP or UDP). Therefore, regardless of the protocol value in fr2, it will always be included within the protocol scope of fr1. In the second scenario, where proto2 ∈ proto1, the protocol specified in fr2 is a member of the protocol set defined in fr1. For example, if proto1 = {TCP, UDP}, then proto2 must be one of these protocols. All remaining packet fields are identical in fr1 and fr2, but the two rules apply different actions, which leads to the shadowing relationship.
Figure 2 illustrates the detection of the second case of shadowing in the dataset after applying Equation (7).
Figure 2. Illustration of the detection of the second case of shadowing.
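The protocol condition of Equation (7) reduces to a membership test with a wildcard. A minimal sketch follows, assuming the protocol of fr1 is stored either as the string "any", as a single protocol name, or as a collection of names. The function name `protocol_covers` is hypothetical.

```python
ANY = "any"


def protocol_covers(proto1, proto2: str) -> bool:
    """True if the protocol field of fr1 covers the single protocol of fr2."""
    if proto1 == ANY:
        return True  # wildcard: every protocol of fr2 is included
    # Treat a bare string as a one-element set.
    allowed = {proto1} if isinstance(proto1, str) else set(proto1)
    return proto2 in allowed  # proto2 ∈ proto1


print(protocol_covers("any", "TCP"))
print(protocol_covers({"TCP", "UDP"}, "UDP"))
print(protocol_covers({"TCP", "UDP"}, "ICMP"))
```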
Equation (8) indicates that fr2 is shadowed by fr1. In this equation, the source IP of fr1 can appear in two forms: it may be set to “any”, or it may be an aggregate source IP. An aggregate source IP means that a single rule (fr1) specifies multiple source IP addresses or a range of addresses. In contrast, the source IP of fr2 is a single IP that is included within the source IP set defined in fr1. For example, if the source IP of fr1 is {10.0.0.1–10.0.0.10} and the source IP of fr2 is {10.0.0.5}, then the source IP of fr2 is contained within fr1. In Equation (8), since the source IP of fr2 is included in fr1, and the remaining packet fields are equivalent while the actions differ, fr2 becomes a shadowed rule of fr1.
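The containment test on the source IP can be illustrated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the aggregate source of fr1 is represented as an inclusive (first, last) address pair or the string "any", and the function name `src_ip_covers` is hypothetical.

```python
import ipaddress


def src_ip_covers(range1, ip2: str) -> bool:
    """True if the source of fr1 ("any" or an inclusive range) contains ip2."""
    if range1 == "any":
        return True
    first, last = (ipaddress.ip_address(a) for a in range1)
    # Address objects are ordered, so a chained comparison checks the range.
    return first <= ipaddress.ip_address(ip2) <= last


# Example from the text: fr1 covers 10.0.0.1-10.0.0.10, fr2 uses 10.0.0.5.
print(src_ip_covers(("10.0.0.1", "10.0.0.10"), "10.0.0.5"))
print(src_ip_covers(("10.0.0.1", "10.0.0.10"), "10.0.0.11"))
```

The same test would apply unchanged to any other address field that is compared in this way.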
Figure 3 illustrates the detection of the third case of shadowing in the dataset after applying Equation (8).
Figure 3. Illustration of the detection of the third case of shadowing.
Equation (9) shows that fr2 is shadowed by fr1. In this case, the destination IP of fr1 can take one of two forms: it may be specified as “any”, or it may be an aggregate destination IP. An aggregate destination IP means that a single rule (fr1) contains multiple destination IP addresses or a destination IP range. In contrast, fr2 includes a single destination IP that is contained within the destination IP set of fr1. For example, if the destination IP in fr1 is {10.0.0.1–10.0.0.10} and the destination IP in fr2 is {10.0.0.5}, then the destination IP of fr2 belongs to the destination IP range defined in fr1. Since Equation (9) indicates that the destination IP of fr2 is included in fr1, while the remaining packet fields are equivalent but the actions are different, fr2 is considered a shadowed rule of fr1. Equation (9) denotes the fourth case of shadowing.
Equation (10) denotes that fr2 is shadowed by fr1. In this equation, the destination port of fr1 takes one of two forms: it is either set to “any” or specified as an aggregate port. An aggregate destination port means that a single rule (fr1) specifies multiple destination ports. The destination port of fr2 is a single port that belongs to the destination port set of fr1. For instance, if the destination port of fr1 is {40, 60, 80} and the destination port of fr2 is {40}, then the destination port of fr2 belongs to that of fr1. Following the general form of the shadowing anomaly in Equation (5), since the destination port of fr2 belongs to fr1 and the remaining packet fields are equivalent while the actions differ, fr2 is shadowed by fr1. Equation (10) denotes the fifth case of shadowing.
Figure 4 shows a snapshot of the fifth case of shadowing.
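The fifth case combines a port-coverage test with the equality and ordering conditions. The following sketch assumes rules are stored as dictionaries, that an aggregate port is a Python set, and that the "remaining packet fields" are protocol, source IP, and destination IP. These representation choices and the function names are assumptions made for illustration only.

```python
ANY = "any"


def port_covers(ports1, port2: int) -> bool:
    """True if the destination port of fr1 ("any" or a set) contains port2."""
    return ports1 == ANY or port2 in ports1


def is_shadowing_case5(fr1: dict, fr2: dict) -> bool:
    other_fields = ("protocol", "src_ip", "dst_ip")  # assumed remaining fields
    return (
        fr1["order"] < fr2["order"]
        and port_covers(fr1["dst_port"], fr2["dst_port"])
        and all(fr1[f] == fr2[f] for f in other_fields)
        and fr1["action"] != fr2["action"]
    )


# Port values follow the example in the text; other values are placeholders.
fr1 = {"order": 1, "protocol": "tcp", "src_ip": "192.0.2.10",
       "dst_ip": "198.51.100.20", "dst_port": {40, 60, 80}, "action": "deny"}
fr2 = {"order": 2, "protocol": "tcp", "src_ip": "192.0.2.10",
       "dst_ip": "198.51.100.20", "dst_port": 40, "action": "allow"}
print(is_shadowing_case5(fr1, fr2))
```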
The following discusses the rules used to detect correlation anomalies. Shadowing occurs when the entire set of packets matched by fr2 is included within fr1, yet the two rules specify different actions. In contrast, a correlation anomaly arises when fr1 and fr2 partially overlap, meaning that some packets matched by fr2 are also matched by fr1, and some packets matched by fr1 are also matched by fr2, with different actions applied.
The visual structure of the knowledge graph does not explicitly encode the causal reason for shadowing. Consequently, all shadowing cases exhibit the same graphical pattern. Differentiation among these cases is achieved through semantic annotations, including the relationship type, the underlying logical conditions, and the specific rule attributes involved.
4.3. Correlation Detection
The general form of a correlation anomaly is illustrated by Equation (11) as follows:
Equation (11) represents the general form of a correlation anomaly, where PC refers to packets and fr denotes a firewall rule. In this equation, PC1 belongs to fr1, PC2 belongs to fr2, and fr1 precedes fr2. A correlation anomaly occurs when there is a partial overlap between the two rules, meaning that some packets matched by fr2 are also matched by fr1, or some packets matched by fr1 are also matched by fr2. Therefore, fr2 is correlated with fr1.
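The notion of partial overlap can be made concrete with set operations. The sketch below is one possible reading of the definition, not the implementation used in this work: it assumes that each rule's matched packets are available as a finite set, it requires different actions as stated for correlation anomalies, and it excludes the fully covered case because that case is handled by the shadowing analysis. The function name `is_correlated` is hypothetical.

```python
def is_correlated(order1, match1, action1, order2, match2, action2) -> bool:
    """Partial-overlap test between fr1 (earlier) and fr2 (later)."""
    overlap = match1 & match2          # packets matched by both rules
    fully_shadowed = match2 <= match1  # fr2 entirely inside fr1: shadowing
    return (
        order1 < order2
        and action1 != action2
        and bool(overlap)
        and not fully_shadowed
    )


# PC3 is matched by both rules, but each rule also matches something else.
print(is_correlated(1, {"PC1", "PC3"}, "deny", 2, {"PC2", "PC3"}, "allow"))
# fr2 is fully covered by fr1, so this pair is left to the shadowing rules.
print(is_correlated(1, {"PC1", "PC2"}, "deny", 2, {"PC2"}, "allow"))
```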
By applying the distributive property, Equation (11) can be decomposed into two parts: the first is (I1 < I2) ∧ (PC1 ∈ fr1) ∧ (PC2 ∈ fr1), and the second is (I1 < I2) ∧ (PC1 ∈ fr1) ∧ (PC2 ∈ fr2) ∧ (PC1 ∈ fr2).
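The decomposition can be written out step by step. The left-hand side below is an assumed form of Equation (11), inferred from the two parts listed in the text rather than copied from the equation itself. Under that assumption, distributing the conjunction over the disjunction gives:

```latex
\begin{align*}
&(I_1 < I_2) \land (PC_1 \in fr_1) \land (PC_2 \in fr_2)
   \land \bigl[(PC_2 \in fr_1) \lor (PC_1 \in fr_2)\bigr] \\
&\quad \equiv \bigl[(I_1 < I_2) \land (PC_1 \in fr_1) \land (PC_2 \in fr_2) \land (PC_2 \in fr_1)\bigr] \\
&\qquad \lor \bigl[(I_1 < I_2) \land (PC_1 \in fr_1) \land (PC_2 \in fr_2) \land (PC_1 \in fr_2)\bigr]
\end{align*}
```

In this form the first disjunct also carries the conjunct (PC2 ∈ fr2), which the first part as quoted in the text leaves implicit.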
With respect to the first part, when packets that belong to fr2 are also matched by fr1, fr2 is treated as being shadowed by fr1. Accordingly, the shadowing analysis can be used to represent the first component of the correlation anomaly. The second part of the general correlation equation, (I1 < I2) ∧ (PC1 ∈ fr1) ∧ (PC2 ∈ fr2) ∧ (PC1 ∈ fr2), is explained below:
Equation (12) indicates that fr2 is correlated with fr1, where the rule with index I1 is a subset of the rule with index I2. In fr2, the protocol may be set to “any”, meaning it covers all protocol types; therefore, fr1 will always fall within fr2 regardless of the protocol specified in fr1. Another situation occurs when proto1 ∈ proto2, which means the protocol defined in fr1 is included within the protocol set of fr2 (e.g., if proto2 = {TCP, UDP}, then proto1 belongs to this set). All other packet-matching fields remain identical in fr1 and fr2, but the two rules apply different actions.
Figure 5 shows a snapshot of the second case of correlation.
The two diagrams in each of Figure 1, Figure 3, and Figure 5 illustrate representative snapshots of the detected anomalies. Each diagram highlights one or more packets and their anomaly relationships with other packets. For example, in Figure 1, the first diagram presents packet No. 2001 and its associated relationships, whereas the second diagram presents packets No. 88 and 99 with their corresponding anomaly relationships.
Equation (13) shows that fr2 is correlated with fr1. In this case, the source IP in fr2 can appear in two forms: it may be set to “any”, or it may represent an aggregate source IP range. Meanwhile, the source IP of fr1 is a single address or a narrower IP range that falls within the broader source IP range specified by fr2.
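When address ranges are written in CIDR notation, the "narrower range within a broader range" condition maps onto a subnet test. The sketch below assumes CIDR notation for both rules, which the text does not state, and uses `subnet_of` from Python's standard `ipaddress` module (Python 3.7 or later). The function name `src_ip_within` is hypothetical.

```python
import ipaddress


def src_ip_within(src1: str, src2: str) -> bool:
    """True if the source of fr1 falls within the source of fr2."""
    if src2 == "any":
        return True
    # A bare address is treated as a single-host network (/32 for IPv4).
    net1 = ipaddress.ip_network(src1, strict=False)
    net2 = ipaddress.ip_network(src2, strict=False)
    return net1.subnet_of(net2)


# 171.64.0.0/16 is the range mentioned in the dataset description;
# the narrower prefixes and addresses are placeholders.
print(src_ip_within("171.64.10.0/24", "171.64.0.0/16"))
print(src_ip_within("171.64.1.5", "171.64.0.0/16"))
print(src_ip_within("10.0.0.1", "171.64.0.0/16"))
```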
Equation (14) indicates that fr2 is correlated with fr1. In this equation, the destination IP in fr2 has two possible cases: it can be set to “any”, or it can represent an aggregate range of destination IP addresses within the rule fr2. In contrast, the destination IP of fr1 is either a single address or a narrower destination IP range that falls within the destination IP range of fr2. Thus, in Equation (14), the destination IP of fr1 is contained within that of fr2, while all other packet-matching fields remain identical between fr1 and fr2, but the two rules specify different actions.
Equation (15) indicates that fr2 is correlated with fr1. In this equation, the destination port of fr2 can take one of two forms: it may be set to “any”, or it may represent an aggregate set of destination ports. An aggregate destination port implies that multiple destination ports are specified within a single rule (fr2). The destination port of fr1 is either a single port or a narrower range of ports that falls within the destination port set of fr2. Thus, in Equation (15), the destination port of fr1 is contained within that of fr2, while all other packet-matching fields in fr1 and fr2 are identical but the actions differ. Consequently, fr2 is correlated with fr1.