ABAC Policy Mining Using Complex Network Analysis Techniques

Díaz-Rodríguez, Héctor; Díaz-Pérez, Arturo

doi:10.3390/app152312571

Open AccessArticle

ABAC Policy Mining Using Complex Network Analysis Techniques

by

Héctor Díaz-Rodríguez

^*

and

Arturo Díaz-Pérez

Center for Research and Advanced Studies of the National Polytechnic Institute, Zapopan 45017, Mexico

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12571; https://doi.org/10.3390/app152312571

Submission received: 25 August 2025 / Revised: 8 October 2025 / Accepted: 14 November 2025 / Published: 27 November 2025

(This article belongs to the Special Issue Security and Privacy in Complicated Computing Environments)

Download

Browse Figures

Versions Notes

Abstract

Recent computing technologies and modern information systems require an access control model that provides flexibility, granularity, and dynamism. The Attribute-Based Access Control (ABAC) model was developed to address the new challenges of emerging applications. Designing and implementing an ABAC policy manually is usually a complex and costly task; therefore, many organizations prefer to keep their access control mechanisms in operation rather than incur the costs associated with the migration process. A solution to the above is to automate the process of creating access control policies. This action is known as policy mining. In this paper, we present a novel approach, based on complex network analysis, for mining an ABAC policy from an access control log. The proposed approach is based on the data and the relationships that can be generated from them. The proposed methodology is divided into five phases: (1) data preprocessing, (2) network model, (3) community detection, (4) policy rule extraction, and (5) policy refinement. The results show that it is possible to obtain an ABAC policy using the approach based on complex networks. In addition, our proposed methodology outperforms existing ABAC mining algorithms regarding quality. Finally, we present a novel access decision process that reduces the number of rules to evaluate based on a rule network.

Keywords:

policy mining; access control; complex networks

1. Introduction

The importance of information security has increased as the world has become more interconnected [1]. The main objective of information security is to protect sensitive and confidential data from unauthorized access, use, and disclosure. According to a 2024 study by IBM and the Ponemon Institute, the average cost of a data breach was USD 4.88 million [2]. It is clear that organizations, industries, and national security depend on a robust information security program to prevent security breaches and protect critical information.

By adopting an Access Control (AC) mechanism, resources and information can be protected from unauthorized entities and malicious activity. The goal of access control is to ensure that only authorized entities are granted access to the resources and information they need to perform their job functions [3]. AC has become more important due to the development of recent technologies such as cloud computing [4], the Internet of Things [5], and Fog Computing [6].

The process by which an access request is permitted or denied depends on the AC model implemented. There are four traditional AC models: Discretionary Access Control (DAC), Mandatory Access Control (MAC), Role-based Access Control (RBAC), and Attribute-based Access Control (ABAC). The AC model, in addition to the evaluation process, has an AC policy that defines the set of rules and requirements by which the AC mechanism is regulated. When an access request is received, the AC mechanism interacts with the policy to determine whether the subject has the right to access the requested object.

In addition to traditional AC models, other AC models have been developed to improve the AC mechanism or address the limitations of the traditional approaches. For example, the RBAC and ABAC Combining Access Control (RACAC) model [7] adopts the strengths of both models. Approaches such as the Hierarchical Group and Attribute-Based Access Control (HGABAC) model [8] and Higher-order Attribute-Based Access Control (HoBAC) model [9] formally introduce the concept of hierarchy in ABAC. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) [10] adds a layer of encryption to the data in the ABAC model. The Hierarchical, Extensible, Advanced, and Dynamic Access Control (HEAD) metamodel [11] is a proposal that confronts most of the challenges of recent technologies.

To address the present application problems and overcome the limitations of DAC, MAC, and RBAC models that make them less suitable for emerging large and dynamic application domains, the ABAC model was developed [12,13,14,15]. The ABAC model uses attributes to make access control decisions, making it flexible, scalable, and context-sensitive. The attributes are the characteristics of an entity (subject, object, environment) in the AC mechanism. The ABAC policy is the set of authorization rules defined over these attributes. An ABAC rule specifies whether a subject can access a resource in a particular environment. When an access request is received, the ABAC model evaluates the subject (requester)’s attributes, the object’s attributes, and the environmental conditions with a set of rules (policy) for overall attributes.

However, the main problem with implementing an ABAC model is the policy specification. Policy specification is typically performed by expert humans, which can make it costly, time-consuming, subjective, complex, and error-prone. Consequently, organizations prefer to keep traditional models (MAC, DAC, and RBAC) in operation rather than face all the challenges of the migration and implementation process. Policy mining techniques can be used to automate the policy specification process and address these challenges.

Policy mining is a technique that automates the process of AC policy specification based on an existing AC state, such as Access Control Lists (ACLs), natural language, or access logs. Specifically, the access log contains information on all access requests that are either permitted or denied by the AC mechanism. By using the access log as input, a policy mining process can generate an AC policy as an output. In the context of the ABAC model, this process involves identifying a reduced set of AC rules that properly model the request records in the access log.

Several approaches have been developed for policy mining. Xu and Stoller [16] pioneered this field by creating rules from user-permission logs, progressively refining them for broader coverage. However, their algorithm demonstrates reduced performance with larger access logs. Iyer and Masoumzadeh [17] introduced an ABAC policy mining approach using the PRISM algorithm [18]. Despite its effectiveness in generating positive and negative rules, scalability becomes a challenge with larger access logs, leading to an excess of rules. Cotrini et al. [19] proposed the Rhapsody algorithm based on APRIORI-SD [20], featuring stages for rule generation, extraction, and length reduction. While introducing a liability metric, the exponential growth of rules in the presence of numerous attributes remains a limitation. Jabal et al. [21] presented an ABAC policy mining framework integrating data mining, statistics, and machine learning techniques. Despite its effectiveness, the approach faces challenges due to the substantial number of combinations required for policy extraction. Karimi et al. [22] contributed to the first machine learning-based approach involving phases such as parameter selection, clustering, rule extraction, and clean-up. Recent investigations [23,24,25] have chosen to detect clusters of access requests before the policy mining process; then, the rule generation algorithms are applied to those clusters. Table 1 summarizes the challenges and techniques of the state-of-the-art ABAC policy mining approaches.

In summary, an AC mechanism is a complex and dynamic system in which a large number of entities (subjects, objects, environmental conditions, attributes, operations) interact with each other. A purely data-driven approach for policy mining is not sufficient to functionally capture the high complexity of the AC mechanism. To overcome these limitations, recent studies have turned to complex network theory, a powerful mathematical framework for analyzing and modeling complex systems. Specifically, an AC system network model is able to detect hidden patterns and implicit relationships between entities that may not be apparent in the original access log.

Briefly, a complex network is a network (graph) with a set of nodes (vertices) joined together in pairs by a set of links (edges). A network has the function of reducing a complex system to an abstract structure. It captures only the fundamental aspects of entities, which serve as nodes, and their connection patterns, which serve as links. This theory has been a powerful tool with which for scientists to model and analyze real-world systems across a wide range of fields such as sociology, neuroscience, biology, and technology [26,27]. Specifically, in security information research, some investigations use complex network analysis for anomaly detection in computer networks [28], vulnerability analysis [29], or modeling cyber–physical cybersecurity networks [30]. In the complex network theory, we can find several network properties such as scale-free degree distribution, community structure, density, and centrality that can help to extract hidden knowledge from the complex interaction system. With the above, we can discover a wide range of added information, e.g., how data flows on the Internet, who the most important person in a terrorist group is, the reliability of a power grid, or groups of people on social media apps.

This paper presents a new proposal for ABAC policy mining based on complex networks. Our proposal uses the access log to generate an ABAC policy that best represents the access requests in the access log. The access log reflects the behavior of the system in a real environment; therefore, it can be used to model a complex network. A bipartite network is generated based on user–resource interactions. Then, a weighted projection to the bipartite network is performed to obtain a user network. A community detection algorithm is applied to the user network to group users and attributes based on the network topology. Each detected community is analyzed to extract the patterns that best represent the community. In this way, it is possible to generate policy rules according to user and resource attributes. Finally, an algorithm is applied to refine the mined ABAC policy and increase its quality. The main contributions of this work are as follows:

A new method based on complex network theory for mining ABAC access control policies from access records.
A bipartite network model to represent user–resource interactions (access requests) in an access control system.
A rule extraction algorithm to extract patterns that best represent the users and the resources of the community.
A network model of the mined rules for evaluating access requests.

The rest of the paper is organized as follows. Section 2 defines the ABAC policy mining problem, discusses the challenges and requirements, and presents the evaluation metrics in this problem. Section 3 presents our proposed ABAC policy mining approach. In Section 4, we describe the process of access decision in our policy based on the rule network. Section 5 presents the performance of our approach on two datasets as well as detailed results for every phase of the proposed model. Finally, Section 6 provides additional discussion and conclusions.

2. Fundamental Principles

2.1. ABAC Model

In the ABAC model, the authorization of a set of requests is determined by evaluating the attributes associated with a subject, an object, an operation, and the environmental conditions against a set of rules that describe the allowed operations for a given resource.

Attributes are characteristics that define specific aspects of the subject, object, and environment. These attributes can be set by the security administrator and by the dynamics of the system. A subject is an entity that requests access to an object. A subject can be a user, application, process, or service. The object is the entity which the subjects access. The objects can be resources, files, devices, registries, processes, services, or programs. An operation is the execution of a function toward an object. Operations can read, write, execute, modify, delete, etc. A rule is the combination of attributes and operations that condition the access of a request. Rules are statements of the type "If, then" that can grant or deny resources; they can even override other rules. Each ABAC rule specifies whether a subject can access a resource in a particular environment. When an access request is received, the ABAC model evaluates the subject (requester) attributes, the object attributes, and the environmental conditions with a set of rules (policy). An example of a rule is If class = CP01 and type = task and op = read then permit. In this approach, we use an adapted ABAC policy model definition from the work [22].

Definition 1

(ABAC Policy Model). An ABAC policy model is composed of the following elements:

U, R, S, and $O P$ refer to the set of users, resources, sessions, and operations, respectively, in the system. $E = U \cup R \cup S$ is the set of all system entities.
$A_{U}$ , $A_{R}$ , and $A_{S}$ refer to the set of user attributes, resource attributes, and session attributes, respectively. $A = A_{U} \cup A_{R} \cup A_{S}$ is the set of all system attributes. Given an attribute $a \in A$ , the attribute range $V_{a}$ is the set of all valid values for attribute a in the system.
The $f_{e a} (e, a)$ function returns the value of attribute $a \in A$ for entity $e \in E$ .
An attribute filter is a set of tuples defined as $F = {〈 a, v 〉 | a \in A \land v \in V_{a}}$ .
An access request is a tuple $q = 〈 u, r, s, o p, d 〉$ , where a user $u \in U$ initiates an operation request $o p \in O P$ to the resource $r \in R$ in the session $s \in S$ and d is the access decision made by the access control system.
An Access decision d has two possible values: permit or deny. If $d = p e r m i t$ , the user u can access the resource r, performing an operation $o p$ in session s. In contrast, if $d = d e n y$ , the user u cannot access the resource r.
The access log $L$ is the union of the positive access log $L^{+}$ and negative access log $L^{-}$ defined as

$\begin{matrix} L^{+} = {〈 q, d 〉 | 〈 q, d 〉 \in L \land d = p e r m i t} \\ L^{-} = {〈 q, d 〉 | 〈 q, d 〉 \in L \land d = d e n y} \end{matrix}$

(1)

An access request from positive (negative) access log $q^{+} \in L^{+}$ ( $q^{-} \in L^{-}$ ) is called a positive access request (negative access request).
A rule is a tuple $ρ = 〈 F, o p, d 〉$ , where $F$ is an attribute filter, $o p$ is an operation, and d an access decision.
A rule satisfies an access request $q ⊧ ρ$ iff

$\begin{matrix} 〈 u, r, s 〉 ⊧ F \land o p_{q} = o p_{p} \end{matrix}$

(2)
An ABAC policy is a tuple $π = 〈 E, O P, A, f_{e a}, P 〉$ , where E is the set of entities, $O P$ the set of operations, A is the set of attributes, $f_{e a}$ is the attribute function, and $P$ is the set of rules.
The decision of policy $P$ for an access request q denoted as $d_{π} (q)$ is permit if

$\begin{matrix} \exists ρ \in P : q ⊧ ρ \end{matrix}$

(3)

Otherwise, the decision is deny.

Policy mining algorithms help to migrate traditional AC models to the ABAC model automatically. The ABAC policy mining problem defined below is an adaptation of [22]:

Problem 1

(ABAC Policy Mining Problem). Given an access log

L

, the ABAC policy mining problem is to find a set of rules

P

that achieves high quality with respect to

L

, ensuring that

P

clearly defines the entities, attributes, operations, and access decisions in the AC mechanism.

We refer to the original policy as the access control policy that gave rise to the access log

L

. In contrast, we refer to the mined policy as the set of rules

P

that was automatically inferred from

L

. In this context, the ABAC policy mining problem aims for the mined policy to approximate or replicate the access decisions made by the original policy.

2.2. Challenges and Requirements

The following challenges associated with the ABAC policy mining problem can be highlighted:

Accuracy: The extracted rules must be consistent with the access log of the original policy. In inconsistent situations, rules are more permissive or more restrictive than the original policy.
Complexity: Policies are managed by humans; therefore, rules need to be easy to interpret, manage, and audit.
Imbalanced access log: There is an imbalance between permitted and denied access requests, the former being the majority class. Denied access requests occur less frequently because they are mainly user errors.
Sparse access log: In most cases, the access log does not represent the complete behavior of the system. Therefore, it is necessary that an access log with few records can extract a useful policy for the entire access control system.
Noisy access log: It is possible to find access requests that are erroneous or present some inconsistency. This happens when there is a misconfiguration in the system, sometimes due to updates made by the administrators. Noisy access requests can affect the quality of the mined policy, making it more permissive or more restrictive than it originally was.
Private access log: The data information in access control systems is sensitive; therefore, it is difficult to obtain an access log due to the security implications for the company.
Negative rules: Both positive and negative rules can exist in access control policies. Negative rules are used to establish exceptions to what is permitted by positive rules. Therefore, generating this type of rule can help improve the quality of the mined policy.

2.3. Policy Mining Algorithms

Although there exist different approaches to policy mining, there are common processes shared between them. The general methodology comprises three phases: (1) data preprocessing, (2) rule mining, and (3) policy refinement. During the first phase, data cleaning and data preprocessing tasks are conducted, for example, the handling of missing values and the conversion of numerical values into categorical ones. In the second phase, candidate ABAC rules are generated based on the attribute–value tuple frequency. Additionally, rules are modified to generalize them, to cover a broader set of attribute–value tuples or to make them more specific. In the last phase, the complete policy generated in the previous phase is evaluated to measure its quality. An accurate and comprehensible policy is a high-quality policy. Generating new rules, merging candidate rules, or simplifying candidate rules can help to improve the policy quality.

There are some variations in the policy mining approaches. For instance, Iyer and Masoumzadeh [17], Davari et al. [23], and Quan et al.’s [24] proposals permit the generation of both positive and negative rules. Karimi et al. [22], Davari et al. [23], Quan et al. [24], and Shang et al. [25] employ an algorithm to detect clusters in their access logs for the previous rule mining process.

We can observe several limitations in state-of-the-art approaches. The algorithm’s efficiency drops as the access log expands [16,17] or if there are many attributes and values [19]. In [21,22,23,24,25], many control variables must be selected to achieve a high-quality policy.

2.4. Evaluation Metrics

To evaluate the performance of our approach, we compare the access decision made by the mined policy with the original policy. It is a classification problem where the two classes are permitted and denied. If an access request is permitted in the original policy, the same access request must be permitted by the mined policy; we call this true positive (TP). If an access request is permitted by the original policy but not in the mined policy, it is known as false negative (FN). If an access request is denied by the original policy and the mined policy, then it is a true negative (TN). If an access request is denied by the original policy and permitted by the mined policy, we call this false positive (FP).

We use the F-score to measure the accuracy of our proposal. It is the harmonic mean of precision (exactness) and recall (completeness). The precision measures how many of the permitted requests are correct, while recall measures how many of the correct requests are permitted. The F-score, precision, and recall are calculated as follows:

P r e c i s i o n = \frac{T P}{T P + F P}

(4)

R e c a l l = \frac{T P}{T P + F N}

(5)

F - s c o r e = 2 \times \frac{P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l}

(6)

We seek mined policies that are easy to interpret, manage, and audit by human operators. The Weighted Structural Complexity (WSC) is a generalization of policy size. It was introduced for RBAC policies [31] and then adapted for ABAC policies [16]. The WSC is the sum of the attribute–value expressions used to describe the extracted rules. It is calculated using the following formulas:

\begin{matrix} W S C (π) = W S C (P) \\ W S C (P) = \sum_{ρ \in P} W S C (ρ) \\ W S C (ρ = 〈 F 〉) = | F | \end{matrix}

(7)

where

P

is the ABAC rule set,

p \in P

is an ABAC rule, and

F

is the attribute filter of the ABAC rule. Our objective is to generate policies with a high F-score, which are more equivalent to the original policies, and with a low WSC, which are easier to manage.

3. Policy Mining Proposed Approach

The conceptual methodology of our proposed ABAC policy mining is shown in Figure 1. Our approach is divided into five phases: (1) data preprocessing, (2) network model, (3) community detection, (4) rule extraction, and (5) rule refinement. The input of the methodology is an access log

L

, and the output is a set of rules

P

.

3.1. Data Preprocessing

In this phase, we convert raw data into a cleaner form so that in the next phases of the methodology, it provides more useful information. The input of the phase is the access log

L

, and the output is the new clean access log

L^{'}

. We execute four tasks in this phase: (1) handling missing and null values, (2) converting continuous values to categorical values, (3) removing duplicated access requests, and (4) selecting the most used resources.

In the access log, we can find access requests with an error, that present some inconsistency or that are incomplete because there is a misconfiguration in the system or a modification is made by the administrators. This noisy access request and null values can affect the quality of the mined policy. For each attribute in A with a missing or null value, we introduce a unique placeholder value. For example, if there is a null value in a user attribute u.class and a missing value in a resource attribute r.type, we create two new values UNK1 and UKN2, respectively.

In the second task, we convert continuous values to categorical values. Some attributes have continuous values, for instance, the size of a resource r.size; in this case, there are an infinite number of values. To address such an issue, we map the continuous values into categories. In the example above, we can generate groups based on ranges of values.

Our bipartite network model consists of positive unique access requests; therefore, we remove all duplicated access requests, i.e., the same user accessing the same resource. In the last task, we calculate the access frequency of each resource in the access log. There are many resources with few access requests. In other words, just a small number of resources could help to extract more information to build a bipartite network model. We remove all resources under a specified threshold of number of access requests

T_{R}

.

3.2. Network Model

The objective of this phase is to build a network model from access log data. The input of the phase is the positive access log

L^{+}

, and the output is a complex network that models access interactions of users and resources in the AC mechanism. In this phase, we execute three tasks: (1) bipartite network construction, (2) user network construction, and (3) user network analysis.

3.2.1. Access Request Bipartite Network

The first task is to identify both the entities that will function as the nodes of the network, and the relationship between entities that will represent the links. The basic function of the access control mechanism is to decide whether to grant or reject an access request made by a user toward a resource. Therefore, we identify two disjoint sets of nodes: users and resources. A link exists between a user node and a resource node if a user makes an access request toward a resource. A user cannot make an access request toward another user, and a resource cannot make an access request toward another resource; for this reason, the bipartite network definition condition holds. We define the access request bipartite network formally as follows:

Definition 2

(Access Request Bipartite Network (ARBN)). The access request bipartite network is triplet

G_{U - R} = (V_{U}, V_{R}, E_{U - R})

defined as follows:

1.: $V_{U}$ is the set of nodes that represent users U.
2.: $V_{R}$ is the set of nodes that represent resources R.
3.: $V_{U} \cap V_{R} = \emptyset$
4.: $E_{U - R}$ is the set of links that exists between a user node $u \in V_{U}$ toward a resource node $r \in V_{R}$ if $q = 〈 u, r 〉 \in L^{+}$ .

3.2.2. User Network

Bipartite network projection is a method for obtaining two one-mode networks, in our case, a user network (a network with only user nodes) and a resource network (a network with only resource nodes). The one-mode networks are less informative than the bipartite network, but it is possible to apply a weighted projection to minimize the information loss. Our proposed approach only uses the user network. Two user nodes are joined by a link if both users have an access request for the same resource. The weight of the link is equal to the product of the importance of the common resource of each user node. The importance is determined based on the total number of resources each user accesses. We define the user network formally as follows:

Definition 3

(User Network (UN)). Let

G_{U - R} = (V_{U}, V_{R}, E_{U - R})

be an access request bipartite network. The

V_{U}

projection generates a user network

G_{U - U} = (V_{U - U}, E_{U - U})

defined as follows:

1.: $V_{U - U} \subseteq V_{U}$ is the set of nodes that represent users U.
2.: $E_{U - U}$ is the set of links that exists between two user nodes $u_{i}, v_{j} \in V_{U - U}$ if they have at least one common neighbor in $V_{R}$ of the network $G_{U - R}$ . The weight of a link is given by

$w (u_{i} u_{j}) = \frac{| N (u_{i}) \cap N (u_{j}) |^{2}}{d (u_{i}) \times d (u_{j})}$

(8)

where $N (u)$ is the neighborhood of the node u, i.e., the set of nodes that are linked to node u. $d (u)$ is the degree of the node u, i.e., the number of nodes linked to node u.

We analyze the generated user network to evaluate its properties and conclude whether we can use it as a complex network. We use the complex network definition from [32]:

Definition 4

(Complex Network). A Complex Network is a network

G = (V, E)

consisting of a non-empty set of nodes V and a set of links E. Furthermore, it complies with the following properties:

1.: $n = | V |$ , $m = | E |$ commonly in the order of thousands or millions.
2.: Low average degree $〈 k 〉 < < n$ .
3.: Low density $d < < 1$ .
4.: Scale-free distribution $P (k) \approx k^{- α}$ .
5.: Low average path length $(L) < < n$ .
6.: High average cluster coefficient $1 / n < < (C) < < 1$ .

3.3. Community Detection

The main objective of this phase is to group the user nodes in communities or subnetworks. When we identify communities, the nodes within a community may share common properties or play similar roles in the complex network [33]. A community is a group of nodes where there are a large number of links between nodes within the same community and with the rest of the nodes in other communities. From the communities, it is feasible to extract user characteristics or properties that can help in the generation of ABAC policy rules.

The input data of this phase is the user network obtained in the previous phase. The output data will be a node partition

C_{G_{U - U}}

. In the current phase, two tasks are executed: (1) community detection and (2) community classification.

In the first task, a node partition is obtained by a community detection algorithm, and then, they are classified into three groups based on the number of resources accessed by each community.

3.3.1. Community Detection

We use the Louvain algorithm [34], which is based on optimizing the modularity quality function. The Louvain algorithm was chosen based on a prior comparative analysis of various methods, where we sought a balance between maximizing modularity and computational efficiency. The Louvain method demonstrated superior performance in this regard. Modularity is one of the most popular quality functions for measuring the quality in obtained community partition [35]. It compares the link density of each community with the density that could be obtained from the same community, but from a random network, where a community structure is not obtained [33]. By definition,

- 1 \leq Q (P) \leq 1

; a value close to unity indicates a better partition. The definition of modularity is

Q (P) = \frac{1}{2 m} \sum_{i j} [A_{i j} - \frac{k_{i} k_{j}}{2 m}] δ (C_{i}, C_{j})

(9)

where m is the number of links in the network, and A is the adjacency matrix. The term

\frac{k_{i} k_{j}}{2 m}

represents the expected number of links given the degree of the two nodes and the total number of links in the network. The function

δ

returns one if the nodes i and j belong to the same community; it returns zero in the other case.

Louvain’s algorithm to detect communities iteratively executes two phases. First, each node is a new community. Next, each node joins the community of one of its neighbors which provides a higher modularity value. The above is repeated for all nodes until the modularity value does not improve with new node joins. In the second phase, all the nodes of a community are grouped, and a new network is built where the nodes are the communities of the previous phase. The time complexity of the Louvain algorithm is

O (n \times l o g (n))

, where n is the number of nodes in the network. The algorithm is non-deterministic and belongs to the group of agglomerative hierarchical algorithms.

In our approach, the detected communities correspond to groups of users that exhibit similar interaction patterns within the access control system. These communities arise from the projection of the ARBN, where nodes represent users and resources, and edges indicate recorded accesses. Projecting this graph onto the set of users generates a network in which users are connected if they share access to common resources. This structured projection allows the application of community detection techniques, grouping users who access similar sets of resources. Thus, communities reflect functional groupings that can be aligned with organizational roles or access profiles, providing a structural basis for rule inference in the ABAC model. At the end of the algorithm, we will obtain the user communities defined as follows:

Definition 5

(User Communities). Let

G_{U - U} = (V_{U - U}, E_{U - U})

be a user network. A community

C = (V_{U - U}^{C}, E_{U - U}^{C})

is a sub-network of the user network

G_{U - U}

, where

V_{U - U}^{C} \subseteq V_{U - U}

and

E_{U - U}^{C} \subseteq E_{U - U}

. The user communities

C_{G_{U - U}}

of the user network

G_{U - U}

is

\begin{matrix} C_{G_{U - U}} = {C_{1}, C_{2}, \dots, C_{l}} where C_{1} \cup C_{2} \cup \dots \cup C_{l} = V_{U - U} and \\ for all i, j = 1, 2, \dots, l, C_{i} \cap C_{j} = \emptyset as long as i \neq j \end{matrix}

(10)

In the community partition

C_{G_{U - U}}

, we can find sparse communities, i.e., networks with a low density. Our objective is to obtain dense communities; for this reason, we apply the Louvain algorithm to those sparse communities to obtain dense sub-communities. Ultimately, we expect that all communities and sub-communities have a density value close to a unit; in other words, the groups of users detected are densely connected.

3.3.2. Community Classification

We classify communities based on the number of resources accessed by each community. There are three types of communities: F-type: focused; M-type: medium; and S-type: sparse. Regardless of the size of the community, focused communities access few resources, typically one or two. Sparse communities, on the other hand, have access to a large number of resources, greater than a threshold

T_{S}

. Medium communities access between three and

T_{S}

.

Each detected community

c \in C_{G_{U - U}}

is evaluated based on the number of resources accessed

R^{c}

. To begin, we find the maximum number of resources accessed by any community, denoted as

R_{m a x}^{C}

. Two thresholds

t_{R 1}

and

t_{R 2}

are set, using a fraction of the number of resources

t_{1}

and

t_{2}

. S-type communities have greater resources than

t_{R_{1}}

. M-type communities have resources between

t_{R_{2}}

and

t_{R_{1}}

, and F-type communities have resources below

t_{R_{2}}

. Algorithm 1 shows the process of community classification.

Each detected community

c \in C_{G_{U - U}}

is evaluated based on the number of resources accessed

R^{c}

. To begin, we find the maximum number of resources accessed by any community, denoted as

R_{m a x}^{C}

. Two thresholds

t_{R 1}

and

t_{R 2}

are set, using a fraction of the number of resources

t_{1}

and

t_{2}

. S-type communities have greater resources than

t_{R_{1}}

. M-type communities have resources between

t_{R_{2}}

and

t_{R_{1}}

, and F-type communities have resources below

t_{R_{2}}

. Algorithm 1 shows the process of community classification.

Algorithm 1 Community classification algorithm.

1:: procedure CommunityClassification
Input:: $C_{G_{U - U}}$ , $t_{1}$ , $t_{2}$
Output:: $C_{G_{U - U}}^{'}$
2:: $R_{m a x} \leftarrow m a x ([R^{c} f o r c \in C_{G_{U - U}}])$
3:: $t_{R 1} = t_{1} * R_{m a x}$
4:: $t_{R 2} = t_{2} * R_{m a x}$
5:: for all $C \in C_{G_{U - U}}$ do
6:: if $R^{c} > t_{R 1}$ then
7:: $C . t y p e \leftarrow S$
8:: else if $R^{c} > t_{R 2}$ then
9:: $C . t y p e \leftarrow M$
10:: else
11:: $C . t y p e \leftarrow F$
12:: end if
13:: end for

3.4. Policy Rule Extraction

In this phase, we analyze the communities and generate ABAC policy rules. Then, a rule network is generated. The input data are the classified user communities, obtained from the previous phase. The output is a network of ABAC policy rules.

According to Definition 1, a rule is a tuple

ρ = 〈 F, o p, d 〉

. In our proposal, rules are limited to having only the attribute filter

F

, and the decision value d. Additionally, we add the unique identifier of the communities from which the rules were inferred. The identifier will help the access decision in the AC. An example of a rule generated in our proposal is shown below:

\begin{matrix} r u l e 1 = 〈 [i d C o m, i d 1], [u . d p t o = c c], [u . c l a s s = m a t h], \\ [r . g r o u p = a], [r . c l a s s s = m a t h], p e r m i t 〉 \end{matrix}

3.4.1. Rule Inference

There are three ways to extract rules from communities, depending on the type of community. Algorithm 2 outlines the process of inferring rules. The input to the algorithm is the community classification

C_{G_{U - U}}

, and the output is the set of rules

P

.

F-type communities: The first step is to extract the resource set

R^{'}

that is accessed by all users in the community. If there is more than one resource (

| R^{'} | > 1

), all attribute–value pairs in common, among all resources in

R^{'}

, are taken to add to the rule. If there is only one resource that all users in the community access, then all the attribute–value tuples that identify the resource are taken to generate the rule. The next step is to obtain the attribute–value tuples of the users. The tuples with the highest frequency in the community are obtained.

M-type communities: A less frequent resource pruning process is used (Algorithm 2, line 7). In M-type communities, it is possible to identify less frequent resources that are accessed by a small number of users compared with the rest of the resources. The information that less frequent resources can offer us does not have any meaningful impact on the generation of rules; on the contrary, it could add noise to the rule. Less frequent resources do not describe or reflect user access in a community.

Algorithm 2 Rule inference algorithm.

1:: procedure RuleInference
Input:: $C_{G_{U - U}}, t_{R - f r e q}$
Output:: $P$
2:: $P \leftarrow \emptyset$
3:: for all $C \in C_{G_{U - U}}$ do
4:: $R^{'} \leftarrow g e t R e s o u r c e s (C)$
5:: ▹ only M-type or S-type communities.
6:: if $C . t y p e \in {M, D}$ then
7:: $R^{'} \leftarrow r e m o v e L e s s F r e q u e n t R e s o u r c e s (R^{'}, t_{R - f r e q})$
8:: ▹ only S-type communities.
9:: if $C . t y p e \in {D}$ then
10:: $R_{s i g}^{'} \leftarrow g e t S i g n i f i c a n t R e s o u r c e s (R^{'})$
11:: for all $r^{'} \in R_{s i g}^{'}$ do
12:: $ρ \leftarrow {〈 i d C o m = C . i n d e x 〉}$
13:: $ρ \leftarrow ρ \cup a t t r i b u t e V a l u e (r^{'})$
14:: $ρ \leftarrow ρ \cup a t t r i b u t e V a l u e C o m m o n (V_{U - U}^{C})$
15:: $P \leftarrow P \cup {ρ}$
16:: end for
17:: $R^{'} \leftarrow R^{'} - R_{s i g}^{'}$
18:: end if
19:: end if
20:: ▹ all communities: F-type, M-type, and S-type.
21:: $ρ \leftarrow {〈 i d C o m = C . i n d e x 〉}$
22:: $ρ \leftarrow ρ \cup a t t r i b u t e V a l u e C o m m o n (R^{'})$
23:: $ρ \leftarrow ρ \cup a t t r i b u t e V a l u e C o m m o n (V_{U - U}^{C})$
24:: $P \leftarrow P \cup {ρ}$
25:: end for

To prune less frequent resources, it is necessary to set a threshold based on the number of users accessing a resource

t_{R - f r e q}

. The threshold value can vary, but it is recommended that it be

10 %

or up to

20 %

for all users—for example, if there are 20 users in community C,

| V_{U - U}^{C} | = 20

and a threshold of

10 %

is established; that is,

t_{R - f r e q} = 0.10 | V_{U - U}^{C} |

. In this example, the resources that are accessed by two users or fewer are considered less frequent resources; therefore, they are removed from the total of resources in the community. Once the resource pruning is executed, the same process is applied as for F-type communities.

S-type communities: This type of community is characterized by having a large number of users and resources. As there are more resources in the community, it is more difficult to generate rules that characterize it. It is possible to identify a small set of resources that are accessed by the most users in the community. The set of most frequent resources, called significant resources, is useful for characterizing the entire community of users.

As in the M-type communities, less frequent resources are filtered out. Subsequently, the F-type community approach is applied with a slight modification. Rules are generated with the set of significant resources, the common attribute–value tuples are taken from the set, and the rule is completed with the users of the community. Then, with the less significant resources not eliminated, rules are generated with the F-type community approach. S-type communities tend to generate more rules than the rest of the communities.

3.4.2. Rule Network Modeling

The construction of the rule network

G_{P}

takes as nodes the rules of the set

P

. A link joins two nodes if the Jaccard index is above a given threshold

t_{P}

as given in Definition 6. The threshold

t_{P}

serves two key purposes: first, to ensure that the resulting network maintains a low edge density, and second, to optimize the F-score value. Through empirical evaluation, we identify the threshold value that yields the best trade-off between sparsity and classification accuracy. Having a densely connected rule network is not beneficial because it would result in all rules being connected, which is not desirable during evaluation. The goal is to preserve the distinctiveness of individual rules.

Definition 6

(Rule Network). The rule network is a tuple

G_{P} = (V_{P}, E_{P})

defined as follows:

1.: $V_{P}$ is the set of nodes that represent rules $P$ .
2.: $E_{P}$ is the set of links that exists between a node $ρ_{1} \in V_{P}$ toward a node $ρ_{2} \in V_{P}$ if the Jaccard index $J (ρ_{1}, ρ_{2})$ is above a given threshold $t_{P}$ .

$(ρ_{1}, ρ_{2}) \in E ⟺ J (ρ_{1}, ρ_{2}) > t_{P}$

(11)

where $J (ρ_{1}, ρ_{2}) = \frac{| F_{ρ_{1}} \cap F_{ρ_{2}} |}{| F_{ρ_{1}} \cup F_{ρ_{2}} |}$ .

3.5. Policy Refinement

The objective of policy refinement is to improve the quality of the mined policy in the previous phase by tackling false negatives and false positives. We create new rules to overcome false negatives and negative rules for false positives. Negative rules restrict access to requests that match their attribute–value tuples. They have the opposite functionality of the rules generated. Because they are specific rules, they tend to contain many attribute–value tuples.

The input of policy refinement is the set of false negatives (FN), false positives (FP), and rule network

G_{P}

. At the end, the output is an augmented set of rules and an updated rule network. The phase is composed of three tasks: (1) FN-refinement, (2) FP-refinement, and (3) rule network update.

FN-refinement: The set of FNs is those access requests that are allowed by the original policy but not by the mined policy. The rule inference process is applied but as the input for the access requests in the FN set. At the end of the process, a set of rules based on the FN requests will be obtained.

FP-refinement: For FPs, it is not possible to apply the complete rule inference approach because there are no relationships between users and resources in the access requests. Refinement in FP tracks the rules that produce the requests in FP and, from these rules, generates negative rules. Value attribute tuples of requests in FP are added to the positive rule generated by the FP to generate a specific negative rule.

Rule network update: Once the rules are generated in the two previous stages, the rule network

G_{P}

is updated by adding the new rules obtained in the FN-refinement. In the same way, the Jaccard similarity index is used to create new links between the existing and new nodes. For the resolution of the requests, the community identifier obtained in the rule inference phase is considered.

4. Access Decision

In the traditional ABAC, the Policy Decision Point (PDP) computes access decisions by evaluating the ABAC policy. In our approach, we add a new stage in the PDP that computes a smaller subset of rules to compare. The main advantage of our proposal is that we reduce the number of rules to evaluate. Figure 2 shows our proposal to add in the PDP the stage Rule Selection by ID (RS), which uses the topological structure of the rule network to decide the subset of rules to evaluate for a specific access request. With this, we optimize the access decision process specifically for complex and big policies.

The flow chart of the RS stage is shown in Figure 3. First, we need to know whether a user node from the access request exists in the user network. If it does not exist, we add it as a neighbor of all nodes that access the same resource. With this, we simulate the topological position of the new user node in the projection operation. The weight of the new links will be equal to 1. The Algorithm 3 shows the process. Lines 3–6 obtain all nodes that access the same resource from the access request. In line 8, we add the user node in the network and, in the last for loop, we add all links from the new user node to each node with the same resource. The next process is to assign the new user to the community with a higher modularity value. The assignCommunity function (Algorithm 4) obtains all communities from all neighbors of the user node. Then, the user node removes and moves into the community of each neighbor and calculates the modularity value. The user node is placed into the community with the greatest modularity value. At this point, the user node belongs to a community, and all rules have the ID of the communities from which they were generated. Thus, we can obtain all rules by ID user node community. The getRulesbyID function returns all rules with the ID user node community, and the getNRules function returns the rule node neighbors of each rule with the ID user node community. All the rules of both functions are used for evaluating the request and computing an access decision.

Algorithm 3 Add a new user node to the user network.

1:: procedure addNode
Input:: $q = (u, r)$ , $G_{U - U} = (V_{U - U}, E_{U - U})$
2:: $U_{r e s o u r c e} \leftarrow \emptyset$
3:: for all $n \in V_{U - U}$ do
4:: if $n . r e s o u r c e = q . r$ then
5:: $U_{r e s o u r c e} \leftarrow U_{r e s o u r c e} \cup {n}$
6:: end if
7:: end for
8:: $V_{U - U} \leftarrow V_{U - U} \cup {q . u}$
9:: for all $n \in U_{r e s o u r c e}$ do
10:: $E_{U - U} \leftarrow E_{U - U} \cup {(q . u, n, 1)}$ ▹ add link
11:: end for

Algorithm 4 Add a new user node to the best community.

1:: procedure addCommunity
Input:: $u, G_{U - U}$
2:: $Q_{m a x} \leftarrow 0$
3:: $C_{Q m a x} \leftarrow N o n e$
4:: for all $v \in n e i g h b o r s (u)$ do
5:: $u . c o m m u n i t y \leftarrow v . c o m m u n i t y$
6:: $Q^{'} \leftarrow m o d u l a r i t y (G_{U - U})$
7:: if $Q^{'} > Q_{m a x}$ then ▹ assign to the community that increases the modularity
8:: $Q_{m a x} \leftarrow Q^{'}$
9:: $C_{Q m a x} \leftarrow v . c o m m u n i t y$
10:: end if
11:: end for
12:: $u . c o m m u n i t y \leftarrow C_{Q m a x}$

5. Results

5.1. Datasets

To evaluate the performance of our proposed methodology, there are two types of access logs (datasets): synthetic/manual and real-world. The synthetic access logs are manually written or randomly generated by an algorithm. It is possible to set parameters such as the number of rules, attributes, users, and resources, which is advantageous for testing purposes. The real-world access logs are extracted from access control systems implemented by an organization. The availability of real datasets is limited due to the security risk and data sensitivity. In our experiments, we use three datasets:

Healthcare (HC): This dataset has access control records of health center staff. The users are nurses, doctors, and patients. The resources are electronic health and history files. It is a synthetic dataset and is adapted from [16]. There are 1000 users and 2000 resources. The size of the access log is equal to 18,000 access requests: 9000 requests positive and negative.

Amazon Kaggle (AMZ): This dataset has access control records of Amazon’s employees. It is a real-world dataset where the users and resources are replaced by numerical values to maintain the organization’s privacy. The dataset contains more than 12,000 users, 7000 resources, and 32,000 access requests, of which 30,000 are positive access requests and 2000 are negative access requests. This dataset is from a Kaggle competition [36].

Smart Building IoT (IoT): This is a synthetic dataset for creating interactions between individuals and IoT devices in a smart building. First, we developed the policy rules. Then, we established attributes and values for users and resources. To generate the access log, we relied on the participation distribution of both users and resources in the real-world dataset from Amazon. Our dataset contains 120 users, 990 resources, and 49,925 access requests, where approximately the same number of positive and negative accesses occur.

Table 2 shows the details of the datasets.

| A |

is the number of attributes,

| V |

is the number of unique values that the A attributes can take in the

L

log,

| L |

shows the number of access requests in the dataset,

| L^{+} |

shows positive access requests, and

| L^{-} |

shows negative access requests by policy.

5.2. Experimental Setup

We use a computer with PopOs Linux 5.16, a 3.3 GHz Intel Core i5-2500, 8GB of RAM, and a GTX 780 graphic card. The implementation is performed in Python 3.7. We use the NetworkX library to build the network models and the igraph library for network analysis. To validate the results of our experiments, we use K-fold Cross-Validation [37] with

K = 10

,

80 %

training data, and

20 %

testing data.

5.3. General Results

Table 3 shows the results obtained after the first three phases of both datasets. After the data preprocessing phase, we obtain 2437 access requests with 621 users and 211 resources from the HC dataset. With the AMZ dataset, we obtain 9504 access requests with 5436 users and 150 resources. In the IoT dataset, we obtain 49,925 access requests with 120 users and 990 resources. In the second phase, the network model, we generate a user network with 621 nodes and 14,923 links from the HC dataset. With the AMZ dataset, we generate a user network with 5436 nodes and 743,311 links. From the IoT dataset, we generate a user network with 120 nodes and 2095 links. In the community detection phase, we obtain a modularity value of 0.48, 0.65, and 0. for the HC, AMZ, and IoT datasets, respectively.

Table 4 shows the results of the quality and complexity of the policies generated after the rule inference and policy refinement phase. The results show how the F-score improves after the refinement phase in all datasets.

Table 5 shows a comparative analysis of the policy quality and complexity results using Cotrini et al. [19], Jabal et al. [21], Karimi et al. [22], and Quan et al.’s [24] approaches. This is because, to the best of our knowledge, they constitute the only studies in the literature that employ the same datasets as our work. The remaining studies listed in Table 1 rely on different datasets or on modified/adapted versions for the evaluation of their approaches. The results lead us to conclude that our approach generates policies with high quality but is more complex than the state-of-the-art approaches.

While the recent approach by Quan et al. [24] achieves a marginally higher F-score on the

π_{A M Z^{'}}

dataset, our method demonstrates distinct advantages regarding policy transparency and structural efficiency. Note that Quan et al. do not report complexity metrics (

| P |

and WSC), making a comprehensive comparison challenging. In contrast, our network-based approach generates a policy with well-defined structural properties (156 rules, WSC = 564), providing a complete and interpretable ABAC policy. This explicit modeling of policy structure offers significant practical benefits for policy analysis, maintenance, and refinement in real-world deployments, representing an important contribution beyond raw accuracy metrics.

5.4. Detailed Results

5.4.1. HC

Analyzing the HC dataset, the size of positive access requests is equal to 8735 and negative access requests equal to 8998. It contains five user attributes and six user attributes. Applying task one, we obtain ten new elements; add at least one element for each attribute. Task two does not apply to the dataset because there are no continuous values. Then, in task three, the positive access request is reduced to 6998 with 1041 users and 2113 resources. Figure 4a shows the distribution of access requests. It is observed that there is a small number of resources that have a high number of access requests. On the other hand, there is a large number of resources that have few access requests. For the experiment, all those resources that are accessed with a number of requests greater than six are used. The dotted line shows the resources with more requests, which is equal to

10 %

of all the resources. Each request that accesses the 211 resources is accepted, i.e., 2347 positive requests with 621 users and 211 resources.

In the network model phase, the ARBN

G_{U - R}^{H C} = (V_{U}, V_{R}, E_{U - R})

, where

| V_{U} | = 621

,

| V_{R} | = 211

, and

| E_{U - R} | = 2347

generate the UN

G_{U - U} = (V_{U - U}, E_{U - U})

, where

| V_{U - U} | = 621

and

| E_{U - U} | =

14,923. Table 6 shows the complex network properties of the UN, and Figure 5a its degree distribution. According to the topological properties, the UN is a complex network.

In the third phase, community detection, the Louvain algorithm detects 13 communities with modularity value equal to

0.4750

. When applying sub-community detection, an average of 42 communities are generated. Then, the 42 communities are classified into three groups using

t_{R 1} = 0.5

and

t_{R 1} = 0.2

(see Figure 6a). The community with the largest number of resources is 94 of the 211 resources.Taking the previous community as

100 %

, the threshold for determining S-type communities is that they have access to more than 47 (

50 %

) resources, and the threshold for determining M-type communities is that they have access to more than 23 (

25 %

) resources but fewer than 48; the threshold for determining F-type communities is that they have access to fewer than 23 resources. On average, 2 S-type, 12 M-type, and 28 F-type communities are generated.

Once the communities are classified, the rule inference algorithm is applied to each community. It generates 42 rules, one per community. The

T_{R - f r e q}

threshold is set to 0.2, and the

T_{R - s i g n}

threshold is 0.5. With the rules, it generates a rule network

G_{P} = (V_{P}, E_{P})

where

| V_{P} | = 42

and

| E_{P} | = 244

with a threshold

T_{P} = 3

. At this point, the generated policy has an F-score value equal to

0.8289

and a complexity

W S C = 231

.

Finally, in the policy refinement phase, the augmented rule network

G_{P}^{'} = (V_{P}^{'}, E_{P}^{'})

, where

| V_{P}^{'} | = 106

and

| E_{P}^{'} | = 1247

. Figure 7 shows the rule network after the refinement phase. It is possible to identify the community where the rules were generated and in which phase of the methodology they were generated. The F-score of the generated policy is equal to

0.9842

; FN decreases, as well as FP. The former is because of applying our approach again in FN, and the latter is a result of the generation of negative rules. The complexity

W S C

of the generated policy is equal to 389. The average total runtime is 1.16 min.

5.4.2. AMZ

This dataset has 30,872 positive access requests and 1897 negative access requests. It contains only eight user attributes. It does not have null values or continuous values. Thus, tasks one and two of the data preprocessing phase do not apply. In task three, duplicate access requests are eliminated, and positive access requests are reduced to 24,697. Figure 4b shows the distribution of access requests. At the end of the first phase, the AMZ dataset has 9501 positive access requests, 5436 users, and 150 resources.

The AMZ ARBN network

G_{U - R}^{A M Z} = (V_{U}, V_{R}, E_{U - R})

, where

| V_{U} | = 5436

,

| V_{R} | = 150

, and

| E_{U - R} | = 9501

generate the UN

G_{U - U} = (V_{U - U}, E_{U - U})

, where

| V_{U - U} | = 5436

and

| E_{U - U} | =

743,311. According to the topological properties shown in Table 7, the UN is a complex network.

In the community detection phase, the first execution of the Louvain algorithm detects 16 communities with a modularity value equal to

0.65

. Then, the second execution generates an average of 82 communities. The threshold for determining S-type communities is that they have access to more than 51 (

50 %

) resources, the threshold for determining M-type communities is that they have access to more than 25 (

25 %

) resources but fewer than 51 and the threshold for determining F-type communities is that they have access to fewer than 25 resources, as we can see in Figure 6b. On average, 7 S-type, 8 M-type, and 67 F-type communities are generated.

The fourth phase, rule inference extraction, extracts 82 rules, one per community. Like the CS dataset, the same values are used for the thresholds:

T_{R - f r e q} = 0.2

and

T_{R - s i g n} = 0.5

. The generated rule network

G_{P} = (V_{P}, E_{P})

, where

| V_{P} | = 82

and

| E_{P} | = 946

, with a threshold

T_{P} = 2

. At this point, the generated policy has an F-score value equal to

0.9707

and a complexity

W S C = 274

.

In the last phase, the augmented rule network

G_{P}^{'} = (V_{P}^{'}, E_{P}^{'})

, where

| V_{P}^{'} | = 189

and

| E_{P}^{'} | = 1366

. Figure 8 shows the rule network after the refinement phase. We can see a large number of isolated nodes, about

50 %

of all the nodes. This type of node describes a small set of access requests that have particular attribute–value tuples; for this reason, it is not possible to reach the threshold of similarity to add a link with other rules. The F-score of the generated policy is equal to

0.9923

and its complexity

W S C = 564

. The average total runtime is 2.35 min.

5.4.3. IoT

In the preprocessing phase, the first three tasks—handling missing/null values, converting continuous values, and removing duplicate requests—are skipped due to the absence of such cases in the dataset. However, a selection of the most frequently accessed resources are still selected. Figure 4c shows the distribution of the frequency of accesses to each resource. From a total of 990 resources, 400 more frequent resources are obtained. In the end, we obtain 120 users and 400 resources with about 17 K of requests.

In the network model phase, we generate an IoT ARBN

G_{U - R}^{I o T} = (V_{U}, V_{R}, E_{U - R})

, where

| V_{U} | = 120

,

| V_{R} | = 400

and

| E_{U - R} | =

17,525. Then, we generate the UN

G_{U - U} = (V_{U - U}, E_{U - U})

, where

| V_{U - U} | = 120

and

| E_{U - U} | = 1380

. According to the topological properties shown in Table 8, the UN is a complex network.

In the third phase, community detection, the first execution of the Louvain algorithm detects 11 communities with a modularity value equal to 0.75. Then, the second execution generates an average of 49 communities. The threshold to determine S-type communities is that 200 (

50 %

) resources, M-type communities access to lower than 200 resources and greater than 100 (

18 %

). F-type communities have access to less than 40 resources. Figure 6c shows the number of resource distribution. On average, eleven S-type, five M-type, and five type F communities are generated.

The fourth phase, rule inference, extracts 49 rules. The thresholds values used are

T_{R - f r e q} = 0.2

and

T_{R - s i g n} = 0.5

. The generated rule network

G_{P} = (V_{P}, E_{P})

where

| V_{P} | = 49

and

| E_{P} | = 515

with a threshold

T_{P} = 3

. At this point, the mined policy has a F-score value equal to

0.8112

and a complexity

W S C = 203

.

Finally, in the policy refinement phase, the augmented rule network

G_{P}^{'} = (V_{P}^{'}, E_{P}^{'})

where

{| V_{P} |}^{'} = 101

and

| E_{P}^{'} | = 1722

. Figure 9 shows the rule network after the refinement phase. The F-score of the mined policy is equal to

0.9254

. The complexity

W S C

of the generated policy es equal to 461. The average total runtime is 3.20 min.

5.5. Evaluation Under Data and Structural Variations

This subsection presents an evaluation of the proposed algorithm under different data and structural conditions. We examined its performance with noisy and sparse datasets, tested its sensitivity to community classification thresholds, and compared its effectiveness against randomized network structures. Each experimental configuration was executed 10 times, and the reported results represent the average values obtained across all runs.

We assessed the efficiency of the proposed algorithm on datasets corrupted with synthetic noise, as well as on sparse datasets (hereafter referred to as partial datasets). The former was achieved by randomly reversing the access decisions (grant/deny) of a random subset of access requests; a 10% noisy dataset was produced by altering 10% of the complete access log. To evaluate performance on sparse data, we generated partial datasets by randomly selecting subsets of access requests from the complete access log. Table 9 summarizes the results of policy quality and complexity under noisy and partial conditions for the three evaluated datasets. The results show that the proposed approach maintains a stable policy structure even when the access logs are affected by noise or incompleteness. For all datasets, the F-score decreases moderately, indicating that the method is robust against perturbations in the input data. The number of mined rules (

| P |

) and the weighted structural complexity (WSC) remain nearly constant, suggesting that the underlying structure of the mined policy is preserved despite degraded data quality.

The results of the policy complexity and quality analysis across different threshold values are presented in Table 10. The data indicates a clear trade-off between policy quality, measured by F-score, and complexity, measured by the number of policies (

| P |

) and the WSC. For the HC and AMZ datasets, the configuration with thresholds

t_{1} = 0.5

and

t_{2} = 0.25

achieved the highest F-score (0.9842 and 0.9923, respectively), while also maintaining or reducing the WSC compared with other threshold pairs. The IoT dataset showed a different trend, where the F-score improved from 0.9133 to a peak of 0.9254 at

t_{1} = 0.5

, with no further improvement at

t_{6} = 0.6

, although at the cost of a slight increase in WSC. Overall, these findings suggest that an intermediate threshold value of

t_{1} = 0.5

and

t_{2} = 0.25

provides a balance, consistently yielding high-quality access control policies with manageable complexity across diverse datasets. It is important to note, however, that, while these specific threshold values (

t_{1} = 0.5

,

t_{2} = 0.25

) yielded the best results in this study, they are inherently data-dependent; the optimal parameters may vary for other datasets with different characteristics and distributions.

For comparative analysis, random networks were generated using the configuration model, preserving the original network’s degree sequence while randomizing connections. Edge weights were assigned randomly, and node attributes were copied from the original networks. The comparative evaluation of our proposed model against a random network structure is summarized in Table 11. The results demonstrate a consistent and significant superiority of our model in terms of policy quality, as reflected by the F-score. Specifically, our model achieved markedly higher F-scores across all datasets compared to the random network. In terms of complexity, the random network occasionally produced a marginally lower number of policies or WSC. These findings conclusively indicate that our model’s structured approach yields a far more favorable balance between quality and complexity, effectively prioritizing high-fidelity policy generation without a significant increase in structural overhead.

5.6. Computational Efficiency

Table 12 summarizes the time complexity of each phase and task in our approach. The first phase of our proposal includes four tasks: handling missing/null values, converting continuous values, removing duplicates, and selecting frequently accessed resources. The first two tasks have a time complexity of

O (| L | | A |)

, while the last two run in

O (| L^{+} |)

. The time complexity of the first phase is dominated by

O (| L | | A |)

. In the second phase, the bipartite network task has a time complexity of

O (| L^{+} |)

, and the user network task,

O (| V_{U} |)

.

Note that

| V_{U} | < < O (| L^{+} |)

in all datasets; thus, the network model phase is determined by

O (| L^{+} |)

. The community detection phase has a time complexity obtained by the Louvain algorithm

O (| V_{U - U} | l o g | V_{U - U} |)

. In the fourth phase, the rule inference task has a time complexity where the

R^{C}

is the resources in the community c. The time complexity of the rule network is equal to

O (P^{2})

; thus, the time complexity of the rule mining phase is dominated by the rule inference task, i.e.,

O (| V_{U - U} | \times | R^{c} |)

. Finally, the policy refinement phase is determined by the number of false negatives or false positives. The policy rule extraction phase dominates the total complexity of our proposal.

6. Conclusions and Discussions

In this work, a novel approach for ABAC policy mining based on the analysis of complex networks is proposed. Each phase of the proposal is described, and experimental results are presented. Our work receives as input an access control log which contains the records of access requests, both permitted and denied. The output is the ABAC access control policy, modeled as a network, which contains the rules and their relationships that will be evaluated to determine whether an access request is permitted or denied.

Our approach transforms access logs into a bipartite network of users and resources, which is converted into a user network through weighted projection based on shared resources. The Louvain algorithm detects communities in this network, classified into three types by resource access patterns. For each community, rules are inferred from attribute frequencies and adapted to community characteristics, forming a structured rule network.

The initial rule network is refined by addressing uncovered positive requests and incorporating negative rules, resulting in an augmented policy with improved accuracy. This network-based methodology leverages topological properties to identify functional roles and access patterns, proving to be an effective alternative to traditional clustering techniques while maintaining interpretability and scalability in access control management.

Our network-analysis-based approach generates ABAC policies with high accuracy; that is, they replicate with high similarity the access control policy that originated the access log. The set of extracted rules is modeled as a network; the policy obtained is not a list of rules but a structure of rules which helps in the process of rule evaluation. This constitutes a novel approach to modeling access control rules not previously reported in the literature.

The comprehensive evaluation demonstrates the robustness and adaptability of our approach under various challenging conditions. When tested on noisy datasets, generated by randomly reversing access decisions, and sparse datasets, created by randomly selecting subsets of access requests, our method maintained stable policy structures with only moderate decreases in F-score. This resilience to data perturbations confirms that the underlying network structure effectively captures essential access patterns rather than overfitting to specific data points. Furthermore, our investigation of threshold parameters revealed that intermediate values (

t_{1} = 0.5

,

t_{2} = 0.25)

generally provide an optimal balance between policy quality and complexity across diverse datasets, though we acknowledge that these parameters may require adjustment for datasets with substantially different characteristics.

Crucially, comparative analysis against randomly structured networks—generated using the configuration model with random edge weights—validates the significance of our structured approach. The substantially higher F-scores achieved by our model across all datasets, with only minimal increases in complexity metrics, demonstrate that the discovered network topology contains meaningful access control information beyond mere connectivity patterns. This superior performance persists even when the random networks occasionally achieve slightly lower complexity, indicating that our method’s quality improvements justify the modest complexity overhead.

While our approach demonstrates strong performance in replicating access control policies and introduces a novel network-based modeling framework, it also presents certain limitations. One notable challenge is the high number of rules generated, which can increase the complexity of the mined policy and potentially affect its manageability. Although we propose viewing complexity through the lens of rule network structure, further work is needed to develop formal metrics that capture structural interpretability and operational efficiency.

Moreover, the current methodology relies on a fixed network modeling strategy, which may not be optimal across all datasets or organizational contexts. To address this, future work will explore more flexible and context-aware modeling approaches, including alternative community detection strategies that better adapt to the structure of user–resource interactions. Additionally, we plan to investigate automated threshold selection methods and expand our evaluation to include more diverse organizational settings and security policies to further validate the generalizability of our approach.

Author Contributions

These authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by SECIHTI grant number 1074880.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Healthcare (HC) data presented in this study are openly available in the author repository at https://doi.org/10.1007/978-3-662-43936-4_18 (accessed on 24 August 2025). The Amazon (AMZ) dataset can be found here: https://www.kaggle.com/datasets/lucamassaron/amazon-employee-access-challenge (accessed on 24 August 2025). The Smart Building IoT (IoT) data presented in this study are openly available at https://cinvestav365-my.sharepoint.com/:x:/g/personal/hector_diaz_cinvestav_mx/EVD0VaKyqVtBl3_jgJz_pnQBGhNILm0ASrvxqIOFT3wLMQ?e=TPOoMt (accessed on 24 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Krit, S.d.; Haimoud, E. Review on the IT security: Attack and defense. In Proceedings of the 2016 International Conference on Engineering & MIS (ICEMIS 2016), Agadir, Morocco, 22–24 September 2016; pp. 1–12. [Google Scholar]
Cost of a Data Breach 2024. Available online: https://www.ibm.com/downloads/cas/E3G5JMBP (accessed on 20 January 2025).
Sandhu, R.; Samarati, P. Access control: Principle and practice. IEEE Commun. Mag. 1994, 32, 40–48. [Google Scholar] [CrossRef]
Qiu, J.; Tian, Z.; Du, C.; Zuo, Q.; Su, S.; Fang, B. A Survey on Access Control in the Age of Internet of Things. IEEE Internet Things J. 2020, 7, 4682–4696. [Google Scholar] [CrossRef]
Ravidas, S.; Lekidis, A.; Paci, F.; Zannone, N. Access control in Internet-of-Things: A survey. J. Netw. Comput. Appl. 2019, 144, 79–101. [Google Scholar] [CrossRef]
Zhang, P.; Liu, J.; Yu, F.; Sookhak, M.; Au, M.; Luo, X. A survey on access control in fog computing. IEEE Commun. Mag. 2018, 56, 144–149. [Google Scholar] [CrossRef]
Long, S.; Yan, L. RACAC: An Approach toward RBAC and ABAC Combining Access Control. In Proceedings of the 2019 IEEE 5th International Conference on Computer and Communications (ICCC 2019), Chengdu, China, 6–9 December 2019; pp. 1609–1616. [Google Scholar]
Servos, D.; Osborn, S. HGABAC: Towards a formal model of hierarchical attribute-based access control. In Proceedings of the Foundations and Practice of Security: 7th International Symposium (FPS 2014), Montreal, QC, Canada, 3–5 November 2014; pp. 187–204. [Google Scholar]
Aliane, L.; Adda, M. HoBAC: Toward a higher-order attribute-based access control model. Procedia Comput. Sci. 2019, 155, 303–310. [Google Scholar] [CrossRef]
Bethencourt, J.; Sahai, A.; Waters, B. Ciphertext-policy attribute-based encryption. In Proceedings of the 2007 IEEE Symposium on Security and Privacy (SP ’07), Oakland, CA, USA, 20–23 May 2007; pp. 321–334. [Google Scholar]
Kashmar, N.; Adda, M.; Ibrahim, H. HEAD metamodel: Hierarchical, extensible, advanced, and dynamic access control metamodel for dynamic and heterogeneous structures. Sensors 2021, 21, 6507. [Google Scholar] [CrossRef] [PubMed]
Yuan, E.; Tong, J. Attributed based access control (ABAC) for web services. In Proceedings of the IEEE International Conference on Web Services (ICWS 2005), Orlando, FL, USA, 11–15 July 2005; p. 569. [Google Scholar]
Coyne, E.; Weil, T. ABAC and RBAC: Scalable, Flexible, and Auditable Access Management. IT Prof. 2013, 15, 14–16. [Google Scholar] [CrossRef]
Hu, V.C.; Ferraiolo, D.; Kuhn, R.; Schnitzer, A.; Sandlin, K.; Miller, R.; Scarfone, K. Guide to Attribute Based Access Control (ABAC) Definition and Considerations; National Institute of Standards and Technology Special Publication 800-162; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2014. [Google Scholar]
Ubale, A.; Modani, G.; Apte, S. Analysis of DAC MAC RBAC access control based models for security. Int. J. Comput. Appl. 2014, 104, 6–13. [Google Scholar] [CrossRef]
Xu, Z.; Stoller, S. Mining attribute-based access control policies. IEEE Trans. Dependable Secur. Comput. 2014, 12, 533–545. [Google Scholar] [CrossRef]
Iyer, P.; Masoumzadeh, A. Mining Positive and Negative Attribute-Based Access Control Policy Rules. In Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies (SACMAT 2018), Indianapolis, IN, USA, 13–15 June 2018; pp. 161–172. [Google Scholar]
Cendrowska, J. PRISM: An algorithm for inducing modular rules. Int. J. Man-Mach. Stud. 1987, 27, 349–370. [Google Scholar] [CrossRef]
Cotrini, C.; Weghorn, T.; Basin, D. Mining ABAC Rules from Sparse Logs. In Proceedings of the 3rd IEEE European Symposium on Security and Privacy (EuroS&P 2018), London, UK, 24–16 April 2018; pp. 31–46. [Google Scholar]
Kavšek, B.; Lavrač, N. APRIORI-SD: Adapting association rule learning to subgroup discovery. Appl. Artif. Intell. 2006, 20, 543–583. [Google Scholar] [CrossRef]
Abu Jabal, A.; Bertino, E.; Lobo, J.; Law, M.; Russo, A.; Calo, S.; Verma, D. Polisma-a framework for learning attribute-based access control policies. In Proceedings of the 25th European Symposium on Research in Computer Security (ESORICS 2020), Guildford, UK, 14–18 September 2020; pp. 523–544. [Google Scholar]
Karimi, L.; Aldairi, M.; Joshi, J.; Abdelhakim, M. An automatic attribute-based access control policy extraction from access logs. IEEE Trans. Dependable Secur. Comput. 2021, 19, 2304–2317. [Google Scholar] [CrossRef]
Davari, M.; Zulkernine, M. Mining attribute-based access control policies. In Proceedings of the International Conference on Information Systems Security (ICISS 2022), Tirupati, India, 16–20 December 2022; pp. 186–201. [Google Scholar]
Quan, S.; Zhao, Y.; Helil, N. Attribute-Based Access Control Policy Generation Approach from Access Logs Based on the CatBoost. Comput. Inform. 2023, 42, 615–650. [Google Scholar] [CrossRef]
Shang, S.; Wang, X.; Liu, A. ABAC policy mining method based on hierarchical clustering and relationship extraction. Comput. Secur. 2024, 139, 103717. [Google Scholar] [CrossRef]
Costa, L.; Oliveira, O.; Travieso, G.; Rodrigues, F.; Boas, P.; Antiqueira, L.; Viana, M.; da Rocha, L. Analyzing and Modeling Real-World Phenomena with Complex Networks: A Survey of Applications. Adv. Phys. 2011, 60, 329–412. [Google Scholar] [CrossRef]
Newman, M.E.J. The Structure and Function of Complex Networks. SIAM Rev. 2003, 45, 167–256. [Google Scholar] [CrossRef]
Li, H.; Zhao, C.; Liu, Y.; Zhang, X. Anomaly detection by discovering bipartite structure on complex networks. Comput. Netw. 2021, 190, 107899. [Google Scholar] [CrossRef]
Wang, P.; Wen, G.; Yu, X.; Yu, W.; Wan, Y. Synchronization of resilient complex networks under attacks. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 1116–1127. [Google Scholar] [CrossRef]
Wen, G.; Yu, W.; Yu, X.; Lü, J. Complex cyber-physical networks: From cybersecurity to security control. J. Syst. Sci. Complex. 2017, 30, 46–67. [Google Scholar] [CrossRef]
Molloy, I.; Chen, H.; Li, T.; Wang, Q.; Li, N.; Bertino, E.; Calo, S.; Lobo, J. Mining roles with multiple objectives. ACM Trans. Inf. Syst. Secur. 2010, 13, 1–35. [Google Scholar] [CrossRef]
Yanez-Sierra, J. Distribution of Access Control Policies to Large Document Collections. Ph.D. Thesis, Centro de Investigación y Estudios Avanzados del Instituto Politécnico Nacional, Guadalajara, Mexico, 2020. [Google Scholar]
Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174. [Google Scholar] [CrossRef]
Blondel, V.; Guillaume, J.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008, 10, P100008. [Google Scholar] [CrossRef]
Newman, M.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113. [Google Scholar] [CrossRef] [PubMed]
Hammer, B. Amazon.com—Kaggle Employee Access Challenge. Available online: https://kaggle.com/competitions/amazon-employee-access-challenge (accessed on 20 January 2025).
Stone, M. Cross-validation and multinomial prediction. Biometrika 1974, 61, 509–515. [Google Scholar] [CrossRef]

Figure 1. Proposed policy mining approach: conceptual methodology. mining approach.

Figure 2. Proposed ACM functional points.

Figure 3. Flow chart of the proposed RS stage in the PDP.

Figure 4. Frequency of access requests for resources. The dotted line splits more and less frequent resources.

Figure 5. Degree distribution of user networks. The vertical axis indicates the fraction of nodes that have degree k. The horizontal axis represents the different degrees’ values. The graph is presented on a logarithmic scale.

Figure 6. Distribution of number of resources accessed. The vertical axis indicates the number of resources accessed by each community. The horizontal axis represents a different community. Each graph shows the two thresholds for community classification.

Figure 7. Mined Mined HC ABAC policy rule network

G_{P}^{' H C}

.

| V_{P}^{' H C} | = 106

;

| E_{P}^{' H C} | = 1247

. The color of the nodes indicates the community where they were generated. The shape of the vertex represents the phase where they were generated. A circular shape means that the rule was generated at the policy rule extraction phase, and a triangular shape, at the policy refinement phase.

Figure 7. Mined Mined HC ABAC policy rule network

G_{P}^{' H C}

.

| V_{P}^{' H C} | = 106

;

| E_{P}^{' H C} | = 1247

. The color of the nodes indicates the community where they were generated. The shape of the vertex represents the phase where they were generated. A circular shape means that the rule was generated at the policy rule extraction phase, and a triangular shape, at the policy refinement phase.

Figure 8. Mined AMZ ABAC policy rule network

G_{P}^{' A M Z}

.

| V_{P}^{' A M Z} | = 189

,

| E_{P}^{' A M Z} | = 1366

. The color of the nodes indicates the community where they were generated. The shape of the vertex represents the phase where they were generated. A circular shape means that the rule was generated at the policy rule extraction phase, and a triangular shape, at the policy refinement phase.

Figure 8. Mined AMZ ABAC policy rule network

G_{P}^{' A M Z}

.

| V_{P}^{' A M Z} | = 189

,

| E_{P}^{' A M Z} | = 1366

. The color of the nodes indicates the community where they were generated. The shape of the vertex represents the phase where they were generated. A circular shape means that the rule was generated at the policy rule extraction phase, and a triangular shape, at the policy refinement phase.

Figure 9. Mined IoT ABAC policy rule network

G_{P}^{' I o T}

.

| V_{P}^{' I o T} | = 101

,

| E_{P}^{' I o T} | = 1722

. The color of the nodes indicates the community where they were generated. The shape of the vertex represents the phase where they were generated. A circular shape means that the rule was generated at the policy rule extraction phase, and a triangular shape, at the policy refinement phase.

Figure 9. Mined IoT ABAC policy rule network

G_{P}^{' I o T}

.

| V_{P}^{' I o T} | = 101

,

| E_{P}^{' I o T} | = 1722

. The color of the nodes indicates the community where they were generated. The shape of the vertex represents the phase where they were generated. A circular shape means that the rule was generated at the policy rule extraction phase, and a triangular shape, at the policy refinement phase.

Table 1. State of the art ABAC policy mining approaches.

Authors	Data Application	Rule Extraction Approach	Negative Rule Extraction	Noisy Log	Sparse Log
Xu and Stoller, 2014 [16]	Access log.	Association rule mining algorithm.	✗	✓	✗
Iyer and Masoumzadeh, 2018 [17]	Access log.	Association rule mining algorithm.	✓	✗	✗
Cotrini et al., 2018 [19]	Access log.	Association rule mining algorithm.	✗	✗	✓
Jabal et al., 2020 [21]	Access log.	Association rule mining algorithm.	✗	✓	✗
Karimi et al., 2021 [22]	Clusters in access log.	Frequency of attribute–value tuples in a cluster.	✗	✓	✓
Davari et al., 2022 [23]	Clusters in access log.	Association rule mining algorithm.	✓	✓	✓
Quan et al., 2023 [24]	Clusters in access log.	CatBoost algorithm.	✓	✓	✓
Shang et al., 2024 [25]	Clusters in access log.	Hierarchical relation in a cluster.	✗	✓	✓

Table 2. Details of the datasets used.

Dataset	$\| A \|$	$\| V \|$	$\| L \|$	$\| L^{+} \|$	$\| L^{-} \|$
Healthcare (HC)	12	77	18 K	9 K	9 K
Amazon Kaggle (AMZ)	7	15 K	32 K	30 K	2 K
Smart Building IoT (IoT)	9	52	$49 K$ K	24 K	25 K

Table 3. Results at the end of each phase. Data preprocessing, network model, and community detection phase are shown as 1, 2, and 3, respectively.

Phase	HC	AMZ	IoT
1	$\| L^{+} \| = 2437$ , $\| U \| = 621, \| R \| = 211$	$\| L^{+} \| = 9504$ , $\| U \| = 5436, \| R \| = 150$	$\| L^{+} \| = 237,274$ , $\| U \| = 120, \| R \| = 990$
2	$G_{U - U} = (V_{U - U}, E_{U - U})$ where $\| V_{U - U} \| = 621$ , $\| E_{U - U} \| = 14,923$	$G_{U - U} = (V_{U - U}, E_{U - U})$ where $\| V_{U - U} \| = 5436$ , $\| E_{U - U} \| = 743,311$	$G_{U - U} = (V_{U - U}, E_{U - U})$ where $\| V_{U - U} \| = 120$ , $\| E_{U - U} \| = 2095$
3	Avg. $\| C_{G_{U - U}} \| = 42$ Avg. modularity: 0.475 Avg. S-type comms.: 4% Avg. M-type comms.: 28% Avg. F-type comms.: 68%	Avg. $\| C_{G_{U - U}} \| = 82$ Avg. modularity: 0.65 Avg. S-type comms.: 8% Avg. M-type comms.: 9% Avg. F-type comms.: 83%	Avg. $\| C_{G_{U - U}} \| = 49$ Avg. modularity: 0.28 Avg. S-type comms.: 8% Avg. M-type comms.: 18% Avg. F-type comms.: 74%

Table 4. Results of the quality and complexity of the policy in the phases of rule inference and policy refinement.

Dataset	Phase	FN	FP	Precision	Recall	F-Score	$\| P \|$	$WSC$
HC	4	672	17	0.9899	0.7129	0.8289	41	231
	5	63	10	0.9956	0.9730	0.9842	90	389
AMZ	4	511	30	0.9966	0.9462	0.9707	82	274
	5	134	11	0.9988	0.9858	0.9923	156	564
IoT	4	3681	2760	0.7899	0.8337	0.8112	49	203
	5	1255	1402	0.9300	0.9207	0.9254	101	461

Table 5. Comparative analysis of policy complexity and quality of the mined policy with previous work on the mining of ABAC policies from the access log. The ‘-’ symbol represents that it was not reported by the authors.

$π$	Mining Approach	F-Score	$\| P \|$	$WSC$
$π_{H C^{'}}$	Our approach	0.9842	90	389
	Karimi et al. [22]	0.62	15	67
$π_{A M Z^{'}}$	Our approach	0.9923	156	564
	Quan et al. [24]	0.9951	-	-
	Karimi et al. [22]	0.97	20	44
	Cotrini et al. [19]	0.8495	1055	2431
	Jabal et al. [21]	0.8600	-	-

Table 6. Complex network properties of HC user network.

Property	Value
1. $n = \| V \|$ , $m = \| E \|$ commonly in the order of thousands or millions.	$n = 621$ , $m = 14,923$
2. Low average degree $〈 k 〉 < < n$ .	$k = 48.0612 < < 621$
3. Low density $d < < 1$ .	$d = 0.0775 < < 1$
4. Scale-free distribution $P (k) \approx k^{- α}$ .	See Figure 5a.
5. Low average path length $(L) < < n$ .	$L = 2.3625 < < 621$
6. High average cluster coefficient $1 / n < < (C) < 1$ .	$0.0016 < < C = 0.6238 < 1$

Table 7. Complex network properties of AMZ user network.

Property	Value
1. $n = \| V \|$ , $m = \| E \|$ commonly in the order of thousands or millions.	$n = 5436$ , $m = 743,311$
2. Low average degree $〈 k 〉 < < n$ .	$k = 273.4772 < < 5436$
3. Low density $d < < 1$ .	$d = 0.0503 < < 1$
4. Scale-free distribution $P (k) \approx k^{- α}$ .	See Figure 5b.
5. Low average path length $(L) < < n$ .	$L = 2.4426 < < 5435$
6. High average cluster coefficient $1 / n < < (C) < 1$ .	$0.0002 < < C = 0.8424 < 1$

Table 8. Complex network properties of IoT user network.

Property	Value
1. $n = \| V \|$ , $m = \| E \|$ commonly in the order of thousands or millions.	$n = 120$ , $m = 1380$
2. Low average degree $〈 k 〉 < < n$ .	$k = 23 < < 120$
3. Low density $d < < 1$ .	$d = 0.1933 < < 1$
4. Scale-free distribution $P (k) \approx k^{- α}$ .	See Figure 5c.
5. Low average path length $(L) < < n$ .	$L = 1.5 < < 120$
6. High average cluster coefficient $1 / n < < (C) < 1$ .	$0.0083 < < C = 0.8357 < 1$

Table 9. Results of policy complexity and quality with noisy and partial versions of the datasets.

Dataset	F-Score	$\| P \|$	$WSC$
HC	0.9837	91	391
Noisy HC (10%)	0.8019	89	379
Partial HC (90%)	0.8408	88	383
AMZ	0.9918	155	567
Noisy AMZ (10%)	0.8115	146	538
Partial AMZ (90%)	0.8462	152	592
IoT	0.9249	48	205
Noisy IoT (10%)	0.7827	46	190
Partial IoT (90%)	0.8113	45	206

Table 10. Analysis of policy complexity and quality employing diverse threshold values for community classification. The values in bold represent the best results for each dataset.

Dataset	$t_{1}$	$t_{2}$	F-Score	$\| P \|$	$WSC$
HC	0.4	0.15	0.9822	95	407
	0.5	0.25	0.9842	90	389
	0.6	0.35	0.9854	93	396
AMZ	0.4	0.15	0.9839	143	569
	0.5	0.25	0.9923	156	564
	0.6	0.35	0.9862	139	550
IoT	0.4	0.15	0.9133	45	199
	0.5	0.25	0.9254	49	203
	0.6	0.35	0.9254	46	207

Table 11. Evaluation of policy complexity and quality within a random network structure. The values in bold represent the best results for each dataset.

Dataset	Network	F-Score	$\| P \|$	$WSC$
HC	Our model	0.9838	89	387
	Random	0.8059	96	411
AMZ	Our model	0.9919	157	562
	Random	0.7115	148	594
IoT	Our model	0.9247	48	205
	Random	0.8208	44	186

Table 12. Time complexity of all phases and tasks in our approach. Notation: L is the access log,

O (| L^{+} |)

is the positive access log, A is the set of attributes,

V_{U}

is the set of user nodes in the ARBN,

V_{U - U}

is the set of user nodes in the UN, R is the set of resources,

P

is the initial set of rules,

P^{'}

is the refined set of rules, FN is the set of false negatives, and FP is the set of false positives.

Table 12. Time complexity of all phases and tasks in our approach. Notation: L is the access log,

O (| L^{+} |)

is the positive access log, A is the set of attributes,

V_{U}

is the set of user nodes in the ARBN,

V_{U - U}

is the set of user nodes in the UN, R is the set of resources,

P

is the initial set of rules,

P^{'}

is the refined set of rules, FN is the set of false negatives, and FP is the set of false positives.

Phase	Task	Time Complexity
Data Preprocessing	Null values	$O (\| L \| \times \| A \|)$
	Continuous to categorical	$O (\| L \| \times \| A \|)$
	Remove duplicates	$O (\| L \|)$
	Frequent resources	$O (\| L \|)$
Network Model	ARBN	$O (\| L^{+} \|)$
	UN	$O (\| V_{U} \|)$
Community Detection	Community detection	$O (\| V_{U - U} \| \times log \| V_{U - U} \|)$
	Community classification	$O (\| U \| \times \| L) \|$
Rule Mining	Rule inference	$O (\| V_{U - U} \| \times \| R \|)$
	Rule network	$O (\| P \|^{2})$
Policy Refinement	FN refinement	$O (\| P \| \times \| F N \|)$
	FP refinement	$O (\| P \| \times \| F P \|)$
	Rule network	$O (\| P^{'} \|^{2})$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Díaz-Rodríguez, H.; Díaz-Pérez, A. ABAC Policy Mining Using Complex Network Analysis Techniques. Appl. Sci. 2025, 15, 12571. https://doi.org/10.3390/app152312571

AMA Style

Díaz-Rodríguez H, Díaz-Pérez A. ABAC Policy Mining Using Complex Network Analysis Techniques. Applied Sciences. 2025; 15(23):12571. https://doi.org/10.3390/app152312571

Chicago/Turabian Style

Díaz-Rodríguez, Héctor, and Arturo Díaz-Pérez. 2025. "ABAC Policy Mining Using Complex Network Analysis Techniques" Applied Sciences 15, no. 23: 12571. https://doi.org/10.3390/app152312571

APA Style

Díaz-Rodríguez, H., & Díaz-Pérez, A. (2025). ABAC Policy Mining Using Complex Network Analysis Techniques. Applied Sciences, 15(23), 12571. https://doi.org/10.3390/app152312571

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Phase	HC	AMZ	IoT
1	$\| L^{+} \| = 2437$ , $\| U \| = 621, \| R \| = 211$	$\| L^{+} \| = 9504$ , $\| U \| = 5436, \| R \| = 150$	$\| L^{+} \| = 237,274$ , $\| U \| = 120, \| R \| = 990$
2	$G_{U - U} = (V_{U - U}, E_{U - U})$ where $\| V_{U - U} \| = 621$ , $\| E_{U - U} \| = 14,923$	$G_{U - U} = (V_{U - U}, E_{U - U})$ where $\| V_{U - U} \| = 5436$ , $\| E_{U - U} \| = 743,311$	$G_{U - U} = (V_{U - U}, E_{U - U})$ where $\| V_{U - U} \| = 120$ , $\| E_{U - U} \| = 2095$
3	Avg. $\| C_{G_{U - U}} \| = 42$ Avg. modularity: 0.475 Avg. S-type comms.: 4% Avg. M-type comms.: 28% Avg. F-type comms.: 68%	Avg. $\| C_{G_{U - U}} \| = 82$ Avg. modularity: 0.65 Avg. S-type comms.: 8% Avg. M-type comms.: 9% Avg. F-type comms.: 83%	Avg. $\| C_{G_{U - U}} \| = 49$ Avg. modularity: 0.28 Avg. S-type comms.: 8% Avg. M-type comms.: 18% Avg. F-type comms.: 74%

Article Menu

ABAC Policy Mining Using Complex Network Analysis Techniques

Abstract

1. Introduction

2. Fundamental Principles

2.1. ABAC Model

2.2. Challenges and Requirements

2.3. Policy Mining Algorithms

2.4. Evaluation Metrics

3. Policy Mining Proposed Approach

3.1. Data Preprocessing

3.2. Network Model

3.2.1. Access Request Bipartite Network

3.2.2. User Network

3.3. Community Detection

3.3.1. Community Detection

3.3.2. Community Classification

3.4. Policy Rule Extraction

3.4.1. Rule Inference

3.4.2. Rule Network Modeling

3.5. Policy Refinement

4. Access Decision

5. Results

5.1. Datasets

5.2. Experimental Setup

5.3. General Results

5.4. Detailed Results

5.4.1. HC

5.4.2. AMZ

5.4.3. IoT

5.5. Evaluation Under Data and Structural Variations

5.6. Computational Efficiency

6. Conclusions and Discussions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI