Access Control Role Evolution Mechanism for Open Computing Environment

Data resources in open computing environments (including big data, the internet of things, and cloud computing) are characterized by large scale, diverse sources, and strong dynamics. As a result, the user-permission relationship in such environments is very large and is adjusted dynamically over time, which makes effective permission management in the role-based access control (RBAC) model a challenging problem. In this paper, we design an evolution mechanism of access control roles for open computing environments. The mechanism uses the existing user-permission relationship in the current system to mine access control roles and generate the user-role and role-permission relationships. When the user-permission relationship changes, the roles are continually tuned and evolved to provide role support for access control in open computing environments. We propose a novel genetic-based role evolution algorithm that can effectively mine and optimize roles while preserving the core permissions of the system. In addition, a role relationship aggregation algorithm is proposed to cluster roles, which provides a supplementary reference for the security administrator when giving roles real semantic information. Experimental evaluations on real-world data sets show that the proposed mechanism is effective and reliable.


Introduction
Open computing environments (including big data [1], the internet of things [2], and cloud computing [3]) provide convenient services such as data sharing and effective computing, and they are widely used in production and daily life. By analyzing and using data resources in open computing environments, we can create enormous social and economic value [4]. Furthermore, the greater the amount of data and the wider its sources, the more value is generated. However, while bringing new development opportunities, open computing environments also face serious security challenges. Various types of security incidents occur frequently [5], such as the Facebook data leak, in which the personal data of more than 50 million users was accessed illegally. The unauthorized sharing of data resources therefore poses a major security threat to users' data, and realizing safe and controllable sharing of data resources is the basis of application and development in open computing environments. As one of the important technologies for protecting data security, access control [6] enables authorized users to access the corresponding resources in the system according to their permissions and prohibits unauthorized users from accessing the data. Among access control technologies, role-based access control (RBAC) has become a popular and effective method.
RBAC [7,8] uses the concept of a role to establish associations between users and permissions. By establishing user-role and role-permission relationships, RBAC can protect resources. It reduces the complexity of access control management because user permissions are managed by granting and revoking roles. Since the role is the core of access control, setting the correct roles in the system and giving them the appropriate permissions is the basis for implementing an RBAC system. The advantages of RBAC can only be realized when the role set suits the security needs of the organization. To solve this problem, role engineering technology [9][10][11] emerged, whose purpose is to obtain a set of roles that accurately describes the security and functional requirements of the system. However, data resources in open computing environments have features such as large volume, wide sources, and strong dynamics [12,13], so their application requirements and challenges in role engineering differ from those of traditional data resources [14,15].
Current role engineering mainly includes a top-down mode [16] and a bottom-up mode [17,18]. The top-down mode relies on the professional knowledge of security experts to obtain a set of roles and the corresponding role relationships by manual analysis. In a closed computing environment with limited data resources, manual role management is safe and feasible. However, in an open computing environment, managing roles for massive and dynamic data resources by expert knowledge alone is labor-intensive: the workload is huge and the process is error-prone. It easily causes excessive or insufficient authorization, which affects the security and availability of the system. At the same time, as data resources change dynamically, access control roles must change with them, so role evolution is necessary. Role management for data resources in an open computing environment therefore requires automation to improve the management efficiency of dynamic permissions.
Unlike the top-down mode, the bottom-up mode uses data mining technology to analyze the existing access control information (the user-permission relationship) in the system and generate the role set automatically. It reduces the dependence on manual work by security experts and is also known as role mining [17]. Essentially, role mining is a Boolean matrix decomposition problem: the user-permission matrix is decomposed into a user-role matrix and a role-permission matrix. Role mining includes precise role mining and approximate role mining. Because it must cover all user-permission relationships, traditional precise role mining produces an excessive number of roles, has low efficiency, and lacks the ability to adjust dynamically. Research [19] shows that about 40% of the roles in a role set can cover 90% of the user-permission relationships, so approximate role mining better meets the needs of open computing environments. However, current approximate role mining methods [20][21][22] risk losing core roles in the system, which causes the related business processes to fail and affects the system's availability. In addition, current role mining methods are mostly static and cannot meet the need for dynamic role evolution in an open computing environment. It is therefore difficult for them to balance the security and availability of the system.
Role mining is an NP-hard problem [18], so exhaustively traversing all solutions to find the optimal role set is not acceptable in terms of performance. An optimization method is needed that obtains an approximately optimal solution in finite time and gives roles the ability to evolve dynamically. To solve the above problems, this paper proposes an access control role evolution mechanism for open computing environments, which significantly reduces the cost of searching the solution space and allows roles to evolve as data resources change. The contributions of the paper include:
(1) We propose a role evolution method based on genetic algorithm, which includes role mining and role optimization. A multi-dimensional mixed evaluation index is used to guide the role mining process, so that the security and availability of the system are considered at the same time. According to the dynamic change of the current user-permission relationship, the role set is adjusted periodically to realize role evolution.
(2) We propose the concept of permission structure complexity (PSC) to evaluate the importance of permission and generate core permissions. A role optimization algorithm is designed to avoid the loss of core permissions in the role mining, which ensures the normal running of the business system.
(3) We design a role relationship aggregation algorithm to cluster the roles by analyzing the user and permission relationship of the roles. It can establish the semantic relationship among roles, guide the generation of roles in the real environment, and give semantic meaning to the role group.
The remainder of this paper is organized as follows. We review the related work in Section 2. Section 3 introduces preliminary knowledge and genetic algorithm. Section 4 proposes role evolution mechanism. The role evolution method based on genetic algorithm is elaborated in Section 5. The role relationship aggregation algorithm is elaborated in Section 6. Experimental evaluation of the proposed mechanism is discussed in Section 7. Finally, we summarize the paper and provide directions for future research.

Related Work
Kuhlmann et al. [17] first proposed the concept of role mining and used matrix decomposition to solve the problem. Lu et al. [23] also turned the role mining problem into the optimal decomposition problem of a Boolean matrix, and used integer linear programming (ILP) to achieve role mining. Sarana et al. [24] introduced separation of duty into role mining, using minimum biclique cover (MBC) to achieve role mining. Zhang et al. [25] designed a role mining model based on concept lattice, and found the minimum role set through a greedy algorithm of role substitution. Zhou et al. [26] proposed a semantic role mining algorithm based on formal concept analysis. It generates the user's concept lattice of permission and concept lattice of attribute from access control information, and then assigns roles based on similarity analysis between the concept lattices. Dong et al. [27] used bipartite networks to find roles, and proposed a method to evaluate the importance of edges in bipartite networks. The method can eliminate inappropriate edges and improve the quality of the generated roles. Vavilis et al. [28] studied the minimal noise role mining problem and the multiple factor optimization role mining problem to mine roles from access control logs with noise information.
To improve the efficiency of role mining, Harikat et al. [29] proposed a concurrent role mining framework to find roles under the condition of role-usage and permission-distribution cardinality constraints. For TRBAC (temporal role-based access control), Mitra et al. [30] introduced the cumulative overhead of temporal roles and permissions, using the greedy algorithm to implement the role management. Stoller et al. [31] proposed an algorithm for mining high-quality TRBAC roles from timed ACLs (Access Control Lists). The algorithm described the relationship among roles by attribute information. Narouei et al. [32] proposed a novel top-down role engineering approach that used natural language processing techniques to extract roles from documents. Kumar et al. [33] proposed a constrained role mining scheme (CRM). The scheme satisfies a cardinality condition that no role can contain more than a given number of permissions. Literature [34] proposed a prioritization method, PairCount (PC), for the role mining problem. By calculating the frequency of permissions shared by users in different roles, the priority of different roles is set to optimize the process of role mining. Vaidya et al. [35] used the subset enumeration (CM) method to design the role mining algorithm. Common permissions could exist among different roles, which met the need for overlapping permissions between different roles. Zhang et al. [36] used graph optimization (GO) theory to optimize the process of role mining to reduce the management complexity of RBAC system. Literature [37] proposed a hierarchical miner (HM). The HM is based on formal concept lattices and user-attribute information. It can balance the semantic guarantee of roles with system complexity.
In view of the difficulty in synchronous optimization for role minimization and edge concentration, Dong et al. [38] proposed a data-centric quality evaluation algorithm (DCQE), which can predict the quality of roles based on the statistical characteristics of the ACL dataset. DCQE does not need to run any role mining algorithms. Since existing role mining methods do not consider the roles already present in the system, Zhai et al. [39] optimized the role mining process by calculating the similarity between the newly generated role set and the original role set. For the problem of role mining under incomplete knowledge, Kunz et al. [40] studied the quality criteria and feature dependence of role mining technology from 22 dimensions. Blundo et al. [41] constrained the role mining process by the number of contained permissions in the role and the number of contained in the user. Pan et al. [42] proposed a log-based role reconstruction method, which generated more targeted roles based on historical access behavior. Li et al. [43] proposed a role mining method based on fermat transformation theory and set theory for the problem of external behavior invariance and evolve-ability of legacy systems. Hachana et al. [44] studied comparison methods among different role sets, which effectively guide security administrators in selecting a role set. The method can detect misconfigured roles through the comparison.
Electronics 2020, 9, 517
Some studies [45,46] also used genetic algorithms to solve the role mining problem, but they only pursued a reduction in the number of roles and ignored other evaluation indicators. In addition, since the genetic algorithm is a heuristic algorithm, these methods belong to approximate role mining. There is a risk of losing core permissions in the system, which directly affects the normal running of the business system. As a result, it is difficult to apply those methods directly to a business environment or to adapt them to the actual deployment of role-based access control. In view of the shortcomings of the above methods, we use the genetic algorithm to solve the problem of role evolution and introduce the concept of permission structure complexity (PSC). While ensuring that core permissions are not lost, we optimize the performance of role mining from multiple dimensions, reduce the number of generated roles, and take into account the security and availability of the system. In addition, with the help of the role relationship aggregation algorithm, the relationship among different roles is established to provide effective support for role management.

Terms and Definitions
This section presents preliminaries on RBAC along with the terms used in the paper.
Definition 1. RBAC model: It is described as a five-tuple (U, R, P, URA, PRA), where U represents the user set, R represents the role set, P represents the permission set, URA represents the user-role relationship, and PRA represents the role-permission relationship.
Definition 2. User-permission relationship: It is described as an f × l Boolean matrix UPM, where f represents the number of users and l represents the number of permissions. If UPM(u_i, p_j) = 1, the user u_i is granted the permission p_j. If UPM(u_i, p_j) = 0, the user u_i is not granted the permission p_j. The UPM can be generated according to the access control policy of the system.
Definition 3. User-role relationship: It is described as an f × k Boolean matrix URM, where f is the number of users and k is the number of roles. If URM(u_i, r_j) = 1, the user u_i is granted the role r_j. If URM(u_i, r_j) = 0, the user u_i is not granted the role r_j.
Definition 4. Role-permission relationship: It is described as a k × l Boolean matrix RPM, where k is the number of roles and l is the number of permissions. If RPM(r_i, p_j) = 1, the role r_i is granted the permission p_j. If RPM(r_i, p_j) = 0, the role r_i is not granted the permission p_j.

Definition 5. Basic role mining problem: Given the user set U, the permission set P, and the user-permission relationship UPM, find a role set R, a user-role relationship URM, and a role-permission relationship RPM such that UPM = URM × RPM is satisfied and the number of roles k is minimized.
Definition 6. Approximate role mining problem: Given the user set U, the permission set P, and the user-permission relationship UPM, find a role set R, a user-role relationship URM, and a role-permission relationship RPM such that ||URM × RPM − UPM|| ≤ δ · ||UPM|| is satisfied (δ is an approximation coefficient) and the number of roles k is minimized.
Definition 7. Dynamic role reconstruction problem: Given the role set R_cur, the user-role relationship URM_cur, the role-permission relationship RPM_cur, and the new user-permission relationship UPM_new, find a new role set R_new, a new user-role relationship URM_new, and a new role-permission relationship RPM_new such that ||URM_new × RPM_new − UPM_new|| ≤ δ · ||UPM_new|| is satisfied and the number of roles k is minimized.
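As an illustration of Definitions 5 and 6, the following minimal sketch checks whether a candidate pair (URM, RPM) reproduces UPM exactly or within the tolerance δ. It assumes that the product URM × RPM is the Boolean matrix product and that ||·|| counts the non-zero cells of a matrix; the definitions above do not fix the norm, so this choice, the function names, and the toy matrices are illustrative only.

```python
import numpy as np

def boolean_product(urm, rpm):
    """Boolean product: user i holds permission j if at least one role links them."""
    return (np.asarray(urm) @ np.asarray(rpm) > 0).astype(int)

def mismatch(upm, urm, rpm):
    """Number of cells in which URM x RPM differs from UPM (assumed norm)."""
    return int(np.abs(boolean_product(urm, rpm) - np.asarray(upm)).sum())

def is_delta_consistent(upm, urm, rpm, delta):
    """Check ||URM x RPM - UPM|| <= delta * ||UPM|| under the assumed norm."""
    return mismatch(upm, urm, rpm) <= delta * int(np.asarray(upm).sum())

# Toy data: 3 users, 3 permissions, 2 roles.
upm = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 0, 1]])
urm = np.array([[1, 0],
                [1, 1],
                [0, 1]])
rpm = np.array([[1, 1, 0],
                [0, 0, 1]])

exact = mismatch(upm, urm, rpm) == 0          # exact decomposition with k = 2
# Dropping the second role leaves two uncovered assignments out of six.
approx_ok = is_delta_consistent(upm, urm[:, :1], rpm[:1], delta=0.4)
approx_bad = is_delta_consistent(upm, urm[:, :1], rpm[:1], delta=0.3)
```

With these toy matrices, the two-role model is an exact solution, while the one-role model is acceptable only when δ is large enough to tolerate the two missing assignments.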

Genetic Algorithm
The genetic algorithm is a heuristic computational model inspired by natural evolution and biogenetic mechanisms. The optimal solution of a problem is searched for by simulating the natural evolution of a biological population. The genetic algorithm includes many concepts borrowed from biology, such as individuals, populations, genes, fitness functions, selection, crossover, and mutation. An individual is an independent entity consisting of encoded genes, and the chromosome of each individual is a candidate solution. The fitness function evaluates the performance of a solution and thereby guides the direction of genetic evolution. A genotype is the internal representation of a chromosome, and the value of a gene is known as an allele. The phenotype is the external representation of the individual's chromosome, which represents the candidate solution. Finding the correct phenotype is the key to solving a specific task and the basis for genetic optimization. In this paper, roles are represented by genes, and the role model is represented by the individual.
Genetic evolution is a process in which the population gradually adapts to its environment and its quality continuously improves. It involves selection operators, crossover operators, and mutation operators. The selection operator chooses individuals from the population to act as parents, which are the seeds of the next generation. In general, individuals with better fitness are more likely to be selected, which further enhances the ability of the population. However, individuals with poor fitness also have a chance of being selected, so their genes can be passed on to the next generation. This keeps the search from being too greedy and from falling into local optima. The crossover operator generates new offspring by crossing genes selected from the parents. The mutation operator creates new individuals based on existing individuals in the current population, thereby exploring new regions of the search space. Every new individual is called an offspring or a new solution. The fitness function is used to judge the quality of individuals in the population. The genetic algorithm flow is shown in Figure 1.
First, the population is initialized. A certain number of individuals are randomly generated, and the best of them is picked and placed in the initial population. This process is repeated until the number of individuals in the initial population reaches a predetermined scale. After that, the fitness of each individual is calculated; the fitness is specified according to how close the individual is to a solution of the problem. Then the next generation of the population is generated by the breeding process, which includes gene selection, crossover, and mutation. If the new generation satisfies the abort condition, the genetic algorithm stops. Otherwise, new generations are computed iteratively until the abort condition is met. Finally, the final result is output.
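The flow described above can be sketched as a short, generic loop. The example below is not the role evolution algorithm of this paper: it runs on plain bit strings, and the fitness-proportional selection weights, the elitism step (carrying the best individual over unchanged), and all parameter names such as candidates_per_slot and p_mut are assumptions made only to obtain a small, self-contained illustration.

```python
import random

def run_ga(fitness, genome_len, target, pop_size=20, max_generations=100,
           p_mut=0.05, candidates_per_slot=3, seed=0):
    """Generic GA loop on bit strings; fitness must return a non-negative number."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.randint(0, 1) for _ in range(genome_len)]

    # Initialization: draw a few random individuals, keep the best one,
    # and repeat until the population reaches the predetermined size.
    pop = [max((random_individual() for _ in range(candidates_per_slot)),
               key=fitness)
           for _ in range(pop_size)]

    history = []  # best fitness seen at the start of each generation
    for _ in range(max_generations):
        scores = [fitness(ind) for ind in pop]
        best = pop[scores.index(max(scores))]
        history.append(max(scores))
        if history[-1] >= target:          # abort condition
            break
        next_pop = [best[:]]               # elitism (an assumption of this sketch)
        while len(next_pop) < pop_size:
            # Selection: fitter parents are more likely, weaker ones still possible.
            a, b = rng.choices(pop, weights=[s + 1 for s in scores], k=2)
            cut = rng.randrange(1, genome_len)          # single point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

# Toy problem: maximize the number of ones in a 16-bit string.
best, history = run_ga(fitness=sum, genome_len=16, target=16)
```

Because the best individual is always carried over, the recorded best fitness never decreases from one generation to the next in this sketch.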

Role Evolution Mechanism
The structure of the role evolution mechanism is shown in Figure 2; it comprises static role mining and dynamic role reconstruction. The input of role mining is the user-permission relationship UPM. After role initialization, the initial role population R_POP_sta is obtained. The input of role reconstruction is the existing role model (U_cur, R_cur, P_cur, URA_cur, PRA_cur) in the system, and the current role population R_POP_dyn is obtained through role encoding. R_POP_sta or R_POP_dyn is input into the role evolution process, and the new role set RoleModel_post is obtained after the pre-processing algorithm, the role evolution method based on genetic algorithm, and the post-processing algorithm. We input RoleModel_post into the role application process, and implement access control on data resources after performance evaluation and role relationship assignment. The core work of this paper is the part of Figure 2 with the gray background.
Role mining and role reconstruction share the same core algorithm, namely the role evolution method based on genetic algorithm. However, the starting points of their evolution differ. For role mining, the starting point of evolution is a randomly initialized role set. For role reconstruction, the starting point of evolution is the existing role set in the system. Therefore, role evolution is applied not only to the initialization of the role model but to the full lifecycle of role management throughout the access control run phase. The role set is periodically evolved according to changes in the current system's permission status, so that the role set is continuously optimized. There are time series relationships among role sets, as shown in Figure 3.

Role Evolution Method Based on Genetic Algorithm
This section describes the important concepts and algorithms involved in the role evolution method, including core permission evaluation, encoding and decoding of role genes, calculation operators, and role optimization.

Core Permission Evaluation
The lack of core permissions prevents the business system from running properly. The goal of role evolution is to cover as many user-permission relationships as possible. During the role evolution process, the fewer users hold a permission, the more likely that permission is to be discarded. In general, the importance of permissions is inversely proportional to the coverage of permissions in access control systems: the more important the permission, the fewer users hold it, so that the permission is not misused. A permission owned by many users, on the one hand, has relatively low significance. On the other hand, it is hard to lose during role evolution: even if one user does not get the permission, many other users do, which ensures the availability of the system. Therefore, the goal of the core permission evaluation is to find those permissions that cover fewer users and are more easily discarded. We evaluate the permissions by permission structure complexity (PSC) to generate the core permissions. The calculation method of PSC is shown in Equation (1), where $m_{i,j}$ is the value in the i-th row and the j-th column of the f × l user-permission relationship UPM, $\alpha_1$ and $\alpha_2$ are weights, and thd is a complexity threshold. When the PSC of permission $p_j$ is less than thd, we consider $p_j$ to be a core permission.
In Equation (1), $\sum_{i=1}^{f} m_{i,j}$ is the number of users with permission $p_j$. The smaller this value, the fewer users have permission $p_j$; a permission that occurs rarely is considered more important.
$\sum_{i=1}^{f} \sum_{k=1}^{l} m_{i,j} \cdot m_{i,k}$ is the total number of permissions owned by the users who hold permission $p_j$. The smaller this value, the greater the share of $p_j$ among those users' permissions, and the more important $p_j$ is. Therefore, the lower the PSC of a permission, the more important we consider the permission to be.
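The two quantities described above can be computed directly from UPM. Equation (1) itself is not reproduced in this text, so the sketch below assumes that PSC is the weighted sum of the two terms with weights α1 and α2; this combination, the function names, and the toy matrix are assumptions for illustration and may differ from the authors' exact formula.

```python
import numpy as np

def psc(upm, alpha1=0.5, alpha2=0.5):
    """Assumed form: PSC(p_j) = alpha1 * sum_i m_ij + alpha2 * sum_i sum_k m_ij * m_ik."""
    upm = np.asarray(upm)
    users_per_perm = upm.sum(axis=0)         # sum_i m_ij for every permission j
    perms_per_user = upm.sum(axis=1)         # sum_k m_ik for every user i
    co_perm_total = upm.T @ perms_per_user   # sum_i m_ij * (sum_k m_ik)
    return alpha1 * users_per_perm + alpha2 * co_perm_total

def core_permissions(upm, thd, alpha1=0.5, alpha2=0.5):
    """Indices of permissions whose PSC is below the threshold thd."""
    return [j for j, value in enumerate(psc(upm, alpha1, alpha2)) if value < thd]

# Toy UPM: 4 users (rows), 3 permissions (columns).
upm = np.array([[1, 1, 0],
                [1, 1, 0],
                [1, 0, 0],
                [0, 0, 1]])

scores = psc(upm)                      # array([4., 3., 1.])
core = core_permissions(upm, thd=2)    # [2]
```

In this toy matrix the third permission is held by a single user who holds nothing else, so it receives the lowest score and is the only one flagged as core at thd = 2.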

Encoding and Decoding of Role Genes
Encoding is the mapping of an individual from phenotype to genotype, which transforms the external representation of an individual into a genetic feature. Decoding is the mapping of an individual from genotype to phenotype, which transforms the individual's genetic feature into an external representation. A two-dimensional array [UR, RP] is used to encode the role genes, where UR is the granted user-role relationship and RP is the granted role-permission relationship. The role gene is the basic unit of the role model and represents a role, as shown in Figure 4.
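One possible concrete reading of the [UR, RP] encoding is sketched below. It assumes that UR is stored as the set of user indices granted the role and RP as the set of permission indices granted to the role, and that a chromosome is a list of such genes; the set-based layout and the names encode and decode are illustrative assumptions, since Figure 4 is not available here.

```python
import numpy as np

def decode(chromosome, n_users, n_perms):
    """Genotype -> phenotype: build URM (users x roles) and RPM (roles x permissions)."""
    k = len(chromosome)
    urm = np.zeros((n_users, k), dtype=int)
    rpm = np.zeros((k, n_perms), dtype=int)
    for r, (ur, rp) in enumerate(chromosome):
        for u in ur:
            urm[u, r] = 1
        for p in rp:
            rpm[r, p] = 1
    return urm, rpm

def encode(urm, rpm):
    """Phenotype -> genotype: one (UR, RP) gene per role."""
    urm, rpm = np.asarray(urm), np.asarray(rpm)
    return [(set(np.flatnonzero(urm[:, r]).tolist()),
             set(np.flatnonzero(rpm[r]).tolist()))
            for r in range(urm.shape[1])]

# Two role genes: role 0 = users {0, 1} with permissions {0, 1};
# role 1 = users {1, 2} with permission {2}.
chromosome = [({0, 1}, {0, 1}), ({1, 2}, {2})]
urm, rpm = decode(chromosome, n_users=3, n_perms=3)
round_trip = encode(urm, rpm)
```

Under this layout, decoding followed by encoding returns the original chromosome, which is a convenient sanity check when implementing the operators.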

Selection, Crossover and Mutation of Role Genes
The operators in the role evolution method are the selection operator, the crossover operator, and the mutation operator.
(1) Selection operator: It selects well-adapted individuals from the population to produce the next generation. After several generations of evolution, the differences among individual chromosomes decrease, which can cause the population to lose its diversity. To address this problem, we use the Roulette Wheel Selection method to randomly select the individuals to be combined. The basic idea is that the probability of selecting each individual is proportional to its fitness. The calculation method is shown in Equation (2).

Selection, Ccrossover and Mutation of Role Genes
The operators include selection operator, crossover operator and mutation operator in the role evolution method.
(1) Selection operator: Select adaptive individuals from the population to produce the next generation. After several generations of evolution, the differences among individual chromosomes will reduce, that can make the population lose the diversity of individuals. In order to solve the problem, Electronics 2020, 9, 517 8 of 18 we use the Roulette Wheel Selection method to randomly select the individuals to be combined. The basic idea is that the selected probability of each individual is proportional to the size of its fitness. The calculation method is shown in Equation (2).
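$$P(k) = \frac{f(x_k)}{\sum_{i=1}^{n} f(x_i)} \qquad (2)$$
where $f(x_k)$ is the fitness of the k-th individual, n is the number of individuals in the population, and $P(k)$ is the probability that the k-th individual is selected by the selection operator.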
(2) Crossover operator: It includes single-point crossover and multiple-point crossover, as shown in Figure 5. Single-point crossover randomly selects a crossover point and then exchanges, between the two chromosomes, the genes located before and after that point, which generates the new offspring (see, for example, Figure 5a).
(3) Mutation operator: It includes chromosome-level mutation and gene-level mutation. Figure 6 shows chromosome-level mutation, which includes the addition and the deletion of role genes. Figure 7 shows gene-level mutation, which includes the addition, deletion and modification of both the user-role relationship and the role-permission relationship.
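The three operators can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: a chromosome is a list of role genes, each gene is a `(users, permissions)` pair of frozensets, the selection weights are non-negative, and the 50/50 choice between adding and deleting a gene is arbitrary. All function names are hypothetical. How the minimized fitness of Equation (7) is turned into a selection weight is not covered here.

```python
import random
from typing import List, Sequence, Tuple

Gene = Tuple[frozenset, frozenset]  # (users, permissions) of one role gene


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Roulette wheel: P(k) = f(x_k) / sum of all fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(population: Sequence, fitness: Sequence[float], rng: random.Random):
    """Draw one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def single_point_crossover(a: List[Gene], b: List[Gene], point: int):
    """Exchange the genes located after the crossover point."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate_chromosome(chrom: List[Gene], new_gene: Gene, rng: random.Random) -> List[Gene]:
    """Chromosome-level mutation: delete one role gene or add a new one."""
    chrom = list(chrom)  # work on a copy so the parent is left untouched
    if len(chrom) > 1 and rng.random() < 0.5:
        del chrom[rng.randrange(len(chrom))]
    else:
        chrom.append(new_gene)
    return chrom


def gene(users, perms) -> Gene:
    return frozenset(users), frozenset(perms)


parent1 = [gene({1}, {10}), gene({2}, {11}), gene({3}, {12})]
parent2 = [gene({4}, {13}), gene({5}, {14}), gene({6}, {15})]
rng = random.Random(0)

probs = selection_probabilities([1.0, 3.0, 4.0])
picked = roulette_select([parent1, parent2], [1.0, 3.0], rng)
child1, child2 = single_point_crossover(parent1, parent2, point=1)
mutated = mutate_chromosome(parent1, gene({7}, {16}), rng)
```

Gene-level mutation would act inside a single gene, for example by adding or removing one element of its user set or permission set.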

Evaluation Indicators and Fitness Calculation
The evaluation indicators include the number of obtained roles |R|, the number of user-role assignment relationships |UR|, the number of role-permission assignment relationships |RP|, the accuracy of role evolution Pe, the confidentiality indicator CI, and the availability indicator AI.
As shown in Equations (3) and (4), the accuracy of role evolution (Pe) evaluates the degree of consistency between the new user-permission relationship, obtained by mapping the evolved user-role-permission relationship, and the user-permission relationship before evolution.
$UP_{new}$ is the new user-permission relationship matrix, and $UP_{old}$ is the original user-permission relationship matrix. The larger Pe is, the better the algorithm performs.
The confidentiality indicator (CI), shown in Equation (5), is used to determine whether a permission leak has occurred. When a permission does not exist in the original user-permission relationship but exists in the new user-permission relationship, we consider that a permission leak has occurred.
$N_{EM(i,j)=1}$ is the number of leaked permissions. The smaller CI is, the better the algorithm performs.
As shown in Equation (6), the availability indicator (AI) is used to evaluate the availability of the evolutionary results. When a permission exists in the original user-permission relationship but does not exist in the new user-permission relationship, we consider the permission discarded, which affects the availability of the system. The smaller AI is, the better the algorithm performs.
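The following sketch shows one way the three indicators could be computed from the two user-permission matrices. It assumes an error matrix EM obtained by element-wise subtraction of the original matrix from the new one, so that EM(i,j) = 1 marks a leaked permission and EM(i,j) = -1 a discarded one. Dividing each count by the total number of matrix cells is an assumption made for illustration; the exact normalization of Equations (3) to (6) may differ.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def error_matrix(up_new: Matrix, up_old: Matrix) -> Matrix:
    """EM(i,j) = 1: leaked permission, -1: discarded permission, 0: unchanged."""
    return [[n - o for n, o in zip(row_new, row_old)]
            for row_new, row_old in zip(up_new, up_old)]


def indicators(up_new: Matrix, up_old: Matrix) -> Tuple[float, float, float]:
    cells = [v for row in error_matrix(up_new, up_old) for v in row]
    total = len(cells)             # assumed normalization: all matrix cells
    pe = cells.count(0) / total    # accuracy: share of unchanged entries
    ci = cells.count(1) / total    # confidentiality: share of leaked entries
    ai = cells.count(-1) / total   # availability: share of discarded entries
    return pe, ci, ai


up_old = [[1, 0, 1, 0],
          [0, 1, 1, 0]]
up_new = [[1, 1, 1, 0],   # user 0 gains permission 1 (a leak)
          [0, 1, 0, 0]]   # user 1 loses permission 2 (a discard)
pe, ci, ai = indicators(up_new, up_old)
print(pe, ci, ai)
```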
Different evaluation indicators evaluate the effects of role evolution from different dimensions. In fact, the role evolution problem is a multi-objective optimization problem. Based on the importance of the different evaluation indicators, we use the weight coefficient method to set a weight value $\omega_i$ for each sub-goal, and the linear weighted summation of the sub-goals is the fitness function. As shown in Equation (7), the multi-objective optimization problem is thus transformed into a single-objective optimization problem, which is a minimization problem.
By adjusting the parameters, the weights $\omega_i$ can be tuned according to the intention of the security administrator, so that the role evolution is more targeted and the effect of access control role management is improved.
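A weighted-sum fitness of this kind might look like the sketch below. It assumes that all six indicators have already been normalized to [0, 1] and that the accuracy Pe enters the minimized sum as the error term 1 - Pe. Both points are assumptions for illustration, since the exact form of Equation (7) is not reproduced here. The dictionary keys and function name are hypothetical.

```python
from typing import Dict, Optional

# Equal weights, matching the default setting where all indicators matter equally.
DEFAULT_WEIGHTS = {"R": 1.0, "UR": 1.0, "RP": 1.0, "Pe": 1.0, "CI": 1.0, "AI": 1.0}


def fitness(ind: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Linear weighted sum of normalized indicators; lower is better."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    terms = dict(ind)
    terms["Pe"] = 1.0 - ind["Pe"]  # assumption: accuracy is minimized as an error
    return sum(weights[name] * terms[name] for name in weights)


candidate = {"R": 0.25, "UR": 0.5, "RP": 0.5, "Pe": 0.75, "CI": 0.125, "AI": 0.125}
exact = {"R": 0.25, "UR": 0.5, "RP": 0.5, "Pe": 1.0, "CI": 0.0, "AI": 0.0}
score = fitness(candidate)
exact_score = fitness(exact)
print(score, exact_score)
```

Raising the weight of CI or AI steers the search toward fewer leaked or fewer discarded permissions at the cost of a larger role model.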

Role Optimization
There may be redundant roles in the evolved roles. Redundant roles are deleted through role consolidation which reduces the number of roles, as shown in Equation (8).
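One simple form of role consolidation is sketched below. It assumes that a role is redundant when it grants nothing (no users or no permissions) or when another role carries exactly the same permission set, in which case the two roles are merged and their users are united. This rule is an illustrative assumption and is not necessarily the condition stated in Equation (8).

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Role = Tuple[Set[int], Set[int]]  # (users, permissions)


def consolidate(roles: List[Role]) -> List[Role]:
    """Drop empty roles and merge roles that carry identical permission sets."""
    merged: Dict[FrozenSet[int], Set[int]] = {}
    for users, perms in roles:
        if not users or not perms:
            continue  # an empty role grants nothing and can be deleted
        merged.setdefault(frozenset(perms), set()).update(users)
    return [(users, set(perms)) for perms, users in merged.items()]


roles = [({1, 2}, {10, 11}), ({3}, {10, 11}), ({4}, set()), ({5}, {12})]
result = consolidate(roles)
print(result)
```

Merging roles in this way leaves the decoded user-permission relationship unchanged while reducing |R|.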

Description of Algorithm
The role evolution process includes three core algorithms: a preprocessing algorithm, a role evolution algorithm and a post-processing algorithm (as shown in Figure 2).
(1) The preprocessing algorithm is used to evaluate the importance of the permissions, generate the core permissions, and initialize the population of the role model. It consists of two core steps, namely the population initialization of the role model and the calculation of the core permissions. The population of the role model is initialized by randomly generating InitNum role genes, and the user and permission codes of each role gene are also assigned randomly. The core permissions are obtained by calculating the PSC value (shown in Equation (1)) of the permissions. The pseudo code of the algorithm is shown in Algorithm 1, as follows.
    ...
            TempRole.Users.append(Random(1, UPM.UserNum))
        for k = 1 to Random(1, UPM.PermNum) do
            TempRole.Perms.append(Random(1, UPM.PermNum))
        InitRoleModel.append(TempRole)
    for k = 1 to UPM.PermNum do
        index = PSC(k)
        if (index > threshold)
            CorePerSet.append(k)
(2) The role evolution algorithm uses the genetic algorithm to solve the problem. The genetic algorithm has strong global search ability, which helps to ensure the quality of the final role results. Moreover, the generated role set can evolve dynamically as the user-permission relationships in an open computing environment change, so the algorithm can implement both role mining and role reconstruction. The algorithm includes the selection, crossover, mutation and individual fitness calculation of the role gene (Section 5.3 and Section 5.4). When the fitness meets the condition or the upper limit of evolutionary generations is reached, the algorithm stops. The pseudo code for the algorithm is shown in Algorithm 2.
    ...
            if (random(0, 1) < P_c):
                I_g1, I_g2 = crossChr(I_1, I_2)
            else:
                I_g1, I_g2 = I_1, I_2
            if (random(0, 1) < P_m)
                I_g1, I_g2 = mutChr(I_g1, I_g2)
            pop_{t+1}.append(I_g1, I_g2)
        } while (len(pop_{t+1}) < n_pop)
        pop = pop_{t+1}, t = t + 1
    } while (F(pop.chr) < Fe and t < n_gen)
(3) The post-processing algorithm is used to implement role optimization: it removes redundant roles (as shown in Equation (8)) and determines whether the role set includes the core permissions. If a core permission is lost during the evolution process, the algorithm adds a core role to supplement the core permission. The core permissions are calculated by the preprocessing algorithm. The pseudo code for the algorithm is shown in Algorithm 3.
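The evolution loop of the role evolution algorithm can be sketched generically as follows. The sketch follows the recoverable structure of the pseudo code (crossover with probability P_c, mutation with probability P_m, refilling the next population up to n_pop, stopping on a fitness target or after n_gen generations). Three points are assumptions added for illustration: the fitness is non-negative and minimized, the selection weight is 1 / (1 + fitness), and the best individual is copied unchanged into the next generation. The toy bit-string problem only exercises the loop and is not a role mining task.

```python
import random
from typing import Callable, List, Tuple


def evolve(pop: List, fitness: Callable, cross: Callable, mutate: Callable,
           p_c: float, p_m: float, n_gen: int, target: float,
           rng: random.Random) -> Tuple[object, int]:
    """Select, cross with probability p_c, mutate with probability p_m."""
    n_pop = len(pop)
    t = 0
    while min(fitness(c) for c in pop) > target and t < n_gen:
        # Assumption: lower fitness is better, so weight = 1 / (1 + fitness).
        weights = [1.0 / (1.0 + fitness(c)) for c in pop]
        nxt = [min(pop, key=fitness)]  # assumption: keep the best individual
        while len(nxt) < n_pop:
            i1, i2 = rng.choices(pop, weights=weights, k=2)
            g1, g2 = cross(i1, i2, rng) if rng.random() < p_c else (i1, i2)
            if rng.random() < p_m:
                g1, g2 = mutate(g1, rng), mutate(g2, rng)
            nxt.extend([g1, g2])
        pop = nxt[:n_pop]
        t += 1
    return min(pop, key=fitness), t


# Toy problem: minimize the number of ones in a bit string.
def toy_fitness(c):
    return sum(c)


def toy_cross(a, b, rng):
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def toy_mutate(c, rng):
    i = rng.randrange(len(c))
    return c[:i] + [1 - c[i]] + c[i + 1:]


rng = random.Random(42)
start = [[rng.randint(0, 1) for _ in range(8)] for _ in range(20)]
best, generations = evolve(start, toy_fitness, toy_cross, toy_mutate,
                           p_c=0.6, p_m=0.35, n_gen=800, target=0.0, rng=rng)
```

In the role evolution setting, `fitness`, `cross` and `mutate` would be replaced by the weighted indicator sum and the role gene operators described in the preceding sections.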

Role Relationship Aggregation Algorithm
To apply the role information more effectively in the access control system, the mined role model must be correlated with the real semantics of the real environment. Based on the mean shift clustering algorithm, we cluster the roles and put similar roles in one category. Each cluster is a class of roles with similar users and permissions, which may be semantically correlated in the real working environment. Using the clustering results, security administrators can be guided to assign real semantics to a large number of roles based on the working environment. This lays the foundation for assigning semantic information to roles and for their subsequent applications. The pseudo code for the algorithm is shown in Algorithm 4. The mean shift algorithm is a center-based clustering algorithm that can handle the case where the number k of clusters is unknown, so k does not have to be set in advance. The core idea is to compute the mean offset M from a point A to the points within radius R around it, which gives the direction of the next shift of the point (A = M + A). When the point no longer moves, it forms a cluster with the surrounding points, and the distance between this cluster and the other clusters is calculated. If the distance is less than the threshold D, the clusters are merged into one cluster. Otherwise, a new cluster is formed. The process repeats until all the data points have been selected. The results establish the semantic relationship among roles, guide the generation of roles in the real environment, and help the security administrator to give semantic meaning to each role group.
As can be seen from Section 5.2, the encoding of a role gene is shown in Figure 8. The role gene can be transformed into a fixed-length character vector (a role vector) to achieve role clustering. For a role vector v in a given vector space, the basic form of mean shift is defined over $G_r$, a high-dimensional spherical space of radius r given by Equation (9), where k is the number of samples. Adding all the vectors formed by the points in $G_r$ and the center of the sphere yields the mean shift vector $M_r(v)$, as shown in Equation (10).
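The clustering procedure described above can be sketched in plain Python. The sketch assumes the common flat-kernel form of mean shift, in which the mean shift vector at v is the average of the offsets (v_i - v) over the samples inside the ball of radius r around v, and it assumes that role vectors are numeric. The function names, the two-dimensional example points, the radius and the merge threshold are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_shift_vector(v: Vector, samples: List[Vector], r: float) -> List[float]:
    """Average offset from v to the samples inside the ball of radius r."""
    inside = [s for s in samples if distance(s, v) <= r]
    if not inside:
        return [0.0] * len(v)
    return [sum(s[d] - v[d] for s in inside) / len(inside) for d in range(len(v))]


def shift_to_mode(v: Vector, samples: List[Vector], r: float,
                  tol: float = 1e-6, max_iter: int = 100) -> List[float]:
    """Move the point along the mean shift vector until it stops moving."""
    v = list(v)
    for _ in range(max_iter):
        m = mean_shift_vector(v, samples, r)
        v = [a + b for a, b in zip(v, m)]  # A = M + A
        if distance(m, [0.0] * len(m)) < tol:
            break
    return v


def cluster_roles(samples: List[Vector], r: float,
                  merge_dist: float) -> Tuple[List[int], List[List[float]]]:
    """Assign each sample to the cluster whose center lies near its mode."""
    centers: List[List[float]] = []
    labels: List[int] = []
    for s in samples:
        mode = shift_to_mode(s, samples, r)
        for idx, center in enumerate(centers):
            if distance(mode, center) < merge_dist:
                labels.append(idx)   # close to an existing cluster: merge
                break
        else:
            centers.append(mode)     # otherwise start a new cluster
            labels.append(len(centers) - 1)
    return labels, centers


role_vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = cluster_roles(role_vectors, r=3.0, merge_dist=1.0)
print(labels)
```

The number of clusters is not passed in: it follows from the radius and the merge threshold, which matches the property that k does not have to be set in advance.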

Datasets and Experimental Settings
To verify the effectiveness of our method, we perform validation experiments on six real access control data sets: Healthcare, Domino, Firewall1, Firewall2, America small, and America large. Each data set consists of users, permissions, and user-permission relationships, which is consistent with the access control structure that we set up in the application environment. The statistics about users and permissions in these access control data sets are shown in Table 1 below, where |U| is the number of users, |P| is the number of permissions, and |UPM| is the number of user-permission assignment relationships. These data sets are widely used in the literature [38,41,47,48] to analyze the performance of various role mining algorithms. Comparing the performance of the algorithms on different data sets avoids the unstable evaluation results that a single data set can cause and tests the robustness of the method. The hardware and software environment of the experiment is as follows: the operating system is Win10 64-bit, the CPU is Intel(R) Core(TM) i7-8750H@2.21 GHz, the GPU is GeForce GTX 1050 Ti Max-Q, the memory size is 16 GB, and the Python version is 3.6.

Performance Evaluation of Role Evolution
We ran 800 generations of evolutionary training on the four policy sets Healthcare, Domino, Firewall1, and Firewall2. The parameters ω1-ω6 in the fitness function (Equation (7)) are set to [1,1,1,1,1,1]; that is, we assume by default that the six evaluation indicators are equally important to the system. The population size n_pop is set to 100, the number of evolution generations n_gen to 800, the probability of gene crossover P_c to 0.6, and the probability of gene mutation P_m to 0.35.
As shown in Figure 9a, the fitness of the role evolution algorithm decreases continuously, which verifies the effectiveness of the proposed method: the heuristic algorithm converges effectively during the evolution process. We also measure the average time cost per generation on the different data sets. As shown in Figure 9b, the average time cost per generation on the four data sets (Healthcare, Domino, Firewall1, and Firewall2) is 0.3918 s, 1.7925 s, 6.0236 s, and 4.7048 s, respectively. As the number of users and permissions increases, so does the time cost.
To evaluate the role evolution performance from multiple dimensions, our experiments evaluate the role evolution performance of different policy sets in different generations (as shown in Figure 10a-f). The evaluation criteria include six dimensions of final results: the role evolution accuracy Pe, the confidentiality indicator CI, the availability indicator AI, the number of roles |R|, the number of user-role assignment relationships |UR|, the number of role-permission assignment relationships |RP|.
To achieve better experimental convergence efficiency, we standardized these six indicators during the evolution process and mapped them to the [0, 1] intervals to track the changes of different indicators.
As shown in Figure 10a, the accuracy of role evolution Pe for the four policy sets exceeds 90% within 800 generations, so the resulting role model covers most user-permission relationships. As shown in Figure 10b, the final confidentiality indicators in all four data sets remain at low levels. Further, the gradual decrease of the confidentiality indicator shows that the permission leakage of the role model is significantly reduced during the evolution process, and the system security is significantly increased. As shown in Figure 10c, the final availability indicators in the four data sets also remain at a low level, which makes the role model of the system more available. However, the availability indicators first rise and then fall during role evolution. In the early stage of the role evolution process, because of the significant decrease of the confidentiality indicators, the roles of the system are granted too few permissions, which increases the permission loss in the role model (that is, the availability of the system decreases). After 80 generations of evolution, however, the system is able to significantly optimize the availability indicators so that the loss of permissions is reduced (that is, the availability of the system increases). At the same time, the numbers of roles, user-role assignment relationships, and role-permission assignment relationships also drop significantly during the process (as shown in Figure 10d-f).
In addition, we compare the performance of our method with five general algorithms: CRM [34], PC [35], CM [36], GO [37], and HM [38]. The experimental results are shown in Table 2. Compared with the other general methods, our method compresses the role scale more effectively, on the premise of allowing certain evolution errors. It can effectively reduce the role management burden of the security administrator and improve the permission management efficiency of the access control system. CRM and GO also perform well on some data sets. This is because CRM adds constraints that limit the number of permissions a role can contain, and GO uses graph optimization theory to avoid the backtracking search over permissions used in other methods. However, these two methods consider only a single constraint index and do not fully consider the security and availability of the system, so there is a risk that the system will not work properly. In addition, when the users, permissions, and user-permission relationships in the system change, these methods must restart role mining from the beginning, which adds extra overhead and does not make effective use of the old role mining results.
Table 2. Number of roles obtained by each method.
Data set     CRM   PC    CM    GO   HM   Ours
Healthcare   14    24    31    16   17   13
Domino       20    64    62    20   27   20
Firewall1    66    248   278   71   91   61
Firewall2    10    14    21    10   10   9

Performance Evaluation of Role Relationship Aggregation
We use two role sets, RoleSet1 and RoleSet2, which are generated from the data sets America small and America large (the role size is around 3000). The two large-scale role sets are clustered with the role relationship aggregation algorithm, and the clustering results are mapped to a 2-dimensional space by principal component analysis (PCA) [49,50]. The experimental results are shown in Figure 11a,b. The roles in RoleSet1 are clustered into five categories, and the roles in RoleSet2 are clustered into fifteen categories. The visualization lets us understand the relationships between the roles directly. Since the aggregation algorithm does not require the number of clusters to be set, the role clustering categories are generated automatically. Further, roles with similar user and permission relationships are effectively clustered. This result helps roles obtain real semantic information and improves the efficiency with which security administrators implement access control management.

Conclusions
To meet the dynamic evolution needs of access control roles in open computing environments, this paper proposes a role evolution mechanism based on the genetic algorithm, which can automatically implement role mining and reconstruction. Furthermore, the mechanism can provide role support for intelligent and automated access control. The role evolution is optimized from multiple dimensions while taking into account the security and availability of resources. By introducing the evaluation of core permissions, the mechanism effectively avoids the loss of core permissions in the process of role evolution. Moreover, the role relationship aggregation algorithm is used to cluster the roles, which is an auxiliary means for giving real semantics to the roles. As a result, it improves the efficiency with which the security administrator implements access control management. In the future, we will optimize the performance and efficiency of role evolution to further improve accuracy and reduce the error.