Privacy Preserving Data Publishing for Multiple Sensitive Attributes Based on Security Level

Abstract: Privacy preserving data publishing has received considerable attention as a way to publish useful information while preserving data privacy. The existing privacy preserving data publishing methods for multiple sensitive attributes do not consider that different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we defined three security levels for sensitive attribute values that have different sensitivity requirements, and gave an $L_{sl}$-diversity model for multiple sensitive attributes. Following this, we proposed three specific greedy algorithms based on the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms and the maximal security-level first (MSLF) greedy policy, named MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the $L_{sl}$-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata at the cost of only a small increase in runtime, and that their information loss tends to be stable as the data volume increases. They also solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly as the number of sensitive attributes increases.


Introduction
In recent years, different organizations such as governments, hospitals and other institutions have published more and more microdata. Microdata plays a key role in data analysis, data mining and scientific research. However, publishing microdata unavoidably exposes the privacy of the individual. To protect the privacy of the individual, Sweeney et al. proposed a k-anonymity model [1,2]. The model requires that the microdata is partitioned into a set of equivalence classes, each equivalence class contains at least k records, and all records within an equivalence class are assigned the same generalized value over each of their quasi-identifier attributes. Thus, each record in the k-anonymity model cannot be identified successfully with a probability greater than 1/k. The l-diversity model in [3] extends the k-anonymity model. It requires that each equivalence class has at least l different "well-represented" values for a sensitive attribute, so it also implies l-anonymity. To address the limitations of the k-anonymity and l-diversity models, Li et al. [4] introduced the concept of t-closeness, which requires that the distribution of the sensitive attribute values within each equivalence class of indistinguishable records is similar to that of the sensitive attribute values in the entire microdata. Then, various enhanced anonymity methods were proposed, such as (α, k)-anonymity [5], p-sensitive k-anonymity [6], anatomy [7], slicing [8], anatomy and generalization (ANGEL) [9], and permutation anonymization [10].
The above-mentioned works focus on anonymizing microdata with only one sensitive attribute. They cannot be directly applied to microdata with multiple sensitive attributes. Therefore, some extended k-anonymity, l-diversity, p-sensitive and t-closeness methods for multiple sensitive attributes were proposed. In addition, some extended anatomy methods combining multi-sensitive bucketization (MSB), clustering, generalization or permutation for multiple sensitive attributes [33][34][35][36] were proposed. To apply the slicing technique to microdata with multiple sensitive attributes, some enhanced slicing techniques for multiple sensitive attributes were proposed [37][38][39][40][41][42][43]. Additionally, decomposition and decomposition plus were introduced to achieve l-diversity for multiple sensitive attributes [44,45]. The above methods for multiple sensitive attributes do not consider the sensitivity requirements of the various sensitive attributes. Different sensitive attributes may have different sensitivity requirements, so rating techniques for multiple sensitive attributes were introduced [46,47]. These rating techniques not only protect privacy for multiple sensitive attributes, but also preserve a large amount of the correlations in the microdata. In the real world, however, different values of a single sensitive attribute may also have different sensitivity requirements. It is not appropriate to apply the same sensitivity requirement to all values of a sensitive attribute. Hence, the above rating techniques for multiple sensitive attributes are not suitable for this situation.
To solve this problem, we defined three security levels for sensitive attribute values that have different sensitivity requirements, and gave an $L_{sl}$-diversity model for multiple sensitive attributes. Then, we proposed three specific greedy algorithms based on the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms [33] and the maximal security-level first (MSLF) greedy policy, named MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the $L_{sl}$-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata at the cost of only a small increase in runtime, and that their information loss tends to be stable as the data volume increases. Moreover, they solve the problem that the information loss of the MBF, MSDCF and MMDCF algorithms increases greatly as the number of sensitive attributes increases.
The remainder of this article is organized as follows. Section 2 provides an overview of the existing privacy preserving data publishing methods for multiple sensitive attributes. In Section 3, we provide some notations and definitions. Section 4 describes the three specific greedy algorithms in detail. Section 5 presents the experimental results and analysis, and Section 6 concludes the paper.

Related Works
A large variety of privacy preserving data publishing methods have been proposed for multiple sensitive attributes. In terms of the extended k-anonymity, l-diversity, p-sensitive and t-closeness methods for multiple sensitive attributes, Nidhi et al. [11] proposed a new k-anonymity model for multiple sensitive attributes, which realizes record suppression with minimum data distortion. Usha et al. [12] extended the k-anonymity model for multiple sensitive attributes, and provided several algorithms for implementing the extended k-anonymity model. Liu et al. [13] proposed a new k-anonymity algorithm for multiple sensitive attributes, which uses the distribution of sensitive attribute values as a parameter to prevent association disclosure. Wang et al. [14] proposed a novel privacy preserving model for multiple sensitive attributes based on k-anonymity, called (α, β, k)-anonymity. They set a hierarchy sensitive attribute rule to achieve (α, β, k)-anonymity and developed a corresponding algorithm to anonymize the microdata by using generalization and hierarchy. Wang et al. [15] clustered multiple sensitive attributes based on a utility matrix, and then used a greedy strategy to partition records into equivalence classes. This method can guarantee that the size of each equivalence class is k except the last one, and can also guarantee the diversity of each sensitive attribute value within an equivalence class.
Information 2020, 11, 166
Ahmed et al. [16] proposed a probabilistic model of multiple sensitive attribute diversity to prevent the identification or non-membership attacks that arise when microdata with multiple sensitive attributes is published. In [17][18][19], an (α, l) model was applied to satisfy the diversity requirements for multiple sensitive attributes. Zhang et al. [17] used anatomization with generalization and suppression based on the (α, l) model. Guo et al. [18] proposed a personalized privacy preserving model for multiple sensitive attributes based on MSB, called the personalized (α, l)-anonymity model. Li et al. [19] considered the associations between multiple sensitive attributes to prevent all chances of positive and negative disclosure, and used a two-step greedy generalization algorithm to manage multiple sensitive attributes. Zhu et al. [20] proposed an additive noise approach that publishes some anonymized tables after fulfilling the requirement of l-diversity. This approach replaces the multiple sensitive attribute values of each record by a value set and at least l-1 randomly selected noise values. Huang et al. [21] proposed a (v, l)-anonymity model which checks the differences of sensitive attribute values by incorporating the classification of sensitive attribute values, and used $(l_1, l_2)$-diversity to validate the model. Jin et al. [22] proposed an l-coverage cluster grouping model, based on a clustering algorithm, which can handle multiple sensitive attributes.
Gal et al. [23] proposed a new model that extends k-anonymity and l-diversity to handle multiple sensitive attributes, and proposed a practical algorithm to implement this model. The algorithm used for this model contains two steps. In the first step, the microdata is divided into partitions, so that every partition contains at least k records and satisfies l-diversity. In the subsequent step, the microdata is anatomized. Wahyu et al. [24] proposed a distribution model to set sensitive attribute values when p-sensitive is applied to multiple sensitive attributes, minimizing their probability of disclosure. Wu et al. [25] proposed a p-cover k-anonymity model for protecting multiple sensitive attributes, and extended the incognito algorithm [26] to implement this model. Lin et al. [27] proposed a novel (k, p)-anonymity framework to solve the disclosure problem of sensitive attributes in the k-anonymity and l-diversity models. Anjum et al. [28] proposed an efficient approach for the anonymization of multiple sensitive attributes, called (p, k)-Angelization. The (p, k)-Angelization approach not only protects the privacy of the individual, but also improves the utility of the released information. Kanwala et al. [29] proposed a privacy-preserving model for 1:M records (i.e., an individual can have multiple records) dataset with multiple sensitive attributes, called (p, l)-Angelization.
Wang et al. [30] proposed two privacy-preserving algorithms for multiple sensitive attributes to satisfy the t-closeness model. The two algorithms use different methods to partition records into groups in terms of sensitive attributes. One uses a clustering method, while the other leverages principal component analysis. Sowmyarani et al. [31] proposed a (p+)-sensitive, t-closeness model for multiple sensitive attributes. It combines the advantages of the t-closeness and the p-sensitive k-anonymity approaches to reduce the possibility of similarity and skewness attacks on the anonymization techniques. Saraswathi et al. [32] proposed an enhanced t-closeness algorithm for multiple sensitive attributes. In the algorithm, t-closeness is applied over the MSB k-anonymity clustering attribute hierarchy (MSB-KACA) algorithm, and the earth mover distance (EMD) is used to avoid probabilistic inference attacks due to bucketization.
In terms of the extended anatomy methods combining MSB, clustering, generalization or permutation for multiple sensitive attributes, Yang et al. [33] proposed an MSB approach. The main idea of the MSB approach is to partition the given table into a quasi-identifier attribute table and a sensitive attribute table, and to ensure that each sensitive attribute satisfies the l-diversity constraints. Lin et al. [34] proposed a technique to handle multiple numerical sensitive attributes and to eliminate the threat of proximity breach for multiple sensitive attributes. They applied clustering and MSB techniques to release microdata with multiple numerical sensitive attributes. Luo et al. [35] proposed an improved framework for multiple sensitive attributes, named anatomy and generalization on multiple sensitive attributes (ANGELMS). This approach vertically partitions the attributes into one quasi-identifier attribute table and several sensitive attribute tables. Each sensitive attribute table divides the records of the microdata into groups (i.e., buckets). Each bucket obeys the l-diversity requirement. In the quasi-identifier attribute table, each group generalizes the quasi-identifier attribute values by following the k-anonymity principle. Ye et al. [36] proposed an anonymization method combining anatomy and permutation for protecting the privacy of microdata with multiple sensitive attributes. This method includes two major steps: anatomizing the microdata and permuting the quasi-identifier attributes. To realize the anonymization method, they further proposed two algorithms, namely the naive multi-sensitive bucketization permutation algorithm (NMBPA) and the closest distance multi-sensitive bucketization permutation algorithm (CDMBPA).
In terms of the extended slicing methods for multiple sensitive attributes, Dhumal et al. [37] applied the slicing technique without permuting the values of multiple sensitive attributes and did not consider the quasi-identifier attributes while proposing this technique. Kiruthika et al. [38] proposed some enhanced slicing techniques such as Mondrian slicing and suppression slicing. Mondrian slicing randomly switches all the buckets, whereas suppression slicing permutes the quasi-identifier attribute values of the records. Suppression slicing maintains the utility of the microdata by guaranteeing the l-diversity principle in each quasi-identifier attribute group. Luo et al. [39] extended the slicing technique from a single sensitive attribute to multiple sensitive attributes, which is called slicing on multiple sensitive (SLOMS). Further, they proposed an MSB-KACA algorithm to anonymize microdata with multiple sensitive attributes by SLOMS. In [40], a dynamic data publishing technique for multiple sensitive attributes was proposed, named KC-Slice. The proposed technique integrates the features of the LKC-privacy and slicing techniques. Raju et al. [41] proposed a novel dynamic KCi-Slice publishing prototype for retaining the privacy and utility of multiple sensitive attributes, which is an improvement of KC-Slice. Reddy et al. [42] proposed a privacy preserving data publishing model that manages personalization for publishing microdata with multiple sensitive attributes. The model uses the slicing technique supported by deterministic anonymization for quasi-identifier attributes, i.e., generalization for categorical sensitive attributes and a fuzzy approach for numerical sensitive attributes based on diversity. Susan et al. [43] combined the anatomy and slicing techniques for multiple sensitive attributes, called anatomization with slicing for multiple sensitive attributes (SLAMSA). They used anatomization to reduce information loss and enhanced the slicing technique to improve attribute correlation.
In terms of the decomposition methods for multiple sensitive attributes, Ye et al. [44] proposed a decomposition technique to achieve l-diversity for multiple sensitive attributes. The decomposition technique vertically partitions the original table into two tables, i.e., a sensitive table and a non-sensitive table. However, adding noise in the decomposition technique causes distortion. Hence, Das et al. [45] extended the decomposition technique by optimizing the noise value selection (i.e., choosing noise values closer to the original values), called decomposition plus.
The above methods for multiple sensitive attributes do not consider the sensitivity requirements of sensitive attributes. Because different sensitive attributes may have different sensitivity requirements, Liu et al. [46] introduced a rating technique for multiple sensitive attributes, which is based on different sensitivity coefficients for different attributes. This approach not only protects privacy for multiple sensitive attributes, but also preserves a large amount of the correlations in the microdata. However, the rating technique can be attacked by applying association rules, owing to the relationships between sensitive attribute values. Yi et al. [47] removed the weaknesses of the rating technique and eliminated the threat of association attack.

Notations and Definitions
In the real world, different values of a sensitive attribute may have different sensitivity requirements. Some values of the sensitive attribute have no sensitivity requirement, i.e., these sensitive attribute values do not need to be protected because their leakage is not harmful to the individual. Some values of the sensitive attribute have a low sensitivity requirement, i.e., these sensitive attribute values need to be protected to some extent because their leakage causes certain harm to the individual. Furthermore, some values of the sensitive attribute have a high sensitivity requirement, i.e., these sensitive attribute values need to be well protected because their leakage causes serious harm to the individual. Accordingly, three sensitive attribute security levels are defined as follows.
Definition 1 (sensitive attribute security Level 0). Sensitive attribute security Level 0 is the security level of a sensitive attribute value with no sensitivity requirement, i.e., a sensitive attribute value with sensitive attribute security Level 0 has no sensitivity requirement.
Definition 2 (sensitive attribute security Level 1). Sensitive attribute security Level 1 is the security level of a sensitive attribute value with a low sensitivity requirement, i.e., a sensitive attribute value with sensitive attribute security Level 1 has a low sensitivity requirement.
Definition 3 (sensitive attribute security Level 2). Sensitive attribute security Level 2 is the security level of a sensitive attribute value with a high sensitivity requirement, i.e., a sensitive attribute value with sensitive attribute security Level 2 has a high sensitivity requirement.
Let $T = \{A_1, A_2, \ldots, A_p, S_1, S_2, \ldots, S_d\}$ be the microdata, where $A_i$ denotes the ith quasi-identifier attribute ($1 \le i \le p$), $S_j$ denotes the jth sensitive attribute ($1 \le j \le d$), p denotes the number of quasi-identifier attributes, d denotes the number of sensitive attributes, n denotes the number of records of T (i.e., $n = |T|$), $t_k$ denotes the kth record of T ($1 \le k \le n$), and $t_k[X]$ denotes the value of the attribute X of the kth record.
An example of the microdata is shown in Table 1. In Table 1, social security number (SSN) and name are two identifier attributes. Age, sex, race and zipcode are four quasi-identifier attributes. Further, physician and disease are two sensitive attributes, and $t_1, t_2, \ldots, t_9$ are the nine records of the microdata. The values of the sensitive attribute physician are John, Bob, Anne, Sam and Mary. The values of the sensitive attribute disease are flu, pneumonia, gastritis, human immunodeficiency virus (HIV) and cancer. For physician, the security levels of all sensitive attribute values can be set to sensitive attribute security Level 1 because these values have the same, low sensitivity requirement. For disease, the security level of flu can be set to sensitive attribute security Level 0 because this value has no sensitivity requirement. The security levels of pneumonia and gastritis can be set to sensitive attribute security Level 1 because the two values have the same, low sensitivity requirement. Further, the security levels of HIV and cancer can be set to sensitive attribute security Level 2 because the two values have the same, high sensitivity requirement.
Definition 4 (composite sensitive attribute) [33]. A composite sensitive attribute is the whole of all sensitive attributes of T, denoted by $S = \{S_1, S_2, \ldots, S_d\}$, where the ith sensitive attribute $S_i$ ($1 \le i \le d$) is the ith dimension of the composite sensitive attribute. $D(S_i)$ is the value field of $S_i$, and $|S_i|$ represents the number of values in $D(S_i)$.
Definition 5 (composite sensitive attribute vector) [33]. A composite sensitive attribute vector is a vector form of all sensitive attribute values of the kth record $t_k$ in T, i.e., $\langle t_k[S_1], t_k[S_2], \ldots, t_k[S_d] \rangle$.
Definition 6 (group) [33]. A group is a subset of the records of T. Each record of T belongs to only one group. All groups of T are denoted as $GT = \{G_1, G_2, \ldots, G_m\}$, where m denotes the number of all groups of T.
For Table 1, the composite sensitive attribute of the microdata is {Physician, Disease}, and a composite sensitive attribute vector can be <John, Flu>. $G_1 = \{t_1, t_2, t_3\}$, $G_2 = \{t_4, t_5, t_6\}$ and $G_3 = \{t_7, t_8, t_9\}$ can be three groups of the microdata, in which case $GT = \{G_1, G_2, G_3\}$.
Definition 7 (l-diversity for single sensitive attribute) [33]. For a group G with a single sensitive attribute, if v is the sensitive attribute value with the maximum frequency and $c(v)/|G| \le 1/l$, where $c(v)$ denotes the frequency of v and $|G|$ denotes the number of records of G, then G satisfies l-diversity for single sensitive attribute.
Definition 8 (l-diversity for multiple sensitive attributes) [33]. For a group G with multiple sensitive attributes, if each sensitive attribute of the composite sensitive attribute in G satisfies l-diversity for single sensitive attribute, then G satisfies l-diversity for multiple sensitive attributes.
Definition 9 (l-diversity group for multiple sensitive attributes) [33]. An l-diversity group for multiple sensitive attributes is a group of T and the group satisfies l-diversity for multiple sensitive attributes. All l-diversity groups for multiple sensitive attributes are denoted as GTc = {G 1 , G 2 , . . . , G mc }, where mc denotes the number of all l-diversity groups for multiple sensitive attributes.
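Definitions 7 and 8 translate directly into a frequency check. The following is a minimal Python sketch, not code from [33]: the function names and the sample group are assumptions made for illustration, and an empty group is treated as not satisfying the definition.

```python
from collections import Counter


def satisfies_l_diversity_single(values, l):
    """Definition 7: the most frequent value v must satisfy c(v)/|G| <= 1/l."""
    if not values:
        return False  # assumption: an empty group is not l-diverse
    max_count = max(Counter(values).values())
    # Integer form of c(v)/|G| <= 1/l, which avoids floating-point rounding.
    return max_count * l <= len(values)


def satisfies_l_diversity_multi(group, l):
    """Definition 8: every dimension of the composite attribute satisfies Definition 7."""
    if not group:
        return False
    dims = len(group[0])
    return all(
        satisfies_l_diversity_single([vec[j] for vec in group], l)
        for j in range(dims)
    )


# Hypothetical composite sensitive attribute vectors <physician, disease>;
# these are illustrative and are not rows of Table 1.
group = [("John", "flu"), ("Bob", "HIV"), ("Anne", "cancer")]
print(satisfies_l_diversity_multi(group, 3))                            # True
print(satisfies_l_diversity_multi(group + [("John", "gastritis")], 3))  # False
```

In the second call John appears twice among four records, so c(v)/|G| = 2/4 exceeds 1/3 and the group is no longer 3-diverse on the physician dimension.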
From Definitions 7, 8 and 9, all the sensitive attribute values of each group obey the same l-diversity requirement, i.e., the same sensitivity requirement is applied to them. This is not appropriate and causes extra information loss of the microdata. Because the maximal security level of the sensitive attribute values in the microdata shown in Table 1 is sensitive attribute security Level 2, l is set to 3 in this paper. Thus, only three diversity groups for multiple sensitive attributes can be formed. Sensitive attribute values with different sensitive attribute security levels should have different l-diversity requirements because they have different sensitivity requirements. Hence, $L_{sl}$-diversity for single sensitive attribute and $L_{sl}$-diversity for multiple sensitive attributes are defined as follows, where $L_{sl} \subseteq \{l_0, l_1, l_2\}$, $l_0$ is for sensitive attribute security Level 0, $l_1$ is for sensitive attribute security Level 1, $l_2$ is for sensitive attribute security Level 2, and $l_0$, $l_1$, $l_2$ are set to 1, 2 and 3 in this paper, respectively.
Definition 10 ($L_{sl}$-diversity for single sensitive attribute). For a group G with a single sensitive attribute, let $|G|$ denote the number of records in G and $c(v)$ denote the frequency of a sensitive attribute value v in G. If every sensitive attribute value $v_0$ of G with sensitive attribute security Level 0 satisfies $c(v_0)/|G| \le 1/l_0$, every sensitive attribute value $v_1$ of G with sensitive attribute security Level 1 satisfies $c(v_1)/|G| \le 1/l_1$, and every sensitive attribute value $v_2$ of G with sensitive attribute security Level 2 satisfies $c(v_2)/|G| \le 1/l_2$, then G satisfies $L_{sl}$-diversity for single sensitive attribute.
Definition 11 ($L_{sl}$-diversity for multiple sensitive attributes). For a group G with multiple sensitive attributes, if each sensitive attribute of the composite sensitive attribute in G satisfies $L_{sl}$-diversity for single sensitive attribute, then G satisfies $L_{sl}$-diversity for multiple sensitive attributes.
Definition 12 ($L_{sl}$-diversity group for multiple sensitive attributes). An $L_{sl}$-diversity group for multiple sensitive attributes is a group of T that satisfies $L_{sl}$-diversity for multiple sensitive attributes. All $L_{sl}$-diversity groups of T are denoted as $GTs = \{G_1, G_2, \ldots, G_{ms}\}$, where ms denotes the number of all $L_{sl}$-diversity groups of T.
For Table 1, as described above, the security levels of all values of the sensitive attribute physician are sensitive attribute security Level 1. Further, the security level of flu is sensitive attribute security Level 0, the security levels of pneumonia and gastritis are sensitive attribute security Level 1, and the security levels of HIV and cancer are sensitive attribute security Level 2. Any record of the microdata shown in Table 1 consists of one value of the sensitive attribute physician and one value of the sensitive attribute disease. As a result, $L_{sl}$ includes at least $l_1$, so $\{l_1\}$-diversity groups, $\{l_0, l_1\}$-diversity groups, $\{l_1, l_2\}$-diversity groups and $\{l_0, l_1, l_2\}$-diversity groups for multiple sensitive attributes can be formed.
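To make Definitions 10 and 11 concrete, the sketch below checks the per-level bound for each value in a group. It is an illustrative Python sketch only: the level assignments and the values 1, 2 and 3 for $l_0$, $l_1$, $l_2$ come from the text, while the function names and the two sample groups are assumptions and are not taken from Table 1.

```python
from collections import Counter

# l_0, l_1 and l_2 are set to 1, 2 and 3 in this paper.
L_BY_LEVEL = {0: 1, 1: 2, 2: 3}

# Security levels as assigned in the text for the two example attributes.
PHYSICIAN_LEVEL = {name: 1 for name in ("John", "Bob", "Anne", "Sam", "Mary")}
DISEASE_LEVEL = {"flu": 0, "pneumonia": 1, "gastritis": 1, "HIV": 2, "cancer": 2}
LEVELS = [PHYSICIAN_LEVEL, DISEASE_LEVEL]


def satisfies_lsl_single(values, level_of):
    """Definition 10: each value v with level s must satisfy c(v)/|G| <= 1/l_s."""
    size = len(values)
    return size > 0 and all(
        count * L_BY_LEVEL[level_of[v]] <= size
        for v, count in Counter(values).items()
    )


def satisfies_lsl_multi(group, levels_per_dim):
    """Definition 11: every dimension must satisfy Definition 10."""
    return bool(group) and all(
        satisfies_lsl_single([vec[j] for vec in group], level_of)
        for j, level_of in enumerate(levels_per_dim)
    )


# Hypothetical groups of <physician, disease> vectors.
g1 = [("John", "flu"), ("Bob", "pneumonia")]  # only Level 0 and Level 1 values
g2 = [("John", "HIV"), ("Bob", "flu")]        # contains a Level 2 value
print(satisfies_lsl_multi(g1, LEVELS))  # True
print(satisfies_lsl_multi(g2, LEVELS))  # False
```

The first group shows the intended saving: two records suffice when no Level 2 value is present, whereas plain 3-diversity would require at least three records. The second group fails because HIV requires c(v)/|G| ≤ 1/3, which a two-record group cannot meet.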
Definition 13 (multiple dimensional bucket) [33]. A multiple dimensional bucket is a bucket whose dimensions are the dimensions of the composite sensitive attribute. Therefore, the records of T can be mapped to the corresponding buckets according to the sensitive attribute values on each dimension of their composite sensitive attribute vectors. If the number of dimensions of the composite sensitive attribute in T is d, then d-dimensional buckets of T can be established, denoted as $Bucket(S_1, S_2, \ldots, S_d)$, where each d-dimensional bucket is denoted as $buk\langle s_1, s_2, \ldots, s_d \rangle$ with $s_j \in D(S_j)$ and $1 \le j \le d$, and the size of each d-dimensional bucket, i.e., the number of records in it, is denoted as $size(buk\langle s_1, s_2, \ldots, s_d \rangle)$. Further, the dimension capacity of a certain value $s_j^0 \in D(S_j)$ on the $S_j$ dimension of the d-dimensional buckets is the sum of the sizes of all buckets with the value $s_j^0$ on this dimension, denoted as $Capa(s_j^0)$.
According to Table 1, two-dimensional buckets of T can be established, as shown in Figure 1.
In Figure 1, the leftmost column lists the values of the sensitive attribute physician, and the top row lists the values of the sensitive attribute disease. The rightmost column gives the dimension capacities of the values of the sensitive attribute physician, and the bottom row gives the dimension capacities of the values of the sensitive attribute disease. Further, the five rows and five columns in the middle are the two-dimensional buckets of T. For example, when $s_1^0$ is Anne and $s_2^0$ is gastritis, $buk\langle s_1^0, s_2^0 \rangle$ is a certain two-dimensional bucket, i.e., $\{t_6\}$ in Figure 1, and $size(buk\langle s_1^0, s_2^0 \rangle) = 1$.
In [33], the MSB method includes two stages: a grouping phase and a residual processing phase. In the first stage, according to a greedy strategy, l buckets with different values on each dimension are selected, and one record is extracted from each bucket to form an l-diversity group for multiple sensitive attributes. This is repeated until no new l-diversity group for multiple sensitive attributes that meets the requirements can be formed. In the second stage, the records remaining in the multi-dimensional buckets after grouping are added to the existing l-diversity groups for multiple sensitive attributes as far as possible without destroying l-diversity for multiple sensitive attributes. Finally, records that do not belong to any l-diversity group for multiple sensitive attributes are suppressed from the published microdata. After the above steps, the quasi-identifier attributes of each l-diversity group for multiple sensitive attributes are published as a quasi-identifier attribute table, and the sensitive attributes of each l-diversity group for multiple sensitive attributes are published as a sensitive attribute table. Further, both the additional information loss and the suppression ratio are taken as the standards for measuring the quality of the published microdata. The definition of additional information loss is extended as follows.
Obviously, the smaller the suppression ratio is, the fewer records are lost. When the suppression ratio is the same, the smaller the additional information loss is, the less information is lost.
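The bucket structure of Definition 13 can be sketched as a dictionary keyed by composite sensitive attribute vectors. This is a minimal Python illustration under stated assumptions: the helper names `build_buckets` and `dimension_capacity` and the five sample records are hypothetical, and the records are not those of Table 1 or Figure 1.

```python
from collections import defaultdict


def build_buckets(records):
    """Map each record id to the bucket keyed by its composite sensitive attribute vector."""
    buckets = defaultdict(list)
    for rec_id, vector in records.items():
        buckets[tuple(vector)].append(rec_id)
    return dict(buckets)


def dimension_capacity(buckets, j, value):
    """Capa(value) on dimension j: total size of all buckets whose j-th coordinate is value."""
    return sum(len(ids) for key, ids in buckets.items() if key[j] == value)


# Hypothetical records with <physician, disease> vectors (not Table 1).
records = {
    "r1": ("John", "flu"),
    "r2": ("John", "HIV"),
    "r3": ("Anne", "gastritis"),
    "r4": ("Bob", "flu"),
    "r5": ("John", "flu"),
}
buckets = build_buckets(records)
print(len(buckets[("John", "flu")]))           # size of buk<John, flu>: 2
print(dimension_capacity(buckets, 0, "John"))  # Capa(John): 3
print(dimension_capacity(buckets, 1, "flu"))   # Capa(flu): 3
```

Only non-empty buckets are stored, which is a design choice of this sketch; an empty bucket simply has no key and contributes zero to every capacity.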

Our Proposed Algorithms
In [33], three specific greedy algorithms were proposed to implement the above MSB method, called MBF, MSDCF, and MMDCF. According to Definitions 10, 11 and 12, a record with a high sensitive attribute security level is harder to place in a group than a record with a low sensitive attribute security level, so records with a higher sensitive attribute security level should be prioritized when forming a group. In view of this idea, we propose three specific greedy algorithms based on the MBF, MSDCF and MMDCF algorithms and the MSLF greedy policy, named MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF, to implement the $L_{sl}$-diversity model for multiple sensitive attributes.

MBF-MSLF
The basic idea of the MBF-MSLF algorithm is to first select an unshielded non-empty d-dimensional bucket with the maximal sensitive attribute security level and the largest bucket size, extract a record from the bucket, add it to a group, and delete it from the bucket. For this record, all buckets on certain dimensions of the record are shielded if adding any record from these buckets to the group would destroy $L_{sl}$-diversity for multiple sensitive attributes of the group. By repeating the above process, an $L_{sl}$-diversity group is formed. Following this, the shielding of each d-dimensional bucket is removed, and the above grouping process is repeated until a complete group cannot be formed. Each remaining record is then added to a formed group if this does not destroy $L_{sl}$-diversity for multiple sensitive attributes of the group. Finally, the records that cannot be added to any formed group are suppressed in the published microdata. The specific steps of the MBF-MSLF algorithm are shown in Algorithm 1.
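The description above can be sketched in Python as follows. This is an illustrative reading of the prose, not a transcription of Algorithm 1, and it rests on several assumptions that the text does not state: a bucket's security level is taken to be the highest level among its values, a value is shielded after use whenever its level is 1 or 2, a group counts as complete as soon as it satisfies $L_{sl}$-diversity, and records are represented only by their composite sensitive attribute vectors. The function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

L_BY_LEVEL = {0: 1, 1: 2, 2: 3}  # l_0, l_1, l_2 as set in this paper
PHYSICIAN_LEVEL = {n: 1 for n in ("John", "Bob", "Anne", "Sam", "Mary")}
DISEASE_LEVEL = {"flu": 0, "pneumonia": 1, "gastritis": 1, "HIV": 2, "cancer": 2}
LEVELS = [PHYSICIAN_LEVEL, DISEASE_LEVEL]


def is_lsl_diverse(group, levels):
    """Definitions 10 and 11: c(v) * l_level(v) <= |G| for every value on every dimension."""
    n = len(group)
    return n > 0 and all(
        count * L_BY_LEVEL[levels[j][value]] <= n
        for j in range(len(levels))
        for value, count in Counter(vec[j] for vec in group).items()
    )


def mbf_mslf(vectors, levels):
    """Return (groups, suppressed) for a list of composite sensitive attribute vectors."""
    buckets = defaultdict(list)
    for vec in vectors:
        buckets[vec].append(vec)

    def bucket_level(key):
        # Assumption: a bucket's level is the highest level among its values.
        return max(levels[j][v] for j, v in enumerate(key))

    groups = []
    while True:
        group, shielded = [], set()  # shielded holds (dimension, value) pairs
        while not is_lsl_diverse(group, levels):
            candidates = [
                key for key, recs in buckets.items()
                if recs and not any((j, v) in shielded for j, v in enumerate(key))
            ]
            if not candidates:
                break
            # MSLF first, then MBF: highest security level, then largest bucket.
            key = max(candidates, key=lambda k: (bucket_level(k), len(buckets[k])))
            group.append(buckets[key].pop())
            # Assumption: values with level 1 or 2 must not repeat while grouping.
            shielded.update((j, v) for j, v in enumerate(key) if levels[j][v] >= 1)
        if not is_lsl_diverse(group, levels):
            for vec in group:  # incomplete group: put its records back and stop
                buckets[vec].append(vec)
            break
        groups.append(group)

    # Residual processing: add leftovers where diversity is kept, else suppress.
    suppressed = []
    for recs in buckets.values():
        for vec in recs:
            target = next((g for g in groups if is_lsl_diverse(g + [vec], levels)), None)
            if target is not None:
                target.append(vec)
            else:
                suppressed.append(vec)
    return groups, suppressed


# Hypothetical input vectors (not Table 1).
vectors = [("John", "HIV"), ("Bob", "flu"), ("Anne", "cancer"),
           ("Sam", "flu"), ("Mary", "pneumonia"), ("John", "flu")]
groups, suppressed = mbf_mslf(vectors, LEVELS)
print(len(groups), suppressed)  # 2 []
```

On this input the Level 2 records (HIV, cancer) are grouped first, a second two-record group is formed from Level 0 and Level 1 values only, and the leftover record joins the first group during residual processing, so nothing is suppressed.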

MSDCF-MSLF
The basic idea of the MSDCF-MSLF algorithm is to first select an unshielded non-empty d dimensional bucket with the maximal sensitive attribute security level and the largest bucket selectivity, and extract a record from the bucket to add to a group and delete the record from the bucket. The bucket selectivity in the MSDCF-MSLF algorithm is calculated as follows.
where size(buk < s 0 1 , s 0 2 , . . . , s 0 d >), Max 1≤ j≤d Capa(s 0 j ) and Select(buk < s 0 1 , s 0 2 , . . . , s 0 d >) are the bucket size, the maximal single-dimensional capacity and the bucket selectivity of a certain bucket buk < s 0 1 , s 0 2 , . . . , s 0 d >, respectively. For this record, all buckets of some certain dimensions of the record are shielded when adding any record of these buckets to the group will destroy L sl -diversity for multiple sensitive attributes of the group. By repeating the above process, the L sl -diversity group is formed. Following this, the shielding of each d dimensional bucket is removed, and the above grouping process is repeated until a complete group cannot be formed. For each remaining record, it is added to a formed group without destroying L sl -diversity for multiple sensitive attributes of the group. Finally, the records that cannot be added to any formed group will be suppressed in the published microdata. The specific steps of the MSDCF-MSLF algorithm are shown in Algorithm 2.
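To make the capacity term concrete, the following sketch computes the maximal single-dimensional capacity of a bucket. It assumes that the capacity of a value in dimension j is the number of remaining records carrying that value in dimension j; this definition is not restated in this section, so it should be read as an assumption. The function names are hypothetical, and the way the capacity is combined with the bucket size into the selectivity is deliberately left out.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bucket = Tuple[str, ...]  # one sensitive value per dimension


def dimension_capacities(buckets: Dict[Bucket, List[int]]) -> List[Counter]:
    """capa[j][v]: number of remaining records whose j-th sensitive value
    is v (assumed definition of the single-dimensional capacity)."""
    d = len(next(iter(buckets)))
    capa = [Counter() for _ in range(d)]
    for bucket, records in buckets.items():
        for j, value in enumerate(bucket):
            capa[j][value] += len(records)
    return capa


def max_single_capacity(bucket: Bucket, capa: List[Counter]) -> int:
    """Largest capacity among the bucket's d values, i.e. max_j Capa(s'_j)."""
    return max(capa[j][value] for j, value in enumerate(bucket))
```

A bucket whose values are frequent in at least one dimension obtains a large capacity, which is the quantity the maximal single-dimension-capacity policy favours.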

MMDCF-MSLF
The basic idea of the MMDCF-MSLF algorithm is to first select an unshielded non-empty d-dimensional bucket with the maximal sensitive attribute security level and the largest bucket selectivity, extract a record from the bucket, add it to a group and delete it from the bucket. The bucket selectivity $Select(buk\langle s'_1, s'_2, \ldots, s'_d\rangle)$ of a bucket $buk\langle s'_1, s'_2, \ldots, s'_d\rangle$ in the MMDCF-MSLF algorithm is calculated from its bucket size $size(buk\langle s'_1, s'_2, \ldots, s'_d\rangle)$ and the sum of all its dimension capacities $\sum_{1 \le j \le d} Capa(s'_j)$. For this record, the buckets along certain of its dimensions are shielded whenever adding any record from those buckets to the group would destroy the L sl -diversity for multiple sensitive attributes of the group. By repeating this process, an L sl -diversity group is formed. Following this, the shielding of every d-dimensional bucket is removed, and the grouping process is repeated until a complete group can no longer be formed. Each remaining record is then added to an already formed group, provided that this does not destroy the L sl -diversity for multiple sensitive attributes of that group. Finally, the records that cannot be added to any formed group are suppressed in the published microdata. The specific steps of the MMDCF-MSLF algorithm are shown in Algorithm 3.
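The grouping, residual and suppression phases shared by the three algorithms can be sketched as below. This is a simplified illustration under stated assumptions, not a reproduction of Algorithms 1-3: `can_add` is a stand-in check that merely forbids repeated values within a dimension (the actual L sl -diversity condition is not restated here), a group is treated as complete once it holds `l` records, and `priority` is a hypothetical callback that would return, for example, the security level followed by the bucket size or selectivity.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, ...]  # sensitive values of one record


def can_add(group: List[Record], rec: Record) -> bool:
    """Stand-in diversity check: no value may repeat within a dimension."""
    return all(rec[j] != g[j] for g in group for j in range(len(rec)))


def form_groups(
    records: List[Record],
    l: int,
    priority: Callable[[Record, int], tuple],
) -> Tuple[List[List[Record]], List[Record]]:
    """Greedy grouping; returns (groups, suppressed records)."""
    buckets: Dict[Record, int] = {}
    for r in records:
        buckets[r] = buckets.get(r, 0) + 1
    groups: List[List[Record]] = []
    while True:
        group: List[Record] = []
        while len(group) < l:
            # Buckets failing can_add play the role of shielded buckets.
            open_buckets = [
                b for b, n in buckets.items() if n > 0 and can_add(group, b)
            ]
            if not open_buckets:
                break
            best = max(open_buckets, key=lambda b: priority(b, buckets[b]))
            group.append(best)
            buckets[best] -= 1
        if len(group) < l:
            for b in group:  # incomplete group: give its records back
                buckets[b] += 1
            break
        groups.append(group)  # shielding is implicitly reset for the next group
    suppressed: List[Record] = []
    for b, n in buckets.items():
        for _ in range(n):
            home = next((g for g in groups if can_add(g, b)), None)
            if home is not None:
                home.append(b)  # residual record joins a formed group
            else:
                suppressed.append(b)  # no compatible group: suppress
    return groups, suppressed
```

Under this sketch, a suppression ratio could be obtained as `len(suppressed) / len(records)`, assuming the ratio is measured over all input records.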


Experimental Results and Analysis
The experimental environment is as follows: an Intel(R) Core(TM) i5-7200U 2.5 GHz dual-core processor with 8 GB of memory, the Windows 10 64-bit operating system, and C++ as the programming language. The experimental microdata is the demographic dataset from the University of California Irvine (UCI) machine learning repository, available from http://kdd.ics.uci.edu. The microdata contains 30162 complete records, and each record has nine fields, of which the Occupation, Education, Marital, Workclass and Race fields are chosen as the multiple sensitive attributes in this paper, as shown in Table 2. For these sensitive attributes, different sensitive attribute security levels and composite sensitive attributes are chosen, as shown in Tables 3 and 4, respectively. The experiment compares and analyzes the additional information loss, the suppression ratio and the central processing unit (CPU) runtime of the algorithms in two settings: (1) the number of records n varies from 1000 to 10000 while the number of sensitive attributes is set to 3; (2) the number of sensitive attributes d varies from 2 to 5 while the number of records is set to 2000.

Comparative Analysis of MBF and MBF-MSLF
Figures 2-4 show the additional information loss, the suppression ratio and the CPU runtime, respectively, of MBF and MBF-MSLF when the number of records changes from 1000 to 10000 and the number of sensitive attributes is set to 3.
According to Figures 2-4, compared with MBF, the additional information loss of MBF-MSLF increases slightly, but the suppression ratio of MBF-MSLF drops directly to 0, which greatly reduces the information loss of the published microdata. As the data volume increases, the additional information loss and the suppression ratio of MBF and MBF-MSLF tend to be stable, because the distribution of the sensitive attribute values in the microdata becomes more and more stable. In addition, the CPU runtime of both MBF and MBF-MSLF increases gradually with the data volume, and the CPU runtime of MBF-MSLF increases faster than that of MBF.
Figures 5-7 show the additional information loss, the suppression ratio and the CPU runtime, respectively, of MBF and MBF-MSLF when the number of sensitive attributes changes from 2 to 5 and the number of records is set to 2000.
According to Figures 5-7, compared with MBF, the additional information loss of MBF-MSLF increases slightly, but the suppression ratio of MBF-MSLF drops directly to 0, which greatly reduces the information loss of the published microdata. As the number of sensitive attributes increases, the additional information loss of MBF quickly tends to 0, and the suppression ratio of MBF increases very fast. This shows that fewer and fewer groups can be formed, and that it becomes more and more difficult to add records to the formed groups. For MBF-MSLF, in contrast, the additional information loss first increases quickly and then decreases slowly as the number of sensitive attributes increases, while its suppression ratio is always 0. This shows that all records of the microdata can be grouped, but that adding records to the formed groups is easier at first and becomes more difficult as the number of sensitive attributes grows. In addition, the CPU runtime of both MBF and MBF-MSLF increases gradually with the number of sensitive attributes, and the CPU runtime of MBF-MSLF increases faster than that of MBF.
From the above comparative analysis, compared with MBF, MBF-MSLF can greatly reduce the information loss of the published microdata at the cost of only a small increase in runtime. Like that of MBF, the information loss of MBF-MSLF tends to be stable as the data volume increases. Moreover, MBF-MSLF solves the problem that the information loss of MBF increases greatly as the number of sensitive attributes increases.

Comparative Analysis of MSDCF and MSDCF-MSLF
Figures 8-10 show the additional information loss, the suppression ratio and the CPU runtime, respectively, of MSDCF and MSDCF-MSLF when the number of records changes from 1000 to 10000 and the number of sensitive attributes is set to 3.
According to Figures 8-10, compared with MSDCF, the additional information loss of MSDCF-MSLF increases slightly, but the suppression ratio of MSDCF-MSLF drops directly to 0. Thus, MSDCF-MSLF can greatly reduce the information loss of the published microdata. As the data volume increases, the additional information loss and the suppression ratio of MSDCF and MSDCF-MSLF tend to be stable, because the distribution of the sensitive attribute values in the microdata becomes more and more stable. Moreover, the CPU runtime of both MSDCF and MSDCF-MSLF increases gradually with the data volume, and the CPU runtime of MSDCF-MSLF increases faster than that of MSDCF.
Figures 11-13 show the additional information loss, the suppression ratio and the CPU runtime, respectively, of MSDCF and MSDCF-MSLF when the number of sensitive attributes changes from 2 to 5 and the number of records is set to 2000.
According to Figures 11-13, compared with MSDCF, the additional information loss of MSDCF-MSLF increases slightly, but the suppression ratio of MSDCF-MSLF drops directly to 0. Thus, MSDCF-MSLF can greatly reduce the information loss of the published microdata. As the number of sensitive attributes increases, the additional information loss of MSDCF quickly tends to 0, and the suppression ratio of MSDCF increases very fast. This means that fewer and fewer groups can be formed, and that it becomes more and more difficult to add records to the formed groups. In contrast, the additional information loss of MSDCF-MSLF first increases quickly and then decreases slowly as the number of sensitive attributes increases, while its suppression ratio is always 0. This means that all records of the microdata can be grouped, but that adding records to the formed groups is easier at first and becomes more difficult as the number of sensitive attributes grows. Moreover, the CPU runtime of both MSDCF and MSDCF-MSLF increases gradually with the number of sensitive attributes, and the CPU runtime of MSDCF-MSLF increases faster than that of MSDCF.
From the above comparative analysis, compared with MSDCF, MSDCF-MSLF can greatly reduce the information loss of the published microdata at the cost of only a small increase in runtime. Like that of MSDCF, the information loss of MSDCF-MSLF tends to be stable as the data volume increases. Furthermore, MSDCF-MSLF solves the problem that the information loss of MSDCF increases greatly as the number of sensitive attributes increases.

Comparative Analysis of MMDCF and MMDCF-MSLF
According to Figures 17-19, compared with MMDCF, the additional information loss of MMDCF-MSLF increases slightly, but the suppression ratio of MMDCF-MSLF drops directly to 0. Hence, MMDCF-MSLF can greatly reduce the information loss of the published microdata. As the number of sensitive attributes increases, the additional information loss of MMDCF quickly tends to 0, and the suppression ratio of MMDCF increases very fast. This illustrates that fewer and fewer groups can be formed, and that it becomes more and more difficult to add records to the formed groups. In contrast, the additional information loss of MMDCF-MSLF first increases quickly and then decreases slowly as the number of sensitive attributes increases, while its suppression ratio is always 0. This illustrates that all records of the microdata can be grouped, but that adding records to the formed groups is easier at first and becomes more difficult as the number of sensitive attributes grows. Additionally, the CPU runtime of both MMDCF and MMDCF-MSLF increases gradually with the number of sensitive attributes, and the CPU runtime of MMDCF-MSLF increases faster than that of MMDCF.
From the above comparative analysis, compared with MMDCF, MMDCF-MSLF can greatly reduce the information loss of the published microdata at the cost of only a small increase in runtime. Like that of MMDCF, the information loss of MMDCF-MSLF tends to be stable as the data volume increases. Furthermore, MMDCF-MSLF solves the problem that the information loss of MMDCF increases greatly as the number of sensitive attributes increases.

Comparative Analysis of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF
The suppression ratio of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF is 0 when the number of records changes from 1000 to 10000 and the number of sensitive attributes is set to 3. Figures 20 and 21 show the additional information loss and the CPU runtime, respectively, of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF in this setting.
As the data volume increases, the suppression ratios of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF are all 0, but according to Figure 20 the additional information loss of MSDCF-MSLF or MMDCF-MSLF is smaller than that of MBF-MSLF. This shows that all records can be grouped by MBF-MSLF, MSDCF-MSLF or MMDCF-MSLF, but that MSDCF-MSLF and MMDCF-MSLF add fewer records to the formed groups than MBF-MSLF does. In addition, according to Figure 21, the CPU runtime of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF increases gradually with the data volume, and the CPU runtime of MSDCF-MSLF or MMDCF-MSLF increases faster than that of MBF-MSLF.
The suppression ratio of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF is 0 when the number of sensitive attributes changes from 2 to 5 and the number of records is set to 2000. Figures 22 and 23 show the additional information loss and the CPU runtime, respectively, of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF in this setting. As the number of sensitive attributes increases, the suppression ratios of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF are all 0, but according to Figure 22 the additional information loss of MSDCF-MSLF or MMDCF-MSLF is at first smaller than that of MBF-MSLF and then larger than that of MBF-MSLF. This shows that all records can be grouped by MBF-MSLF, MSDCF-MSLF or MMDCF-MSLF, but that MSDCF-MSLF and MMDCF-MSLF at first add fewer records to the formed groups than MBF-MSLF does, and then more. In addition, according to Figure 23, the CPU runtime of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF increases gradually with the number of sensitive attributes, and the CPU runtime of MSDCF-MSLF or MMDCF-MSLF increases faster than that of MBF-MSLF.
From the above comparative analysis of MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF, the information loss of all three algorithms tends to be stable as the data volume increases. Compared with MBF-MSLF, the runtime of MSDCF-MSLF or MMDCF-MSLF is only slightly higher. Furthermore, when the number of sensitive attributes is small, the information loss of MSDCF-MSLF or MMDCF-MSLF is lower than that of MBF-MSLF, but it becomes higher than that of MBF-MSLF as the number of sensitive attributes increases.

Conclusions
In this paper, we first defined three security levels for different sensitive attribute values and gave an L sl -diversity model for multiple sensitive attributes. We then proposed three specific greedy algorithms that combine the MBF, MSDCF and MMDCF algorithms with the MSLF greedy policy, named MBF-MSLF, MSDCF-MSLF and MMDCF-MSLF, to form L sl -diversity groups for multiple sensitive attributes. When forming an L sl -diversity group for multiple sensitive attributes, the algorithms first select an unshielded non-empty d-dimensional bucket with the maximal sensitive attribute security level and the largest bucket size (or the largest bucket selectivity), extract a record from the bucket, add it to the group and delete it from the bucket. For this record, the buckets along certain of its dimensions are shielded whenever adding any record from those buckets to the group would destroy the L sl -diversity for multiple sensitive attributes of the group. By repeating this process, the L sl -diversity group for multiple sensitive attributes is formed. Following this, the shielding of every d-dimensional bucket is removed, and the grouping process is repeated until a complete L sl -diversity group for multiple sensitive attributes can no longer be formed. Each remaining record is then added to an already formed group, provided that this does not destroy the L sl -diversity for multiple sensitive attributes of that group. Finally, the records that cannot be added to any formed group are suppressed in the published microdata.
The experimental results show that, compared with MBF, MSDCF and MMDCF, the proposed algorithms can greatly reduce the information loss of the published microdata at the cost of only a small increase in runtime. Like that of MBF, MSDCF and MMDCF, their information loss tends to be stable as the data volume increases. Further, they solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly as the number of sensitive attributes increases. Compared with MBF-MSLF, the runtime of MSDCF-MSLF or MMDCF-MSLF is only slightly higher. When the number of sensitive attributes is small, the information loss of MSDCF-MSLF or MMDCF-MSLF is lower than that of MBF-MSLF, but it becomes higher than that of MBF-MSLF as the number of sensitive attributes increases. In this study, when two or more unshielded non-empty d-dimensional buckets share the same maximal sensitive attribute security level and the same largest bucket size (or largest bucket selectivity), we cannot know which bucket should be selected first, so one of these buckets is selected at random. We will further introduce other security level greedy policies to solve this problem.
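The random tie-breaking mentioned above can be illustrated with a minimal sketch. The function name and the `scores` mapping are hypothetical; each score is assumed to be a tuple such as (security level, bucket size) or (security level, bucket selectivity), and a seeded generator is used only so that a run can be repeated.

```python
import random
from typing import Dict, List, Tuple

Bucket = Tuple[str, ...]


def pick_with_random_tie_break(
    scores: Dict[Bucket, tuple],
    rng: random.Random,
) -> Tuple[Bucket, List[Bucket]]:
    """Return one bucket drawn uniformly from those sharing the best score,
    together with the full list of tied buckets."""
    best = max(scores.values())
    # Sorting makes the candidate order independent of dict insertion order.
    ties = sorted(b for b, s in scores.items() if s == best)
    return rng.choice(ties), ties
```

Because the choice among tied buckets is random, two runs on the same microdata may form different groups unless the generator is seeded, which is one reason a deterministic secondary policy could be preferable.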