F-Classify: Fuzzy Rule Based Classification Method for Privacy Preservation of Multiple Sensitive Attributes

With the advent of smart health, smart cities, and smart grids, the amount of data has grown swiftly. When the collected data is published for valuable information mining, privacy turns out to be a key matter due to the presence of sensitive information. Such sensitive information comprises either a single sensitive attribute (an individual has only one sensitive attribute) or multiple sensitive attributes (an individual can have multiple sensitive attributes). Anonymization of data sets with multiple sensitive attributes presents some unique problems due to the correlation among these attributes. Artificial intelligence techniques can help the data publishers in anonymizing such data. To the best of our knowledge, no fuzzy logic-based privacy model has been proposed until now for privacy preservation of multiple sensitive attributes. In this paper, we propose a novel privacy preserving model F-Classify that uses fuzzy logic for the classification of quasi-identifier and multiple sensitive attributes. Classes are defined based on defined rules, and every tuple is assigned to its class according to attribute value. The working of the F-Classify Algorithm is also verified using HLPN. A wide range of experiments on healthcare data sets acknowledged that F-Classify surpasses its counterparts in terms of privacy and utility. Being based on artificial intelligence, it has a lower execution time than other approaches.


Introduction
In the digital era, data collection and storage for ultimate analysis are constantly expanding. The ownership of collected data allows data holders to utilize it for useful data mining. Given that data proprietors are not usually data professionals, collected data must be made accessible so that data analysts may use it. When data is shared for mutual benefit, individual privacy becomes a major concern. Individual privacy is compromised by the information set obtained, which comprises explicit identifiers, quasi-identifiers (QIs), sensitive attributes (SAs), and insensitive attributes. Personal identifiers, such as a name or a national identification number, are examples of explicit identifiers that are almost always re-identified. The privacy-preserving strategies presented in the literature [1][2][3] usually eliminated them from data sets. QIs are such attributes that, when combined, can assist to link a person to an externally available source, such as age, gender, and zip code. SAs contain sensitive information about a person, and their disclosure could significantly Scenario 1: Let us start with a scenario in which the adversary already knows something about their next-door neighbor Richard. The adversary knows Richard is a 26-yearold man who lives in the same neighborhood as him, therefore he also knows their zip code. He noticed that Richard has recently lost weight. With such background and demographic knowledge, the adversary identifies from Table 2 that Richard belongs to group 1, and then discovers that the only patient in group 1 who has lost weight has cancer. In this manner, privacy is compromised by exploiting some demographic and background knowledge. Scenario 2: In a different scenario, if the adversary from Table 2 knows Ana's diagnostic method is an ELISA test, the adversary will be able to easily determine that Ana has HIV. As a result, existing approaches for single sensitive attributes are insufficient when it comes to preserving privacy for multiple sensitive attributes.
Scenario 3: In the case of MSAs, the previously proposed techniques for MSAs [7,8] still have some limitations. Single dimensional generalization is used in proposed MSA approaches, and there is a trade-off between privacy disclosure and data utility. SLOMS [9] has a demographic knowledge attack and significant information loss, whereas SLASMA [7] has a privacy risk from a demographic knowledge attack as well as low data utility.
Scenario 4: The approach (p, k) angelization [8] is similar to the strategy angelization [10], except that an adversary uses background knowledge to obtain a single SA value for each attribute by iteratively intersecting MSAs in correlated buckets. Additionally, because (p, k) angelization is based on MSA weight computations, the algorithm is more complex and takes longer to complete..
In this article, a fuzzy logic [11] based approach is proposed to address the limitations of previously proposed techniques; it is multi-dimensional partitioning and a rule-based technique. To preserve privacy, it offers multi-dimensional partitioning for both QIs and SAs. In the literature, fuzzy-based techniques for privacy preservation are proposed in [12,13], but none of them include MSAs.
The initial step in the proposed approach is to apply fuzzy classification on QIs (Age-Zipcode) and generate classes. Classification is not limited to 2-anonymous (two tuples in one class) or 3-anonymous (three tuples in one class); instead, each class has a different number of tuples. For example, class q-C2 has five patients in Table 3a, whereas class q-C4 has only one. SAs are classified after QIs have been classified. In SA classification, the class containing one patient is merged with another class to make the classification at least 2-anonymous and to avoid identity disclosure. In Figures 1 and 2, Matlab [14] simulations of fuzzy logic membership functions (mfs) and rules assessment are shown. Table 3b shows the results of the fuzzy classification of three SAs.  As a final step, a permutation is used to generate anonymized data based on Table 3a-c.  Table 4b,c are anonymized tables for MSTs to be published after patient identities were  removed from Table 3b,c. To evaluate the privacy breach of anonymized index, Chest X-ray}. As a result, it would be difficult for an adversary to deduce a direct relationship between any information in a high-dimensional data set with only one attribute in one class.  The following are the main contributions of this paper: • The article presents a fuzzy logic classifier (F-Classify) based on artificial intelligence (AI). The suggested methodology classifies QIs and SAa using a single methodology, namely fuzzy classification, rather than utilizing two distinct approaches for QIs and SAa. Instead of fixed classes/buckets, variable numbers of classes/buckets (k is variable) are formed in the proposed methodology.
• The proposed algorithm is verified for correctness using higher-level Petri nets (HLPN). • The proposed F-classify approach is implemented in Python, and the results are compared to those obtained through (p-k) angelization. The results indicate that fuzzy classification (multi-dimensional partitioning) of correlated attributes increases data utility while permutation of multiple tables improves privacy. When compared to techniques that propose two different methods for QIs and SAs privacy, F-classify uses fuzzy logic for both QAs and SAs, resulting in minimal overhead.
In this article, the privacy preservation of MSAs is studied and proposed, the following section will highlight relevant work. Section 3 will discuss preliminaries and definitions, and Section 4 will describe the proposed work. Validation of the proposed approach is demonstrated using HLPN in Section 5. In Section 6, the findings and discussion are presented. The article concludes with Section 7.

Literature Review
This section summarizes the work that has been done so far in the area of privacypreserving data publishing of single and multiple sensitive attribute data sets. Many privacypreserving approaches have been proposed, using generalization, bucketization, or slicing techniques. K-anonymity [1,6], l-diversity [2], t-closeness [3] and many others [4,5,[15][16][17] used generalization to provide privacy. Work based on bucketization has been proposed in [18][19][20][21], while slicing is a relatively new and evolving approach, first proposed by Li et al. [22].
The majority of these approaches focused on a single sensitive attribute. In reality, the data could be comprised of MSAs. The preservation of MSA privacy is still in its early phases, and a wide range of anonymization models have been presented in this regard, using several methodologies. Slicing [9,22], a method for anonymizing MSAs, was first presented in [22] for anonymizing high dimensional data. Because generalization causes information loss, Susan et al. [7] suggested a privacy model SLAMSA that utilized anatomization with slicing to resolve the information loss issue. Some enhanced slicing models, such as suppression and mondrian slicing, have been introduced in [23]. To eliminate the co-relationship between MSAs, Ref. [9] proposed "SLOMS", which used slicing. Aside from slicing, numerous alternative approaches based on clustering and multi-sensitive bucketization have been proposed (MSB). For the privacy preservation of numerical MSAs, MSB-based approaches have been introduced [24], but these approaches have ignored textual data.
Another strategy (α, l) was proposed to meet the diversity of MSAs. Positive and negative disclosure risks are minimized in this technique by analyzing the correlation between MSAs [22,25,26]. (α, l) is also utilized in [27], together with anatomy, generalization, and suppression, which resulted in significant data loss. Ref. [28] introduces a rating approach for MSAs; it generalizes the sensitive attribute values, increasing information loss and hence decreasing utility. Another approach, Ref. [28], is based on rating, and the rating used could be compromised via association rules. The decomposition-based technique and its extension decomposition plus improved l-diversity for MSAs [29,30]. In [31], "ANGELMS" have been proposed to anonymize MSAs using vertical partitioning. Ref. [32] proposes a (p+)-sensitive and t-closeness model for MSAs that meets t-closeness requirements for the published table.
The privacy model (p, k) angelization [8] has some significant advantages over others, but it still has certain shortcomings because weights have been calculated and allocated to SAs based on interdependence and sensitivity of sensitive attributes. Weights for SAs cannot be calculated in every case, and weight calculation increases execution time. Khan et al. [33] in (p, k) angelization identify the fingerprint correlation attack and suggest an improved (c, k)-anonymization technique. The innovative KCi-slice [34] is a KC-slice model enhancement with better privacy and utility requirements. The author of [35] proposed multiple security levels for different SAs values. The proposed method claims more utility, but requires more time to execute.
Until now, we have only discussed MSAs with single record data sets. In the literature, there is also some work done in multiple records together with MSA data sets. Ref. [36] proposes the first privacy model for 1:M and MSAs, which evaluates the work of [8] for 1:M and MSAs-based privacy disclosures. Although the proposed approach provides good protection against adversarial attacks, it appears that efficiency can be improved. Recent work on adversarial attack identification in a balanced p sensitive k-anonymity based privacy model for 1:M and MSAs have been proposed. They presented 1:M MSA-(p, l)-diversity in [37] as an efficient, resilient, and utility aware privacy technique. Table 5 highlights some of the work proposed for MSAs.

Privacy Models
Evaluation Attacks Utility [22] Slicing It was intended for high dimensional data, but it has failed and has given original tuples when multiple tuples have identical SAs and QIDs.
Skewness, sensitivity, and similarity attacks Loss of information [7] Slicing and anatomization The proposed approach has a very complex solution. It publishes multiple tables, and also has greater execution time.

Demographic knowledge attack
Loss of information [38] Multiple column multiple attributes slicing The proposed approach is for the MSAs anonymization, and QIs are overlooked. In case of 1:M occurrence of record, it shows incorrect results.
Skewness attacks, similarity attacks, and sensitivity attacks Loss of information [9] SLOMS Proposed approach released several tables with information loss. The correlation among MSA was also removed in this approach.
Demographic knowledge attack Loss of information [24] Multi-sensitive bucketization with clustering The approach only worked with numerical data if the consequence suppression rate is low.
-Information loss is less [25] MSA(α,l) The approach used generalization with suppression and anatomy. It caused the utility to decrease.
-Loss of information [28] Rating SAs are generalized.
Association privacy attack Loss of information [29] Decomposition The proposed approach preserves privacy by assuring diversity in MSAs, as a consequence it activated information loss.

Similarity and skewness privacy attacks
Loss of information is high [30] Decomposition plus Noise is added in proposed method, resulting in loss of utility. Attribute and identity disclosure are also not prevented in this approach.

Similarity and skewness privacy attacks
Loss of information is high [31] ANGELMS There is a zero correlation between MSAs and QIDs in this approach, results in high information loss.
Sensitivity, similarity, and skewness privacy attacks Loss of information is high.

Privacy Models Evaluation Attacks Utility
[32] P+ sensitive t-closeness It assigns sensitivity level to each SA in such a way that each group contains at least p-distinct sensitivity levels. It also generalizes the QIs.
-Loss of information [40] P-cover k-anonymity It generalizes QI values to ensure privacy, it also ensures the MSA P-diversity constraint between MSA. It avoids membership, identity and attribute disclosures.
Sensitivity, skewness, and similarity privacy attacks Loss of information.
[8] (p, k)-angelization The proposed approach preserves the privacy of MSAs using weight calculations. Weight calculation takes additional execution time and hence resulted into higher execution time.
-Loss of information.
To aggregate attributes based on QIs or MSAs, previously proposed models used techniques such as generalization, bucketization, and slicing. Information is lost while using the generalization approach provided in k-anonymity [1,2] since the records in one group are quite close to each other. Furthermore, because each attribute is generalized independently, there is no link between them. As a result, when analyzing the data, it is possible to find every potential combination of attributes. Despite the fact that bucketization has better utility than generalization [18][19][20][21], membership disclosure attacks are likely to occur because most bucketized algorithms use the same QIs values as the original table. Slicing of the data set has mostly focused on horizontal and vertical slicing. Slicing is mostly used for sensitive attributes, while QIs are either ignored or generalized via k-anonymity.
The basic terminologies and definitions used in this approach, as well as previously proposed approaches, will be highlighted in the following section.

Preliminaries
The techniques discussed earlier are based on single-dimensional generalization, and each technique provides a separate method for preserving QI and SA privacy. Single dimensional generalization is used to preserve QIs, while bucketization, slicing, and other techniques are used to preserve SAs. To maintain privacy, the approach used in this paper is based on fuzzy logic to provide multi-dimensional partitioning for both QIs and SAs. The basic terminologies and definitions used in this article and related articles are discussed in this section to help understand the presented methodology.

Notation
The data set is in the form of a table T with m data attributes and n tuples. The m data attributes are quasi-identifiers QI = {qi 1 , . . . , qi n } and sensitive attributes SA = {sa 1 , . . . , sa n }. Adversary commonly uses sensitive attributes to reveal private information about individuals, and QI attributes can be linked to any other external data set to identify individuals. Table 6 lists the other notations used in this paper.  [8] If any individual i is uniquely recognized in any group G with n tuples through QIs, it means that the attacker uses demographic knowledge (dk). The adversary's ability to find an individual [41] is facilitated by the individual's QI attributes. If the attacker can trace an individual's personal information via QIs, he is capable of launching a dk attack. Table T, a batch partitioning = {B 1 , B 2 . . . B g } and a bucket partitioning = {C 1 , C 2 , . . . C f } is given, a (p, k)-Angelization of Table T yields two  different tables, sensitive batch table (SBT), and generalized table (GT).

Fuzzification
The process of fuzzification is the transformation of a precise number into a fuzzier one. In this step, inputs are changed into linguistic variables that can be used with fuzzy sets. Below are definitions of linguistic variables, membership functions (mfs), and fuzzy sets.
Linguistic variable:Let Table T be the universe under consideration with m data attributes and n tuples. Table T has crisp data. The first step in mapping crisp data into fuzzy data is to define linguistic variables (lv). The lv is a data attribute with some values that can include QIs and SAs. In Table 1, QI age is a linguistic variable and it has some linguistic value, i.e., 27 years.
Membership function (mfs): The degree of membership is determined by the values of linguistic variables. The value assigned to attributes is called its degree. Mfs are determined based on degree of membership. Depending on the values of the linguistic variable, Mfs can be two, three, or four. For example, we can define two mfs for the linguistic variable age in Table 1 as µ 1 = (25-33 years) and µ 2 = (34-48 years).
Fuzzy sets: Mfs are used to generate fuzzy sets. The fuzzy sets include everything between completely false (0.0) and completely true (1.0). For example, the two mfs for age A = µ 1 , µ 2 form one fuzzy set. Assume that Table T is the universe under consideration and t is a specific element of T, then A is a fuzzy set defined on T and can be expressed as; Logical operations on fuzzy sets: The fuzzy set theory comprises the operations union, intersection, complement, and inclusion, just like classical set theory. In fuzzy logic, the various logical operations for compound statements in Equation (1) are considered as implications [12,32]. The implication used in the proposed approach is UNION or AND, which is defined as follows in terms of characteristic functions. (UNION) Union of two fuzzy sets A and B using (1) will be: Fuzzy Inference: The fuzzy relation R = A → B is used to represent a fuzzy rule. R can be considered as a two-dimensional membership function for a fuzzy set (1). By applying implication as a union using Equation (2) and employing IF-THEN rules, a fuzzy inference engine is formed. Possible rules are calculated based on linguistic variables and membership functions. Total rules can be calculated as (µ) lvs .
Defuzzification: Following the evaluation of the rules, the input fuzzy sets are defuzzified, resulting in a set of crisp output values. The center of gravity (CoG) defuzzification method is used to defuzzify the fuzzy system [42]. In CoG, values are taken from the inference engine and aggregated.

HLPN
HLPN is used to verify the correctness of an algorithm. The HLPN [43] is a 7-tuple with N = (P, T, F, ϕ, Rn, L, M 0 ). F, P, and T belong to a dynamic structure, whereas L, ϕ, and Rn reflect static semantics in the group of 7-tuples. The P represents a finite set of places, each of which represents a single part of the system. T denotes the set of finite transitions, where transitions represent the system's variations. Rn explains the transition rules, L signifies a label on F, and M 0 denotes the initial marking in the 7-tuple definition of HLPN.

Proposed Approach: F-Classify
Section 2 investigates previously proposed techniques for single sensitive attributes and multiple sensitive attributes. According to the findings, privacy breaches occur when single sensitive attribute-based approaches are applied to MSAs. Section 2 discusses and compares MSA-based techniques based on privacy violations and utility requirements. There appears to be a trade-off between privacy and utility in the majority of MSA-based approaches. To reduce such trade-offs, this article introduces F-Classify, an AIbased classification methodology for QIs and MSAs. F-Classify will publish several tables. One anonymized QIs table and numerous MSAs tables will be published. The number of MSAs tables is determined by the number of sensitive attributes in micro-data. In the following sections, we will go through how F-Classify works.

Linguistic Variables and Fuzzy Sets
The first step is to convert crisp input (m data attributes) to linguistic variables. The values for lvs (QIs and SAs) are specified first, and then the mfs are generated. For both numerical and categorical attributes, the criteria for defining mfs are different.

•
For numerical attributes sort the data in any order, then divide it into two/three/four (depending on mfs) equal lists. • For categorical attributes, select unique attribute values from the list. Then, for each distinct attribute, assign a random number between 0 and 1. After assigning a random number, divide the unique list into two/three/four equal lists using the same technique.
Let lv(q) denote the linguistic variable for QIs, and lv(sa) denote the linguistic variable for sensitive attributes. First, define mfs, then fuzzy sets for lv(q) and lv(sa). Equations (3) and (4) show fuzzy sets for linguistic variable QIs and MSAs, respectively.

Fuzzy Inference Rule-Based
Section 4.1 defined linguistic variables and membership functions; the next step is to define rules based on fuzzy sets and implications. The number of rules will be determined by the number of lvs and the number of mfs. For any, i and j and QIs A and B rules will be calculated in (6).
i f QI A is lv(q Ai ) and QI B is lv(q Bj ) For any, i, j, and k and SAs a, b, and c rules will be calculated in (7).
i f SA a is lv(sa ai ) and SA b is lv(sa bj ) and SA c is lv(sa ck )

Defuzzification
Defuzzification is performed on the output of evaluated rules. As a result of defuzzification, we only get one tuple at a time. The result of defuzzification of SAs in the provided example is shown in Figure 2. In Figure 2, the selected attribute physician is p1, the disease is d1, and the treatment is t1, therefore the defuzzification result is class C1.

Permutation
Rules are used to classify QIs and MSAs. We assign these classes to tuples in the data set, and we now have multiple tables based on the classes we defined. Values from the SAs table are permuted into the QIs table to generate the anonymized table. The formal modeling and analysis of the F-Classify algorithm will be explained in the next section.

Formal Modeling and Analysis
The design and working of the F-Classify algorithm are described in depth in Section 4. Here, F-Classify algorithm formal modeling and analysis is performed using HLPN [43].

F-Classify Algorithm
The F-Classify algorithm is explained in depth in the following sections. Two parts make up the algorithm. Fuzzification is the first step, while permutation is the second.
The F-classify Algorithm 1 starts with splitting the table into QIs and SAs attributes subsets. Furthermore, it generates tables (QT) and (STm) with new attribute classes and attributes of sub-set tables for every data sub-set. Data attributes are initialized and called linguistic variables (lv). Line 1-3 in Algorithm 1 defines mfs for every lv, mfs can be 2, 3, or 4 for every attribute. Line 4: loop from 1 to the number of subsets (k). Line 5: loop from 1 to no of rules (η). Line 6: making rules for data-set from 1 to total rules. Line 7: Assign output class to each rule. Line 10: loop from 1 to the number of tuples (n). Line 11: loop from 1 to the number of rules (η). Line 12: condition to check, tuple from Table T end for 9: end for 10: for i = 1 to n do 11: for j = 1 to α do 12: if (T(Tuple) ∈ (C[α])) then In the following part, we will go through formal modeling and analysis.

Formal Modeling and Analysis
We formally validate the working of the F-Classify algorithm along with its properties. To achieve the formal analysis, HLPN and Z3 languages are used. The HLPN model has been transformed into SMT-Lib [44] together with the correctness properties to illustrate the correctness of the F-Classify algorithm. Properties are then executed via the Z3 solver to verify their correctness. In HLPN, the algorithm was first presented in terms of its mathematical properties. These attributes are first translated into SMT-Lib to see if they are valid, and then they are executed through the Z3 solver. In [44], the formal definitions of SMT and Z3 solver are presented. The notations used in this section are represented in Table 6. Figure 3 depicts the HLPN for the F-Classify algorithm. Table 7 defines the variable types that were used and their explanations. The places and descriptions included in the HLPN F-Classify algorithm are shown in Table 8. The transitions have been labeled as Input in Figure 3. The first Input transition is a raw data table with m attributes and n records of patient electronic health records (EHRs) stored in the place DT. In the input transition, a raw data table with m attributes and n records is given, which is subsequently separated into different QI and SAs subsets using the Dsplit function. In Equation (8), the entire data split procedure is shown. All m attributes are then translated into linguistic variables lv m in (9).

Types Description
Tp m m tuples in Data   Tables   Table 8. Mapping of data types on places.

Types Description
All membership functions η are defined for linguistic variables m. Now rules are formed based on the combined values of linguistic variables and membership functions as η m and saved in Rules, then output classes are assigned to each specific rule in (10) and (12).
After that, each record in the data table is compared to check its class as depicted in (11), then we construct class-based quasi identifier table QcT (Table 3a) and multiple sensitive attribute tables ScT (Table 3b,c). Now, from the class-based quasi identifier table QcT, it is checked whether the corresponding class is present in the sensitive attribute table ScT. Quasi identifier table QcT is appended to each class of multiple sensitive attribute tables ScT and saved in Anonymize T. After that, each record in the data table is checked as it is present in which class in (11), then we construct a class-based quasi identifier table QcT (Table 3a) and multiple sensitive  attribute tables ScT (Table 3b,c).
It is then checked whether the matching class is present in which sensitive attribute table ScT, using the class-based quasi identifier table QcT. Each class of multiple sensitive attribute tables ScT has a quasi-identifier table QcT appended to it and saved in Anonymize T. Table 4a shows the anonymized form of the table. We publish anonymized QT (Table 4a) and MST (Table 4b,c) tables with multiple sensitive attributes. The above procedure is represented by the last transition Release Table, depicted in (13). Table)

R(Class
The working and properties of the F-Classify privacy-preserving model have been formally verified in this section. Multiple sensitive attributes are protected from membership, attribute, and identity disclosure while using a fuzzy logic-based classification methodology. In the following section, we will look at the performance of the proposed privacy model. [3], i21 [4]) Table)

Results and Discussion
This section compares and discusses the experimental results of the proposed methodology to those of previously suggested techniques.

Experimental Setup
Our model is implemented on an Intel Core i7-3520M computer with a 500 GB hard drive and 8 GB RAM, running Windows 7. Python is the programming language used in the implementation. The two data sets for experiments have been taken from the UCI repository http://archive.ics.uci.edu/ml/datasets/Heart+Disease (accessed on 10 June 2020) and https://archive.ics.uci.edu/ml/datasets/Adults (accessed on 10 August 2020). The data used are from the Hungarian Institute of Cardiology and Cleveland Clinic Foundation of Heart Disease and Adults. In the heart disease data set, there are almost 76 attributes, but we have used 14 attributes for experimental purposes. Furthermore, out of 14, 12 are SAs and 2 of them are QIs. The QIs in the data set are age and gender, and we have added another QI zipcode for experimental purposes. The 12 SAs used are cp (chest pain type), trestbps (resting blood pressure), chol (serum cholerstrol), fbs (fasting blood sugar), restecg (resting electrocardiographic results), thalach (maximum heart rate achieved), exang (exercise-induced angina), oldpeak (ST depression induced by exercise relative to rest), slope (the slope of the peak exercise ST segment), ca (number of major vessels), thal (type of defect) and num (diagnosis of heart disease). The value of cp is either 1, 2, 3, or 4 based on different pain conditions. The value of num is either 0 or 1, and the value of the slope is either 1 (upsloping), 2 (flat), or 3 (downsloping). We have used 284 records after removing missing values records. In the second adult data set a total of 40,000 tuples are taken for experimental purposes. The 12 attributes are selected for analysis, and out of 12, 9 are sensitive attributes and 3 are QIs (age, gender, and zip code). The comparative analysis based on different parameters has been done between the proposed model and (p, k) angelization.

Measurement of Privacy
We ca not quantify the privacy level or information leakage of any method scientifically, but we can calculate the probability of information leakage. In F-Classify, we have used fuzzy logic for classification or grouping of tuples so information leakage depends on membership functions defined for QIs and SA and class/group size. In F-Classify we have dynamic class/group size, so to measure privacy leakage we will sum the number of records in each group and then find the probability, P = 1 α × 100, where α is the sum of records based on QIs and sensitive attributes in each class/group.
In (14), k is the number of records in each class/group based on QI and is denoted by (k 1 , . . . , k i ), is the number of sensitive attributes in each class/group and i is the number of membership functions defined for QIs. We have multiple groups against one mf, therefore summation is used to add records from every group. For approach (p, k) angelization SA are not categorized, therefore the probability is based only on group size, as P = 1 k × 100, where k is a group size that is static and predefined. So this probability only depends on the number of records in one group. The likelihood of finding a record in the case of (p, k) angelization is substantially higher than in the case of F-Classify, as shown in Figure 4a. We have classification based on QI and SAs in F-Classify, therefore the chance of identifying a record in a data set is determined by the sum of records in each mf and the number of SA against each record. In (p, k) angelization, however, only the records in one group are considered. Furthermore, when the number of records in a data set increases, the group size gets bigger in F-Classify, and the likelihood of finding a record decreases, but this is not the case with (p, k) angelization, which has a fixed group size. The number of records in (p, k) angelization does not affect group size.

Discernibility Penalty
The Discernibility penalty (DCP) is a measure of indistinguishable records. Each record gets a penalty for being indistinguishable from other records, which is used to calculate DCP. The lower the DCP value, the more indistinguishable the records are from one another. If we have C indistinguishable classes, we can use (15) to determine the DCP.
We used fuzzy logic with multi-dimensional partitioning in our methodology, which fuzzifies and distinguishes records. We can observe in Figure 4b that DCP grows with increasing group size in the case of (p, k) angelization, but DCP remains the same in the case of F-Classify. We have different group sizes in one table in F-Classify, as seen in Table 3a. We have one tuple in one class/group and five tuples in the other class/group, giving us 1, 5, 6, and 1 group sizes in one table while maintaining DCP. As a result, DCP has a smaller value in F-Classify than (p, k) angelization, where DCP increases with the increase of group size.

Normalized Certainty Penalty (NCP)
To check the utility of proposed techniques, the idea of certainty penalty (CP) is proposed by Xu et al. [5]. CP is a utility-based metric, it is helpful to get information loss, and it also illustrates the significance of every attribute. Normalized certainty penalty (NCP) is calculated using (16).
where y and z are the range of tuples defined after classification of table T, and A is the attribute for which NCP is calculated. In Figure 5a, NCP is calculated for F-Classify and (p, k) angelization. NCP is calculated based on generalization steps in the case of (p, k) angelization and in F-Classify it is based on classification of attributes. Information loss is proportional to generalization; the more the generalization, the greater the information loss. The information loss in (p, k) angelization is more than in F-Classify, as seen in Figure 5a. When there are few sensitive attributes, information loss is greater in F-Classify, but it gradually decreases in comparison to (p, k) angelization as the number of sensitive attributes increases. NCP is quite low in F-Classify in the adults data set, as seen in Figure 5b. In terms of information loss, the suggested algorithm's low NCP shows that it performs well with larger data sets.

Query Error
Query Error is another way to assess the suggested F-Classify utility. Comparing anonymized datasets aggregated query results [45,46] is one way to determine the efficiency of privacy models. Query error is computed using Equation (17).
The actual query count is the result of the query executed on the original data-set T, whereas the estimated query count is the count obtained by anonymizing the dataset (T*). The query accuracy results are compared to the number of groups and query dimensionality.
For the Heart Disease and Adults data sets, relative query error is plotted against the number of groups in Figure 6a,b. As the group size in F-Classify is not fixed like it is in (p, k) angelization, the relative query error for different group sizes is almost the same in the proposed approach. Furthermore, in the suggested methodology, fuzzy classification is applied for both QIs and SAs, resulting in a relatively low relative query error as compared to (p, k) angelization. Although enhanced generalization is employed for QIs in (p, k) angelization, query error is still greater than in F-Classify.

Execution Time Analysis
Variable numbers of records and sensitive attributes are used to compare execution time. The execution time increases as the number of records increases. While the execution time for F-Classify increases slightly as the number of records increases, the execution time for (p, k) angelization increases rapidly as the number of records grows, as illustrated in Figure 7a. When contrast to F-classify, which classifies SAs using an AI algorithm, (p, k) angelization takes longer to execute since it employs a weight calculation technique for each SA. F-Classify has a slight variation in execution time with the increasing number of SAs shown in Figure 7b. The execution time of an adult data set with 40,000 records is presented in Figure 7c, and the pattern is the same: as the number of SAs increases, so does the execution time. In both small and large data sets, (p, k) angelization takes longer to execute than F-Classify.

Conclusions
This article introduces F-Classify, an AI-based technique for preserving the privacy of multiple sensitive attributes while publishing micro-data. The classification of SAs and QIs in F-Classify is done using rule-based fuzzy classification. Classifying QIs and SAs using fuzzy classification provides for multi-dimensional partitioning with minimal information loss. Attribute disclosure is prevented by classifying SAs into distinct classes so that instead of a fixed number of tuples in each group, each class has a varying number of tuples. Permuting the tables after multi-dimensional partitioning results in a higher level of privacy. Experiments on real-time data sets were conducted, and the results were compared in terms of Query Error, NCP, DCP, and execution. When comparing DCP to (p, k) angelization, it was observed that F-Classify had the lowest DCP value, indicating that records are indistinguishable. Query accuracy is used to assess utility, and it reveals that F-Classify has a lower query error than (p, k) angelization. In comparison to (p, k) angelization, an algorithm's execution time is quite minimal. We plan to extend this work in the future to include dynamic data publication. Multiple releases with insertion, deletion, and updating of records are a challenge in dynamic data publication.