# θ-Sensitive k-Anonymity: An Anonymization Model for IoT based Electronic Health Records


## Abstract

… p^{+}-sensitive k-anonymity and balanced p^{+}-sensitive k-anonymity for implementing privacy protection in EHR. However, these models have certain privacy vulnerabilities, which are identified in this paper with two new types of attack: the sensitive variance attack and categorical similarity attack. A mitigation solution, the $\theta $-sensitive k-anonymity privacy model, is proposed to prevent the mentioned attacks. The proposed model works effectively for all k-anonymous size groups and can prevent sensitive variance, categorical similarity, and homogeneity attacks by creating more diverse k-anonymous groups. Furthermore, we formally modeled and analyzed the base and the proposed privacy models to show the invalidation of the base and applicability of the proposed work. Experiments show that our proposed model outperforms the others in terms of privacy security (14.64%).

## 1. Introduction

- Identity disclosure prevention: Generalizing [7,8,9] the QI values of a group of records from more specific values to less specific values, e.g., k-anonymity [7,8], where every record should be indistinguishable from at least k-1 other records. An intruder/attacker therefore cannot re-identify an individual with a probability higher than 1/k.

#### 1.1. Motivation

- p^{+}-sensitive k-anonymity and (p, α)-sensitive k-anonymity [17]: This model is a modified version of p-sensitive k-anonymity [19] for preventing a similarity attack. However, the p^{+}-sensitive k-anonymity and (p, α)-sensitive k-anonymity models have zero diversity at the ${\mathrm{A}}^{\mathrm{s}}$ category level, which may lead to a categorical similarity attack. A more powerful possible attack by an adversary is the sensitive variance attack, which exploits the low variability at the ${\mathrm{A}}^{\mathrm{s}}$ category level. As the adversary's background knowledge (BK) grows, the privacy level can be breached, which may cause attribute disclosures. The proposed $\theta $-sensitive k-anonymity privacy model provides a privacy solution to prevent all such attacks.
- Balanced p^{+}-sensitive k-anonymity and (p, α)-sensitive k-anonymity [18]: This model is an enhanced version of the p^{+}-sensitive k-anonymity model. It balances the categorical level sensitive attributes in each EC. However, it still has low diversity at the ${\mathrm{A}}^{\mathrm{s}}$ category level and works only for ECs whose k-anonymous size is greater than three.

… p^{+}-sensitive k-anonymity and (p, α)-sensitive k-anonymity model [17], we propose the $\theta $-sensitive k-anonymity privacy model in this paper. The categorical level similarity and small EC size problems in the balanced p^{+}-sensitive k-anonymity and (p, α)-sensitive k-anonymity model [18] are also addressed by achieving a more balanced and diverse EC even at the category level and by executing on ECs of small size k, i.e., k = 2.

#### 1.2. Contributions

- A new $\theta $-sensitive k-anonymity privacy model is proposed where privacy in an EC is achieved through a threshold value, i.e., $\theta $. The $\theta $ value for an EC is obtained by multiplying the variance by an observation value. The variance-based diversity in an EC prevents the sensitive variance attack, which automatically prevents the categorical similarity attack. In the proposed model, the ${\mathrm{A}}^{\mathrm{s}}$ values are checked not only against the subsequent ECs; a cross-check is also performed for the last EC. If the required privacy is not achievable with the existing ${\mathrm{A}}^{\mathrm{s}}$ values, then noise is added to reach the required diversity.
- We formally modeled and analyzed the base model in [17] and the proposed $\theta $-sensitive k-anonymity privacy model using High Level Petri-Nets (HLPN).
- Simulation results show that our proposed $\theta $-sensitive k-anonymity model has only 0.002679% higher privacy leakage relative to the baseline privacy, whereas its counterpart, the p^{+}-sensitive k-anonymity model, has 14.65% higher privacy leakage relative to the baseline privacy.

… p^{+}-sensitive k-anonymity, along with its formal analysis, is presented in Section 4. Section 5 discusses the proposed $\theta $-sensitive k-anonymity model and its formal analysis. In Section 6, the experiments and evaluations are provided. Section 7 concludes the paper.

## 2. Related Work

## 3. Preliminaries

**Definition 1.** … $({\mathrm{A}}_{1},{\mathrm{A}}_{2},\dots ,{\mathrm{A}}_{\mathrm{n}})$ in a masked microdata table T’ is said to be k-anonymous if and only if the number of occurrences of any combination of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}\times $ t(${\mathrm{A}}_{\mathrm{in}}^{\mathrm{qi}}$) values, from start to end, is greater than or equal to k in R.
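The k-anonymity condition in Definition 1 can be illustrated with a minimal Python sketch. It assumes the table is a list of dictionaries and that the quasi-identifier columns are named `age`, `zip`, and `country`; these names, and the helper `is_k_anonymous`, are illustrative and not part of the paper. The rows are the first six records of the generalized example table in this paper.

```python
from collections import Counter

def is_k_anonymous(rows, qi_columns, k):
    """Return True if every combination of QI values occurs in at least k rows."""
    group_sizes = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(size >= k for size in group_sizes.values())

# First six records of the generalized example table (Disease is the sensitive attribute).
rows = [
    {"age": "34-40", "zip": "14208-14247", "country": "**", "disease": "HIV"},
    {"age": "34-40", "zip": "14208-14247", "country": "**", "disease": "HIV"},
    {"age": "25-26", "zip": "14205-14242", "country": "America", "disease": "Cancer"},
    {"age": "25-26", "zip": "14205-14242", "country": "America", "disease": "Cancer"},
    {"age": ">= 40", "zip": "14054-14063", "country": "America", "disease": "Hepatitis"},
    {"age": ">= 40", "zip": "14054-14063", "country": "America", "disease": "Obesity"},
]
QI = ("age", "zip", "country")  # assumed column names for the quasi-identifiers

print(is_k_anonymous(rows, QI, 2))  # True: every QI group has two records
print(is_k_anonymous(rows, QI, 3))  # False: no QI group has three records
```

Note that the first two groups are 2-anonymous but share a single disease each, which is the homogeneity problem that the sensitivity-aware models discussed in this paper try to address.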

**Definition 2.**

**Definition 3.**

**Definition 4.**

**Definition 5.**

**Definition 6.**

**Definition 7.**

… p^{+}-sensitive k-anonymity model, to highlight its shortcomings concerning sensitive variance or an S-Variance attack.

## 4. Problem Statement

… p^{+}-sensitive k-anonymity and $(p,\alpha )$-sensitive k-anonymity models [17], respectively.

**Definition 8.** (p^{+}-sensitive k-anonymity [17]): A masked microdata table T’ satisfies p^{+}-sensitive k-anonymity if it fulfills k-anonymity and, for each EC in T’, the number of distinct categories to which the ${\mathrm{A}}^{\mathrm{s}}$ values belong is equal to or greater than p.

… p^{+}-sensitive k-anonymity model in which p = 2, k = 4 and c = 2. The ECs column in Table 4a is not part of a published table.
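The category condition of Definition 8 can be checked with a short Python sketch. It assumes the ECs are already formed (so k-anonymity over the QI values is taken as given) and uses the category table and the sensitive values of Table 4a from this paper; the names `CATEGORY`, `distinct_categories`, and `is_p_plus_sensitive` are hypothetical helpers introduced only for illustration.

```python
# Category IDs of the sensitive values, as listed in the category table of this paper.
CATEGORY = {
    "HIV": 1, "Cancer": 1,
    "Hepatitis": 2, "Phthisis": 2,
    "Asthma": 3, "Obesity": 3,
    "Indigestion": 4, "Flu": 4,
}

def distinct_categories(ec):
    """Number of distinct categories covered by the sensitive values of one EC."""
    return len({CATEGORY[value] for value in ec})

def is_p_plus_sensitive(ecs, p, k):
    """Each EC must hold at least k records and cover at least p categories."""
    return all(len(ec) >= k and distinct_categories(ec) >= p for ec in ecs)

# Sensitive values of EC1, EC2, and EC3 in Table 4a.
table_4a = [
    ["HIV", "Cancer", "Flu", "Indigestion"],
    ["Hepatitis", "Phthisis", "Asthma", "Obesity"],
    ["HIV", "Cancer", "Flu", "Flu"],
]

print([distinct_categories(ec) for ec in table_4a])  # [2, 2, 2]
print(is_p_plus_sensitive(table_4a, p=2, k=4))       # True
print(is_p_plus_sensitive(table_4a, p=3, k=4))       # False
```

Every EC covers exactly two categories, which agrees with the stated c = 2, even though EC3 repeats the value Flu. The check therefore passes for p = 2 while saying nothing about repeated values inside an EC.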

**Definition 9.**

… p^{+}-sensitive k-anonymous approach are exposed to categorical similarity and sensitive variance attacks, as explained in Table 5. Table 5 shows the variance calculation for these ECs, where a high variance for the more diverse EC2 and a small variance for the less diverse EC3 can be seen.
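The variance figures for EC2 and EC3 can be reproduced with a small Python sketch. It assumes, following the columns of Table 5, that each distinct sensitive value receives an index x = 1, 2, 3, … in the order listed in the table, that f is its frequency in the EC, and that the variance is the frequency-weighted population variance of x. The function name `ec_variance` is illustrative only.

```python
def ec_variance(frequencies):
    """Frequency-weighted variance of the index x over one EC.

    frequencies[i] is the count f of the sensitive value whose index is x = i + 1,
    so the result is sum(f * x^2) / n - (sum(f * x) / n)^2 with n = sum(f).
    """
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in enumerate(frequencies, start=1))
    sum_fx2 = sum(f * x * x for x, f in enumerate(frequencies, start=1))
    return sum_fx2 / n - (sum_fx / n) ** 2

ec2 = [1, 1, 1, 1]  # Hepatitis, Phthisis, Asthma, Obesity: all distinct
ec3 = [2, 1, 1]     # Flu (twice), Cancer, HIV

print(ec_variance(ec2))  # 1.25
print(ec_variance(ec3))  # 0.6875, reported as 0.69 in the text
```

Under this reading, a single repeated value is enough to pull the variance of a size-4 EC from 1.25 down to about 0.69.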

#### Critical Review of p^{+}-Sensitive k-Anonymity Model

… p^{+}-sensitive k-anonymity algorithm to check its invalidation concerning a sensitive variance attack. The detailed formal verification of the working of the p^{+}-sensitive k-anonymity privacy model, along with its properties, is given in [18] from Rule 1 to Rule 7, which gets original data input from the end-user and processes it. The sensitive variance attack on the p^{+}-sensitive k-anonymity model is shown in Figure 1, where the arrowheads show the data flow. Table 6 shows the variable types and their descriptions. The places $P$ and their descriptions are shown in Table 7. The attacker model in Figure 1 consists of three entities: the end-user, the adversary, and the trusted data publisher.

… p^{+}-sensitive k-anonymity model is highly vulnerable to a sensitive variance attack. The main reason is the existence of non-diverse (low variance) ${\mathrm{A}}^{\mathrm{s}}$ values such as ‘Flu’ in Table 4a and Table 4b, and ‘HIV’ in Table 4b. In Rule (1), through the function $\mathrm{S}$-$\mathrm{Variance}\_\mathrm{Attck}()$, an adversary performs an attack on the released data using some external source of information, i.e., BK. In Rule (1):

## 5. The Proposed $\theta $-Sensitive k-Anonymity Privacy Model

#### 5.1. Threshold $\theta $-Sensitivity

#### 5.1.1. Variance (${\mathsf{\sigma}}^{2}$)

#### 5.1.2. Observation 1 ($\mathsf{\mu}$)

… p^{+}-sensitive 4-anonymous Table 4a, the EC2 variance is 1.25 and the EC3 variance is 0.69. The difference is due to the duplicated sensitive value, i.e., Flu, in EC3. We propose an efficient way of removing the frequency repetition of sensitive values to achieve a more diverse EC. For this, we calculate the $\theta $ value. For example, consider a fully diverse EC of size 4 with variance 1.25 and multiply it by an observed value µ that ranges between 0.5 and 0.9. With µ = 0.5, $\theta $ = 1.25 × 0.5 = 0.625, which is less than 0.69, whereas with µ = 0.6, $\theta $ = 1.25 × 0.6 = 0.75, which is greater than 0.69. The gap between the two variances, i.e., 1.25 and 0.69, is caused by only one duplicated value, “Flu”. Thus, the choice of µ depends on the privacy requirements and the level of diversity we want to achieve. In this paper, we perform a very strict $\theta $ calculation to get fully diverse ECs. Therefore, in the implementation of the proposed Algorithm 1, we multiply the variance of a size-4 EC by the observed value µ = 0.6 to obtain a fully diverse EC. The same technique is applied to all other ECs as well. The $\theta $ obtained in this way in line 8 of the proposed Algorithm 1 in Section 5.2 is then checked in the condition at line 10, inside a loop, to test every EC against the $\theta $ requirement.
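The arithmetic in this paragraph can be traced with a minimal Python sketch. It assumes that $\theta $ is the product of µ and the variance of a fully diverse EC of the same size, and that an EC whose variance falls below $\theta $ is treated as insufficiently diverse. The direction of that comparison is an assumption made here for illustration, and the function names are hypothetical.

```python
def theta(mu, full_diversity_variance):
    """Threshold obtained by scaling the variance of a fully diverse EC by mu."""
    return mu * full_diversity_variance

def below_threshold(ec_variance, threshold):
    """Assumed check: an EC is flagged when its variance is smaller than theta."""
    return ec_variance < threshold

SIGMA2_FULL = 1.25     # variance of a fully diverse EC of size 4 (EC2 in Table 4a)
EC3_VARIANCE = 0.6875  # variance of EC3 in Table 4a, reported as 0.69

for mu in (0.5, 0.6):
    t = theta(mu, SIGMA2_FULL)
    print(mu, round(t, 3), below_threshold(EC3_VARIANCE, t))
# 0.5 0.625 False  -> EC3 with its duplicated value would pass
# 0.6 0.75 True    -> EC3 with its duplicated value is flagged
```

With µ = 0.6 the threshold sits between the variance of an EC with one duplicate and that of a fully diverse EC, which is why this value yields the strict setting described above for size-4 ECs.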

**Definition 10.**

#### 5.2. The Proposed $\theta $-Sensitive k-Anonymity Algorithm

**Algorithm 1:** $\theta $-sensitive k-anonymity

Input: Microdata Table (MT)

Output: $\theta $-sensitive k-anonymous table (MMT)

Line | Statement | Comment |
---|---|---|
1 | **Procedure:** $\theta $-sensitive k-anonymity (MMT, $\theta $, k) | |
2 | Let $k\subseteq \mathrm{MMT}$ | |
3 | **if** $\lvert k\rvert \ge 2$ **then** | |
4 | Condition = true; | |
5 | **foreach** m-size EC in ${\mathrm{G}}_{\mathrm{i}}^{\mathrm{qi}}:\{{\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}\}\in k$ **do** | ► ${\mathrm{G}}_{\mathrm{i}}^{\mathrm{qi}}$ set, consists of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}$ and ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}$ |
6 | ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{i}}}\leftarrow $ Compute vari(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{i}}}^{\mathrm{s}}$) | ► vari(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{i}}}^{\mathrm{s}}$), calculate variance for each m-size EC |
7 | **endfor** | |
8 | $\theta \leftarrow \mu \ast {\sigma }^{2}$ | ► $\theta $, required threshold |
9 | **foreach** m-size ${\mathrm{EC}}_{\mathrm{i}}$ in ${\mathrm{G}}_{\mathrm{i}}^{\mathrm{qi}}:\{{\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}\}\in k$ **do** | ► ${\mathrm{G}}_{\mathrm{i}}^{\mathrm{qi}}$ set, consists of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}$ and ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}$ |
10 | **if** ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{c}}}$ $\theta $ **then** | |
11 | ${\mathrm{EC}}_{\mathrm{b}}\leftarrow {\mathrm{EC}}_{\mathrm{c}}+1$ | |
12 | **if** ${\mathrm{EC}}_{\mathrm{n}}={\mathrm{EC}}_{\mathrm{c}}$ | |
13 | ${\mathrm{MS}}_{\mathrm{n}}\leftarrow $ Compute mfsv(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{n}}}^{\mathrm{s}}$) | ► mfsv(), max frequent ${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{n}}}^{\mathrm{s}}$ |
14 | ${\mathrm{MS}}_{\mathrm{n}-1}\leftarrow $ Compute mfsv(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{n}-1}}^{\mathrm{s}}$) | ► mfsv(), max frequent ${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{n}-1}}^{\mathrm{s}}$ |
15 | notExist $\leftarrow $ crossCheck(${\mathrm{MS}}_{{\mathrm{EC}}_{\mathrm{n}}},{\mathrm{MS}}_{{\mathrm{EC}}_{\mathrm{n}-1}}$) | ► crossCheck(), check both side existence |
16 | **if** notExist | |
17 | swap(${\mathrm{MS}}_{\mathrm{n}},{\mathrm{MS}}_{\mathrm{n}-1}$) | ► swap(), last and 2^{nd} last ECs MS values |
18 | **endif** | |
19 | ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{n}}}\leftarrow $ Compute vari(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{n}}}^{\mathrm{s}}$) | |
20 | **if** ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{n}}}$ $\theta $ | |
21 | Break | |
22 | jump to else part of condition line 43 | |
23 | **else** | |
24 | Break | |
25 | **endif** | |
26 | **else** | |
27 | **for** ${\mathrm{EC}}_{\mathrm{b}}$ till ${\mathrm{EC}}_{\mathrm{n}}$ in ${\mathrm{G}}_{\mathrm{i}}^{\mathrm{qi}}:\{{\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}\}\in \mathrm{K}$ | |
28 | **if** ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{b}}}$ $\theta $ | |
29 | Break loop | |
30 | **endif** | |
31 | Break loop | |
32 | **if** ${\mathrm{EC}}_{\mathrm{b}}=\mathrm{found}$ | |
33 | ${\mathrm{MS}}_{\mathrm{c}}\leftarrow $ Compute mfsv(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{c}}}^{\mathrm{s}}$) | ► mfsv(), max frequency ${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{c}}}^{\mathrm{s}}$ |
34 | ${\mathrm{MS}}_{\mathrm{b}}\leftarrow $ Compute mfsv(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{b}}}^{\mathrm{s}}$) | ► mfsv(), max frequency ${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{b}}}^{\mathrm{s}}$ |
35 | ${\mathrm{MS}}_{\mathrm{b}}\leftarrow $ backCheck(${\mathrm{MS}}_{{\mathrm{EC}}_{\mathrm{c}}},{\mathrm{MS}}_{{\mathrm{EC}}_{\mathrm{b}}}$) | ► backCheck(), find MS value in ${\mathrm{MS}}_{{\mathrm{EC}}_{\mathrm{b}}}$, not exists in ${\mathrm{MS}}_{{\mathrm{EC}}_{\mathrm{c}}}$ |
36 | | |
37 | swap(${\mathrm{MS}}_{\mathrm{c}},{\mathrm{MS}}_{\mathrm{b}}$) | ► swap(), exchange MS values |
38 | ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{c}}}\leftarrow $ Compute vari(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{c}}}^{\mathrm{s}}$) | ► vari(), again compute variance |
39 | **if** ${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{c}}}$ $\theta $ | |
40 | ${\mathrm{EC}}_{\mathrm{c}}+=1$ | |
41 | **endif** | |
42 | **else** | |
43 | NS $\leftarrow $ Compute addNoise(${\mathrm{A}}_{{\mathrm{EC}}_{\mathrm{c}}}^{\mathrm{s}}$) | ► addNoise(), until variance > $\theta $ |
44 | **endif** | |
45 | **endif** | |
46 | **else** | |
47 | ${\mathrm{EC}}_{\mathrm{c}}+=1$ | |
48 | **endif** | |
49 | **endfor** | |
50 | **else** | |
51 | Condition = false; | |
52 | **endif** | |

… p^{+}-sensitive k-anonymity is prone to homogeneity, categorical similarity, and sensitive variance attacks, whereas Table 8a from $\theta $-sensitive k-anonymity secures the data from such attacks because of more diversity, even at the category level, i.e., the maximum value for category c is 4 through $\theta $-sensitive k-anonymity, whereas, for Table 4a, the maximum value for c is 2. Table 8a provides more protection against the categorical similarity attack. Further swapping of values is not possible in the last EC; thus, a single tuple is added as noise to increase the diversity and to prevent the categorical similarity attack and the sensitive variance attack. Such a small amount of noise does not highly affect the utility of the data. Table 4b is the base table used to obtain Table 8b with the $\theta $-sensitive k-anonymity approach. Table 8b is also highly diverse at the categorical level and there are no repeated sensitive values. Thus, there is no need to add noise and to have a high value of variance. The anonymized data in both Table 8a and Table 8b, obtained through the proposed $\theta $-sensitive k-anonymity algorithm, have no attribute disclosure risk and are defensive against homogeneity [11], categorical similarity, and sensitive variance attacks, and are even secure from skewness attacks [12].
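A single swap step of the kind used in Algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes that the most frequent sensitive value of the current EC is exchanged with a value of another EC that does not yet occur in the current EC, and that indices for the variance are assigned in descending order of frequency. The helper names (`variance`, `most_frequent`, `swap_step`) are hypothetical, and the result is not claimed to reproduce Table 8a or Table 8b.

```python
from collections import Counter

def variance(ec):
    """Frequency-weighted variance of value indices (assumed: most frequent value gets x = 1)."""
    counts = Counter(ec).most_common()  # ties keep first-seen order
    n = len(ec)
    mean = sum(f * x for x, (_, f) in enumerate(counts, start=1)) / n
    mean_sq = sum(f * x * x for x, (_, f) in enumerate(counts, start=1)) / n
    return mean_sq - mean ** 2

def most_frequent(ec):
    """Most frequent sensitive value in an EC (the role played by mfsv() in Algorithm 1)."""
    return Counter(ec).most_common(1)[0][0]

def swap_step(ec_c, ec_b):
    """Exchange one copy of EC_c's most frequent value with a value of EC_b absent from EC_c."""
    ms_c = most_frequent(ec_c)
    candidates = [v for v in ec_b if v not in ec_c]
    if not candidates or ms_c in ec_b:
        return list(ec_c), list(ec_b), False  # no useful swap available
    ms_b = candidates[0]
    new_c, new_b = list(ec_c), list(ec_b)
    new_c[new_c.index(ms_c)] = ms_b
    new_b[new_b.index(ms_b)] = ms_c
    return new_c, new_b, True

# Sensitive values of the first and third ECs of Table 4b.
ec1 = ["HIV", "HIV", "Cancer", "Flu"]
ec3 = ["Cancer", "Flu", "Flu", "Indigestion"]

new_ec1, new_ec3, swapped = swap_step(ec1, ec3)
print(swapped, new_ec1, variance(ec1), variance(new_ec1))
# True ['Indigestion', 'HIV', 'Cancer', 'Flu'] 0.6875 1.25
```

After this one step the first EC is fully diverse, while the other EC still contains a repeated value and would need a further swap or, where no swap is possible, a noise tuple.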

#### 5.3. Analysis of $\theta $-Sensitive k-Anonymity Model Using Formal Modeling and Analysis

… p^{+}-sensitive k-anonymity model. Therefore, the adversary did not get private information for the target individual and the attack results in a null value. In Rule (9):

## 6. Experimental Evaluation

… p^{+}-sensitive k-anonymity model are described. The proposed algorithm wisely diversified the ${\mathrm{A}}^{\mathrm{s}}$ values in a balanced way inside each EC without using the categorical approach. The utility and quality of the anonymized released data were checked with numerous quality measures.

#### 6.1. Experimental Setup

… p^{+}-sensitive k-anonymity model. The quality of the sanitized publicly released data was evaluated with four utility metrics: discernibility penalty (DCP) [18,38,39], normalized average QI-group (C_{AVG}) [17,18,38], noise calculation, and query accuracy [18,33]. The execution time of both algorithms was analyzed at the end of the experiments.

#### 6.2. Discernibility Penalty (DCP)

… p^{+}-sensitive k-anonymity model generated groups based on p. This means that the number of tuples can be greater than p in a k-anonymous class. Figure 3 shows the DCP score for $\theta $-sensitive k-anonymity, including a comparison with p^{+}-sensitive and the baseline. In comparison to p^{+}-sensitivity, the DCP score obtained through the proposed $\theta $-sensitive k-anonymity algorithm is almost equal to the baseline, which implies that the proposed model assigned an optimal penalty to each EC and produced an optimal DCP score. The magnified subplots in Figure 3 with k = 12 and k = 16 for $\theta $-sensitive k-anonymity show the very minor difference from the baseline. This minor difference can also be seen in Table 11, with an average DCP score of 47.2, or 0.002679%, relative to the baseline obtained from the simulation while calculating the DCP for the anonymized dataset ${\mathrm{R}}^{\ast}$.
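To make the DCP score concrete, the following Python sketch assumes the commonly used form of the discernibility metric from [38], in which every record is penalized by the size of its EC, so the score is the sum of squared EC sizes. The extra term for suppressed records is omitted, and the paper's own equation may differ in detail. The function name and the example sizes are illustrative.

```python
def discernibility_penalty(ec_sizes):
    """Sum of squared EC sizes: each record is charged the size of the EC it falls into."""
    return sum(size * size for size in ec_sizes)

# Twelve records split into three ECs of size 4 (the shape of Table 4a).
print(discernibility_penalty([4, 4, 4]))  # 48

# The same table with one noise tuple added to the last EC.
print(discernibility_penalty([4, 4, 5]))  # 57

# Twelve records in a single large EC give a much higher penalty.
print(discernibility_penalty([12]))       # 144
```

Under this form, smaller and more even ECs give a lower penalty, and adding a noise tuple raises the score only slightly.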

#### 6.3. Normalized Average (C_{AVG})

C_{AVG} is another mathematically sound measurement that measures the quality of the sanitized data by the EC average size. It was proposed in [38] and applied in [17,18]. Below in Equation (3), C_{AVG} can be calculated as

… C_{AVG} are inversely proportional. A low C_{AVG} value indicates high information utility. The optimal goal is to have a minimum size of ECs in ${\mathrm{R}}^{\ast}$. Figure 4 shows C_{AVG} for p^{+}-sensitive k-anonymity and $\theta $-sensitive k-anonymity over k-anonymity. p^{+}-sensitive has lower data utility over small k, where there is a high data utility for large k. The proposed technique has a very balanced and sustainable utility for each input value of k. Thus, the proposed $\theta $-sensitive k-anonymity model performs efficiently for all sizes of k, compared to the p^{+}-sensitive k-anonymity model.
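As an illustration of this metric, the sketch below assumes the usual definition of the normalized average EC size, namely the number of records divided by the number of ECs and then divided by k. The paper's Equation (3) is expected to have this shape, but that is an assumption here, and the function name is hypothetical.

```python
def normalized_average_ec_size(total_records, num_ecs, k):
    """C_AVG = (total_records / num_ecs) / k; 1.0 means ECs are exactly of size k on average."""
    if num_ecs <= 0 or k <= 0:
        raise ValueError("num_ecs and k must be positive")
    return (total_records / num_ecs) / k

# Twelve records in three ECs with k = 4: every EC has the minimum allowed size.
print(normalized_average_ec_size(12, 3, 4))  # 1.0

# Twelve records in two ECs with k = 4: ECs are 1.5 times larger than required.
print(normalized_average_ec_size(12, 2, 4))  # 1.5
```

A value close to 1 indicates that ECs are no larger than k-anonymity requires, which corresponds to the high information utility described above.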

#### 6.4. Noise Addition

#### 6.5. Query Accuracy

… ^{th} QI. The SQLQuery in Equation (4) for the COUNT query will work as

… p^{+}-sensitive k-anonymity and $\theta $-sensitive k-anonymity using the query error rate for 1000 randomly generated aggregate queries. The error rate increases for high values of k because of the wide range in the ${\mathrm{A}}^{\mathrm{qi}}$s. A wider range selects a greater number of tuples than in the original microdata and hence yields a higher error rate. Figure 6b shows that the more tuples we select based on predicates, the higher the error rate will be in the anonymized data.

#### 6.6. Execution Time

… p^{+}-sensitive k-anonymity model and for the proposed $\theta $-sensitive k-anonymity model. The execution time of both algorithms increased with the value of k because of the increase in the generalization range of the ${\mathrm{A}}^{\mathrm{qi}}$s. Since we did not consider the categorization of sensitive values, our approach took less time to execute than its counterpart. In the $\theta $-sensitive k-anonymity model, the higher execution time for k = 10, k = 16 and k = 20 was due to the time taken to add more noise tuples to achieve the required diversity.

## 7. Conclusions

… p^{+}-sensitive k-anonymity model. The purpose was to prevent an attribute disclosure risk in anonymized data. The p^{+}-sensitive k-anonymity model was considered to be vulnerable to a privacy breach from sensitive variance, categorical similarity, and homogeneity attacks. These attacks were mitigated by implementing the proposed $\theta $-sensitive k-anonymity privacy model using Equation (1). In the proposed solution, the threshold $\theta $ value decides the diversity level for each EC of the dataset. The vulnerabilities in the p^{+}-sensitive k-anonymity model and the effectiveness of the proposed $\theta $-sensitive k-anonymity model were formally modeled through HLPN, which further ensures the validation of the proposed technique. The experimental work proved the privacy implementation and an improved utility of the released data using different mathematical measures. For future work consideration, the proposed algorithm can be extended to 1:M (single record having many attribute values) [40], to multiple sensitive attributes (MSA) [41,42,43], or can be modeled by considering the dynamic data set [44] approach.

## References

1. Dang, L.M.; Piran, J.; Han, D.; Min, K.; Moon, H. A Survey on Internet of Things and Cloud Computing for Healthcare. Electronics **2019**, 8, 768.
2. Sun, W.; Cai, Z.; Li, Y.; Liu, F.; Fang, S.; Wang, G. Security and Privacy in the Medical Internet of Things: A Review. Secur. Commun. Netw. **2018**, 2018, 1–9.
3. Baek, S.; Seo, S.-H.; Kim, S.J. Preserving Patient’s Anonymity for Mobile Healthcare System in IoT Environment. Int. J. Distrib. Sens. Netw. **2016**, 12, 2171642.
4. Liu, F.; Li, T. A Clustering K-Anonymity Privacy-Preserving Method for Wearable IoT Devices. Secur. Commun. Netw. **2018**, 2018, 1–8.
5. Wan, J.; Al-Awlaqi, M.A.A.H.; Li, M.; O’Grady, M.; Gu, X.; Wang, J.; Cao, N. Wearable IoT enabled real-time health monitoring system. EURASIP J. Wirel. Commun. Netw. **2018**, 2018, 298.
6. Al-Khafajiy, M.; Baker, T.; Chalmers, C.; Asim, M.; Kolivand, H.; Fahim, M.; Waraich, A. Remote health monitoring of elderly through wearable sensors. Multimed. Tools Appl. **2019**, 78, 24681–24706.
7. Sweeney, L. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. **2002**, 10, 557–570.
8. Sweeney, L. Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl. Based Syst. **2002**, 10, 571–588.
9. Song, F.; Ma, T.; Tian, Y.; Al-Rodhaan, M. A New Method of Privacy Protection: Random k-Anonymous. IEEE Access **2019**, 7, 75434–75445.
10. Wang, J.; Du, K.; Luo, X.; Li, X. Two privacy-preserving approaches for data publishing with identity reservation. Knowl. Inf. Syst. **2018**, 60, 1039–1080.
11. Amiri, F.; Yazdani, N.; Shakery, A.; Chinaei, A.H. Hierarchical anonymization algorithms against background knowledge attack in data releasing. Knowl. Based Syst. **2016**, 101, 71–89.
12. Yaseen, S.; Abbas, S.M.A.; Anjum, A.; Saba, T.; Khan, A.; Malik, S.U.R.; Ahmad, N.; Shahzad, B.; Bashir, A.K. Improved Generalization for Secure Data Publishing. IEEE Access **2018**, 6, 27156–27165.
13. Liu, X.; Deng, R.H.; Choo, K.K.R.; Weng, J. An efficient privacy preserving outsourced calculation tool kit with multiple keys. IEEE Trans. Inf. Forensics Secur. **2016**, 11, 2401–2414.
14. Michalas, A. The lord of the shares. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, Limassol, Cyprus, 8–12 April 2019; pp. 146–155.
15. Machanavajjhala, A.; Gehrke, J.; Kifer, D.; Venkitasubramaniam, M. L-diversity: Privacy beyond k-anonymity. Int. Conf. Data Eng. **2006**, 1, 24.
16. Li, N.; Li, T.; Venkatasubramanian, S. t-Closeness: Privacy beyond k-Anonymity and l-Diversity. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, 15–20 April 2007; pp. 106–115.
17. Sun, X.; Sun, L.; Wang, H. Extended k-anonymity models against sensitive attribute disclosure. Comput. Commun. **2011**, 34, 526–535.
18. Anjum, A.; Malik, S.U.R.; Choo, K.-K.R.; Khan, A.; Haroon, A.; Khan, S.; Khan, S.U.; Ahmad, N.; Raza, B. An efficient privacy mechanism for electronic health records. Comput. Secur. **2018**, 72, 196–211.
19. Campan, A.; Truta, T.M.; Cooper, N. p-sensitive k-anonymity with generalization constraints. Trans. Data Privacy **2010**, 3, 65–89.
20. Al-Khafajiy, M.; Webster, L.; Baker, T.; Waraich, A. Towards fog driven IoT healthcare. In Proceedings of the 2nd International Conference on Future Networks and Distributed Systems, Amman, Jordan, 26–27 June 2018; Volume 9, p. 9.
21. Shahzad, A.; Lee, Y.S.; Lee, M.; Kim, Y.-G.; Xiong, N.N. Real-Time Cloud-Based Health Tracking and Monitoring System in Designed Boundary for Cardiology Patients. J. Sens. **2018**, 2018, 1–15.
22. Domingo-Ferrer, J.; Soria-Comas, J. From t-closeness to differential privacy and vice versa in data anonymization. Knowl. Based Syst. **2015**, 74, 151–158.
23. Dwork, C. Differential privacy. In International Colloquium on Automata, Languages, and Programming; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12.
24. Fung, B.C.; Wang, K.; Chen, R.; Yu, P.S. Privacy-preserving data publishing. ACM Comput. Surv. **2010**, 42, 1–53.
25. Xu, Y.; Ma, T.; Tang, M.; Tian, W. A Survey of Privacy Preserving Data Publishing using Generalization and Suppression. Appl. Math. Inf. Sci. **2014**, 8, 1103–1116.
26. Torra, V. Transparency in Microaggregation; UNECE: Skovde, Sweden, 2015; pp. 1–8. Available online: http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A861563&dswid=-2982 (accessed on 25 August 2019).
27. Panackal, J.J.; S.Pillai, A. Adaptive Utility-based Anonymization Model: Performance Evaluation on Big Data Sets. Procedia Comput. Sci. **2015**, 50, 347–352.
28. Rahimi, M.; Bateni, M.; Mohammadinejad, H. Extended K-Anonymity Model for Privacy Preserving on Micro Data. Int. J. Comput. Netw. Inf. Secur. **2015**, 7, 42–51.
29. Sowmiyaa, P.; Tamilarasu, P.; Kavitha, S.; Rekha, A.; Krishna, G.R. Privacy Preservation for Microdata by using k-Anonymity Algorthim. Int. J. Adv. Res. Comput. Commun. Eng. **2015**, 4, 373–375.
30. Wong, C.; Li, J.; Fu, W.; Wang, K. (α,k)-Anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining ACM, Philadelphia, PA, USA, 20–23 August 2006; pp. 754–759.
31. Zhang, Q.; Koudas, N.; Srivastava, D.; Yu, T. Aggregate Query Answering on Anonymized Tables. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, Institute of Electrical and Electronics Engineers (IEEE), Istanbul, Turkey, 17–20 April 2007; pp. 116–125.
32. Li, J.; Tao, Y.; Xiao, X. Preservation of proximity privacy in publishing numerical sensitive data. In Proceedings of the 2008 ACM SIGMOD International Conference, Association for Computing Machinery (ACM), Vancouver, BC, Canada, 9–12 June 2008; pp. 473–486.
33. Xiao, X.; Tao, Y. Personalized privacy preservation. In Proceedings of the 2006 ACM SIGMOD International Conference, Chicago, IL, USA, 27–29 June 2006; p. 229.
34. Christen, P.; Vatsalan, D.; Fu, Z. Advanced Record Linkage Methods and Privacy Aspects for Population Reconstruction—A Survey and Case Studies. In Population Reconstruction; Springer: Berlin, Germany, 2015; pp. 87–110.
35. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. **1951**, 22, 79–86.
36. Rubner, Y.; Tomasi, C.; Guibas, L.J. The Earth Mover’s Distance as a Metric for Image Retrieval. Int. J. Comput. Vis. **2000**, 40, 99–121.
37. Ali, M.; Malik, S.U.R.; Khan, S.U. DaSCE: Data Security for Cloud Environment with Semi-Trusted Third Party. IEEE Trans. Cloud Comput. **2015**, 5, 642–655.
38. Bayardo, R.J.; Agrawal, R. Data Privacy through Optimal k-Anonymization. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05), Tokyo, Japan, 5–8 April 2005; pp. 217–228.
39. Lefevre, K.; DeWitt, D.; Ramakrishnan, R. Mondrian Multidimensional K-Anonymity. In Proceedings of the 22nd International Conference on Data Engineering, Atlanta, GA, USA, 3–8 April 2006; p. 25.
40. Gong, Q.; Luo, J.; Yang, M.; Ni, W.; Li, X.-B. Anonymizing 1:M microdata with high utility. Knowl. Based Syst. **2016**, 115, 15–26.
41. Wang, R.; Zhu, Y.; Chen, T.-S.; Chang, C.-C. Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness. J. Comput. Sci. Technol. **2018**, 33, 1231–1242.
42. Anjum, A.; Ahmad, N.; Malik, S.U.R.; Zubair, S.; Shahzad, B. An efficient approach for publishing microdata for multiple sensitive attributes. J. Supercomput. **2018**, 74, 5127–5155.
43. Khan, R.; Tao, X.; Anjum, A.; Sajjad, H.; Malik, S.U.R.; Khan, A.; Amiri, F. Privacy Preserving for Multiple Sensitive Attributes against Fingerprint Correlation Attack Satisfying c-Diversity. Wirel. Commun. Mob. Comput. **2020**, 2020, 1–18.
44. Zhu, H.; Liang, H.B.; Zhao, L.; Peng, D.Y.; Xiong, L. τ-Safe (l,k)-Diversity Privacy Model for sequential publication with high utility. IEEE Access **2019**, 7, 687–701.

ID | Name | Age | Zip Code | Country | Disease |
---|---|---|---|---|---|
1 | JULIAN | 34 | 14247 | USA | HIV |
2 | KALEEM | 40 | 14208 | Pakistan | HIV |
3 | JOHANNA | 26 | 14205 | USA | Cancer |
4 | MICHAEL | 25 | 14242 | Canada | Cancer |
5 | JUDITH | 40 | 14054 | USA | Hepatitis |
6 | EVA | 48 | 13073 | Japan | Phthisis |
7 | HARIS | 45 | 14066 | Pakistan | Asthma |
8 | PAUL | 40 | 14063 | USA | Obesity |
9 | YIN LI | 40 | 14243 | China | Flu |
10 | BEVERLY | 37 | 14203 | Canada | Flu |
11 | DENISE | 36 | 14204 | Canada | Flu |
12 | JANETTE | 35 | 14247 | USA | Indigestion |

ID | Age | Zip Code | Country | Disease |
---|---|---|---|---|
1 | 34–40 | 14208-14247 | ** | HIV |
2 | 34–40 | 14208-14247 | ** | HIV |
3 | 25–26 | 14205-14242 | America | Cancer |
4 | 25–26 | 14205-14242 | America | Cancer |
5 | >= 40 | 14054-14063 | America | Hepatitis |
6 | >= 40 | 14054-14063 | America | Obesity |
7 | >= 40 | 13073-14066 | Asia | Asthma |
8 | >= 40 | 13073-14066 | Asia | Phthisis |
9 | 35–40 | 14243-14247 | ** | Flu |
10 | 35–40 | 14243-14247 | ** | Indigestion |
11 | 36–37 | 14203-14204 | America | Flu |
12 | 36–37 | 14203-14204 | America | Flu |

Symbol | Description | Symbol | Description |
---|---|---|---|
$\mathrm{MT}$ | Microdata Table | ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}$ | Quasi identifier for i^{th} end user |
$\mathrm{MMT}$ | Micro Mask Table | ${\mathrm{A}}^{\mathrm{s}}$ | Sensitive Attributes |
A | Attributes in MT | ${\mathrm{A}}^{\mathrm{id}}$ | Identifier Attribute |
PD | Published Data | ${\mathrm{A}}_{\mathrm{ECc}}^{\mathrm{s}}$ | Sensitive value in an ${\mathrm{EC}}_{\mathrm{c}}$ |
$\mathrm{ECs}$ | Set of Equivalence classes | ${\mathrm{A}}_{\mathrm{ECn}}^{\mathrm{s}}$ | Sensitive value in an ${\mathrm{EC}}_{\mathrm{n}}$ |
${\mathrm{EC}}_{\mathrm{i}}$ | k-anonymous group of tuples with the combination of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}$ and ${\mathrm{A}}^{\mathrm{s}}$ | ${\mathrm{A}}_{\mathrm{ECn}-1}^{\mathrm{s}}$ | Sensitive value in an ${\mathrm{EC}}_{\mathrm{n}-1}$ |
${\mathrm{EC}}_{\mathrm{c}}$ | Equivalence Class current | ${\mathrm{A}}_{\mathrm{ECb}}^{\mathrm{s}}$ | Sensitive value in an ${\mathrm{EC}}_{\mathrm{b}}$ |
${\mathrm{EC}}_{\mathrm{b}}$ | Equivalence Class broken | $\mathrm{N}$ | Noise |
${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{i}}}$ | Variance for ${\mathrm{EC}}_{\mathrm{i}}$ | $\mathrm{M}$ | Total number of records in an EC |
${\mathrm{MS}}_{\mathrm{n}}$ | Max frequency of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}$ in an EC_{n} | ${\mathrm{MS}}_{\mathrm{c}}$ | Max frequency of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}$ in an EC_{c} |
${\mathrm{MS}}_{\mathrm{n}-1}$ | Max frequency of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}$ in an EC_{n-1} | ${\mathrm{MS}}_{\mathrm{b}}$ | Max frequency of ${\mathrm{A}}_{\mathrm{i}}^{\mathrm{s}}$ in an EC_{b} |
$\mathrm{P}$ | Places used in formal modeling | ${\mathrm{G}}_{\mathrm{i}}^{\mathrm{qi}}$ | QI-group at index i |
$\phi $ | Data Types in formal modeling | | |

Category ID | Sensitive Values |
---|---|
1 | HIV, Cancer |
2 | Hepatitis, Phthisis |
3 | Asthma, Obesity |
4 | Indigestion, Flu |

ECs | ID | Age | Zip Code | Country | Disease |
---|---|---|---|---|---|
EC1 | 1 | =< 40 | 14204-14247 | America | HIV |
EC1 | 2 | =< 40 | 14204-14247 | America | Cancer |
EC1 | 3 | =< 40 | 14204-14247 | America | Flu |
EC1 | 4 | =< 40 | 14204-14247 | America | Indigestion |
EC2 | 5 | >= 40 | 13073-14066 | **** | Hepatitis |
EC2 | 6 | >= 40 | 13073-14066 | **** | Phthisis |
EC2 | 7 | >= 40 | 13073-14066 | **** | Asthma |
EC2 | 8 | >= 40 | 13073-14066 | **** | Obesity |
EC3 | 9 | =< 40 | 14203-14247 | **** | HIV |
EC3 | 10 | =< 40 | 14203-14247 | **** | Cancer |
EC3 | 11 | =< 40 | 14203-14247 | **** | Flu |
EC3 | 12 | =< 40 | 14203-14247 | **** | Flu |
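
To see how much category-level diversity each equivalence class in the table above carries, the sensitive values can be mapped to their category IDs and the distinct categories counted per EC. The following is a minimal sketch, not the authors' implementation: the names `CATEGORIES`, `RECORDS`, and `categories_per_ec` are hypothetical, and the data are copied from the category table and the EC table above.

```python
from collections import defaultdict

# Category table from the text: category ID -> sensitive values.
CATEGORIES = {
    1: {"HIV", "Cancer"},
    2: {"Hepatitis", "Phthisis"},
    3: {"Asthma", "Obesity"},
    4: {"Indigestion", "Flu"},
}

# Reverse lookup: sensitive value -> category ID.
VALUE_TO_CATEGORY = {
    value: cat_id for cat_id, values in CATEGORIES.items() for value in values
}

# (EC label, disease) pairs copied from the EC table above.
RECORDS = [
    ("EC1", "HIV"), ("EC1", "Cancer"), ("EC1", "Flu"), ("EC1", "Indigestion"),
    ("EC2", "Hepatitis"), ("EC2", "Phthisis"), ("EC2", "Asthma"), ("EC2", "Obesity"),
    ("EC3", "HIV"), ("EC3", "Cancer"), ("EC3", "Flu"), ("EC3", "Flu"),
]


def categories_per_ec(records):
    """Return {EC label: sorted list of distinct category IDs in that EC}."""
    seen = defaultdict(set)
    for ec_label, disease in records:
        seen[ec_label].add(VALUE_TO_CATEGORY[disease])
    return {ec_label: sorted(cats) for ec_label, cats in seen.items()}


result = categories_per_ec(RECORDS)
for ec_label, cats in result.items():
    print(ec_label, cats)
```

In this sketch every EC covers exactly two of the four categories, so an adversary who can place a target in an EC narrows the sensitive value to two categories regardless of how many distinct diseases the EC lists.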

ID | Age | Zip Code | Country | Disease |
---|---|---|---|---|
1 | =< 40 | 14205-14247 | **** | HIV |
2 | =< 40 | 14205-14247 | **** | HIV |
3 | =< 40 | 14205-14247 | **** | Cancer |
4 | =< 40 | 14205-14247 | **** | Flu |
5 | >= 40 | 13073-14066 | **** | Hepatitis |
6 | >= 40 | 13073-14066 | **** | Phthisis |
7 | >= 40 | 13073-14066 | **** | Asthma |
8 | >= 40 | 13073-14066 | **** | Obesity |
9 | =< 40 | 14203-14247 | America | Cancer |
10 | =< 40 | 14203-14247 | America | Flu |
11 | =< 40 | 14203-14247 | America | Flu |
12 | =< 40 | 14203-14247 | America | Indigestion |

**Table 5.** Variance calculation for different equivalence classes (ECs) in Table 4a.

EC2: Sensitive Values | $x$ | $f$ | ${x}^{2}$ | $f\ast x$ | $f\ast {x}^{2}$ | EC3: Sensitive Values | $x$ | $f$ | ${x}^{2}$ | $f\ast x$ | $f\ast {x}^{2}$ |
---|---|---|---|---|---|---|---|---|---|---|---|
Hepatitis | 1 | 1 | 1 | 1 | 1 | Flu | 1 | 2 | 1 | 2 | 2 |
Phthisis | 2 | 1 | 4 | 2 | 4 | Cancer | 2 | 1 | 4 | 2 | 4 |
Asthma | 3 | 1 | 9 | 3 | 9 | HIV | 3 | 1 | 9 | 3 | 9 |
Obesity | 4 | 1 | 16 | 4 | 16 | | | | | | |
Total | | $N=\sum f=4$ | | $\sum fx=10$ | $\sum f{x}^{2}=30$ | Total | | $N=\sum f=4$ | | $\sum fx=7$ | $\sum f{x}^{2}=15$ |

Variance of EC2: ${\sigma}^{2}=\frac{\sum f{x}^{2}}{N}-{\left(\frac{\sum fx}{N}\right)}^{2}=\frac{30}{4}-{\left(\frac{10}{4}\right)}^{2}=1.25$

Variance of EC3: ${\sigma}^{2}=\frac{\sum f{x}^{2}}{N}-{\left(\frac{\sum fx}{N}\right)}^{2}=\frac{15}{4}-{\left(\frac{7}{4}\right)}^{2}\approx 0.69$
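
The arithmetic in Table 5 can be checked with a few lines of Python. This is a minimal sketch under the assumption that each distinct sensitive value in an EC is assigned an integer code $x$ (1, 2, 3, ...) and weighted by its frequency $f$, as the table does; the function name `grouped_variance` and the dictionary layout are hypothetical.

```python
def grouped_variance(freq_by_code):
    """Population variance of coded values x with frequencies f.

    Implements sigma^2 = sum(f * x^2) / N - (sum(f * x) / N)^2, with N = sum(f).
    `freq_by_code` maps each integer code x to its frequency f.
    """
    n = sum(freq_by_code.values())
    sum_fx = sum(f * x for x, f in freq_by_code.items())
    sum_fx2 = sum(f * x * x for x, f in freq_by_code.items())
    return sum_fx2 / n - (sum_fx / n) ** 2


# Codes and frequencies copied from Table 5.
ec2 = {1: 1, 2: 1, 3: 1, 4: 1}  # Hepatitis, Phthisis, Asthma, Obesity
ec3 = {1: 2, 2: 1, 3: 1}        # Flu (twice), Cancer, HIV

var_ec2 = grouped_variance(ec2)  # 30/4 - (10/4)^2
var_ec3 = grouped_variance(ec3)  # 15/4 - (7/4)^2

print(var_ec2, var_ec3)
```

The exact value for EC3 is 0.6875, which Table 5 reports rounded to 0.69. The repeated Flu value is what pulls the variance of EC3 below that of EC2.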

Data Types | Description |
---|---|
k | User input for k-anonymity |
p | p-sensitivity numeric value |
C | Distinct categories set |
Condition | Boolean value 1 or 0 |
${\mathrm{S}}_{\mathrm{n}}$ | Total distinct ${\mathrm{A}}^{\mathrm{s}}$ values |
${\mathrm{C}}_{\mathrm{n}}$ | Total distinct categories |
${\mathrm{A}}_{\mathrm{i}}^{\mathrm{si}}$ | Sensitive Attribute for i^{th} end user |
${\mathrm{A}}_{\mathrm{i}}^{\mathrm{id}}$ | Identifier attribute for i^{th} end user |

Places | Description |
---|---|
$\mathit{\phi}$(MT) | ℙ (${\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}\times {\mathrm{A}}^{\mathrm{id}}$) |
$\mathit{\phi}$(MMT) | ℙ (${\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}\times k$) |
$\mathit{\phi}$(KLevel) | ℙ (k) |
$\mathit{\phi}$(CondTF) | ℙ (Condition) |
$\mathit{\phi}$(Gi) | ℙ (${\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}\times k$) |
$\mathit{\phi}$(ds) | ℙ (${\mathrm{A}}^{\mathrm{s}}$) |
$\mathit{\phi}$(CountDs) | ℙ (${\mathrm{S}}_{\mathrm{n}}$) |
$\mathit{\phi}$(${\mathbf{Gi}}^{\prime}$) | ℙ (${\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}\times k\times C$) |
$\mathit{\phi}$(PLevel) | ℙ (p) |
$\mathit{\phi}$(CompC) | ℙ (${\mathrm{C}}_{\mathrm{n}}$) |
$\mathit{\phi}$(Publish Data) | ℙ (${\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}$) |
$\mathit{\phi}$(BK) | ℙ (${\mathrm{A}}^{\mathrm{id}}\times {\mathrm{A}}^{\mathrm{qi}}$) |
$\mathit{\phi}$(SA Disc) | ℙ (${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{si}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{id}}$) |

ID | Age | Zip Code | Country | Disease |
---|---|---|---|---|
1 | =< 40 | 14054-14247 | America | HIV |
2 | =< 40 | 14054-14247 | America | Cancer |
3 | =< 40 | 14054-14247 | America | Hepatitis |
4 | =< 40 | 14054-14247 | America | Obesity |
5 | >= 40 | 13073-14243 | Asia | HIV |
6 | >= 40 | 13073-14243 | Asia | Phthisis |
7 | >= 40 | 13073-14243 | Asia | Asthma |
8 | >= 40 | 13073-14243 | Asia | Flu |
9 | =< 40 | 14063-14247 | America | Cancer |
10 | =< 40 | 14063-14247 | America | Flu |
11 | =< 40 | 14063-14247 | America | Flu |
12 | =< 40 | 14063-14247 | America | Indigestion |
13 | =< 40 | 14063-14247 | America | Obesity |

ID | Age | Zip Code | Country | Disease |
---|---|---|---|---|
1 | =< 40 | 14054-14247 | America | Hepatitis |
2 | =< 40 | 14054-14247 | America | HIV |
3 | =< 40 | 14054-14247 | America | Cancer |
4 | =< 40 | 14054-14247 | America | Flu |
5 | >= 40 | 13073-14243 | Asia | HIV |
6 | >= 40 | 13073-14243 | Asia | Phthisis |
7 | >= 40 | 13073-14243 | Asia | Asthma |
8 | >= 40 | 13073-14243 | Asia | Flu |
9 | =< 40 | 14063-14247 | America | Cancer |
10 | =< 40 | 14063-14247 | America | Obesity |
11 | =< 40 | 14063-14247 | America | Flu |
12 | =< 40 | 14063-14247 | America | Indigestion |
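
As a quick sanity check on a released table such as the one above, the k-anonymity condition (every record indistinguishable from at least k-1 others on its quasi-identifiers) can be tested by grouping on the QI columns. This is an illustrative sketch only; `ROWS`, `smallest_group`, and `is_k_anonymous` are hypothetical names, and treating Age, Zip Code, and Country as the quasi-identifiers is an assumption based on the table layout.

```python
from collections import Counter

# (Age, Zip Code, Country, Disease) tuples copied from the 12-row table above.
ROWS = [
    ("=< 40", "14054-14247", "America", "Hepatitis"),
    ("=< 40", "14054-14247", "America", "HIV"),
    ("=< 40", "14054-14247", "America", "Cancer"),
    ("=< 40", "14054-14247", "America", "Flu"),
    (">= 40", "13073-14243", "Asia", "HIV"),
    (">= 40", "13073-14243", "Asia", "Phthisis"),
    (">= 40", "13073-14243", "Asia", "Asthma"),
    (">= 40", "13073-14243", "Asia", "Flu"),
    ("=< 40", "14063-14247", "America", "Cancer"),
    ("=< 40", "14063-14247", "America", "Obesity"),
    ("=< 40", "14063-14247", "America", "Flu"),
    ("=< 40", "14063-14247", "America", "Indigestion"),
]


def smallest_group(rows):
    """Size of the smallest group of rows sharing the same QI values."""
    # Assumption: the first three fields are the quasi-identifiers.
    sizes = Counter(row[:3] for row in rows)
    return min(sizes.values())


def is_k_anonymous(rows, k):
    """True if every QI group contains at least k rows."""
    return smallest_group(rows) >= k


print(smallest_group(ROWS), is_k_anonymous(ROWS, 4), is_k_anonymous(ROWS, 5))
```

Under these assumptions the table splits into three QI groups of four records each, so it satisfies k-anonymity for any k up to 4.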

Data Types | Descriptions |
---|---|
M | Size of an EC |
Condition | Boolean value 1 or 0 |
$\sigma $ | A float type value to define Sigma |
$\mu $ | A float type value to define Mu |
$\theta $ | A float type value to define Theta |
Found ${\mathrm{EC}}_{\mathrm{b}}$ | Equivalence class b when it is found |
${\mathrm{AdjEC}}_{\mathrm{c}}$ | Adjust Equivalence class c |
${\mathrm{AdjEC}}_{\mathrm{n}}$ | Adjust Equivalence class n |
${\mathrm{VarEC}}_{\mathrm{s}}$ | Variance of different Equivalence classes |
${\mathrm{VarAdjEC}}_{\mathrm{n}}$ | Adjust variance for Equivalence class n |
${\mathrm{VarAdjEC}}_{\mathrm{c}}$ | Adjust variance for Equivalence class c |

Places | Descriptions |
---|---|
$\mathit{\phi}$($\mathrm{MT}$) | ℙ (${\mathrm{A}}^{\mathrm{id}}\times {\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}$) |
$\mathit{\phi}$($\mathrm{MMT}$) | ℙ (${\mathrm{EC}}_{\mathrm{c}}\times {\mathrm{EC}}_{\mathrm{b}}\times {\mathrm{EC}}_{\mathrm{n}}\times k$) |
$\mathit{\phi}$($\mathrm{KValue}$) | ℙ (k) |
$\mathit{\phi}$($\mathrm{CondTF}$) | ℙ (Condition) |
$\mathit{\phi}$($\mathrm{Sigma}$) | ℙ ($\sigma $) |
$\mathit{\phi}$($\mathrm{Mu}$) | ℙ ($\mu $) |
$\mathit{\phi}$($\mathrm{Theta}$) | ℙ ($\theta $) |
$\mathit{\phi}$(${\mathrm{Found}\mathrm{EC}}_{\mathrm{b}}$) | ℙ (${\mathrm{EC}}_{\mathrm{b}}$) |
$\mathit{\phi}$(${\mathrm{VarEC}}_{\mathrm{s}}$) | ℙ (${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{c}}}\times {\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{b}}}\times {\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{n}}}$) |
$\mathit{\phi}$(${\mathrm{AdjEC}}_{\mathrm{c}}$) | ℙ (${\mathrm{EC}}_{\mathrm{c}}$) |
$\mathit{\phi}$(${\mathrm{AdjEC}}_{\mathrm{n}}$) | ℙ (${\mathrm{EC}}_{\mathrm{n}}$) |
$\mathit{\phi}$(${\mathrm{StrictEC}}_{\mathrm{n}-1}$) | ℙ (${\mathrm{EC}}_{\mathrm{n}-1}$) |
$\mathit{\phi}$(${\mathrm{VarAdjEC}}_{\mathrm{n}}$) | ℙ (${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{n}}}$) |
$\mathit{\phi}$(${\mathrm{VarAdjEC}}_{\mathrm{c}}$) | ℙ (${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{c}}}$) |
$\mathit{\phi}$($\mathrm{Need}\mathrm{Noise}$) | ℙ (${\mathrm{V}}_{{\mathrm{EC}}_{\mathrm{c}}}\times {\mathrm{A}}^{\mathrm{id}}\times {\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}$) |
$\mathit{\phi}$($\mathrm{PublshdData}$) | ℙ (${\mathrm{A}}^{\mathrm{qi}}\times {\mathrm{A}}^{\mathrm{s}}$) |
$\mathit{\phi}$($\mathrm{BK}$) | ℙ (${\mathrm{A}}^{\mathrm{id}}\times {\mathrm{A}}^{\mathrm{qi}}$) |
$\mathit{\phi}$($\mathrm{SA}\mathrm{Disc}$) | ℙ (${\mathrm{A}}_{\mathrm{i}}^{\mathrm{qi}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{si}}\times {\mathrm{A}}_{\mathrm{i}}^{\mathrm{id}}$) |

k | Baseline | θ-Sensitive | p^{+}-Sensitive |
---|---|---|---|
2 | 320300 | 320303 | 571227 |
4 | 640600 | 640605 | 778626 |
6 | 960900 | 960912 | 1214207 |
8 | 1281200 | 1281215 | 1467959 |
10 | 1601500 | 1601520 | 1876310 |
12 | 1921800 | 1921824 | 2096543 |
14 | 2242100 | 2242145 | 2632775 |
16 | 2562400 | 2562470 | 3017773 |
18 | 2882700 | 2882812 | 3315591 |
20 | 3203000 | 3203166 | 3628936 |
Average value | 1761650 | 1761697.2 | 2059994.7 |
Difference of θ and p^{+} average values from the baseline average | -- | 47.2 | 298344.7 |
Percent closer to baseline | -- | 0.002679235 | 14.65 |
% difference between θ and p^{+} | -- | 14.64 | -- |

This means that the proposed θ-sensitive k-anonymity is 14.64% better than p^{+}-sensitive k-anonymity, and its average value lies within 0.002679% of the baseline.
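
The average and difference rows of the table above can be reproduced directly from the per-k values. The sketch below is only a consistency check on the reported numbers; the variable names are hypothetical, and it deliberately does not recompute the two percentage rows for p^{+}, because the normalization used for them is not stated in this part of the text.

```python
# Per-k values copied from the table above (k = 2, 4, ..., 20).
BASELINE = [320300, 640600, 960900, 1281200, 1601500,
            1921800, 2242100, 2562400, 2882700, 3203000]
THETA = [320303, 640605, 960912, 1281215, 1601520,
         1921824, 2242145, 2562470, 2882812, 3203166]
P_PLUS = [571227, 778626, 1214207, 1467959, 1876310,
          2096543, 2632775, 3017773, 3315591, 3628936]


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_base = average(BASELINE)
avg_theta = average(THETA)
avg_p_plus = average(P_PLUS)

# Gap of each model's average from the baseline average.
diff_theta = avg_theta - avg_base
diff_p_plus = avg_p_plus - avg_base

# Relative gap of the theta-sensitive average, as a percentage of the baseline.
theta_gap_percent = 100 * diff_theta / avg_base

print(avg_base, avg_theta, avg_p_plus)
print(diff_theta, diff_p_plus, theta_gap_percent)
```

Run as written, this gives averages of 1761650, 1761697.2, and 2059994.7, gaps of 47.2 and 298344.7, and a relative θ gap of about 0.002679%, matching the corresponding table entries.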

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Khan, R.; Tao, X.; Anjum, A.; Kanwal, T.; Malik, S.u.R.; Khan, A.; Rehman, W.u.; Maple, C.
*θ*-Sensitive *k*-Anonymity: An Anonymization Model for IoT based Electronic Health Records. *Electronics* **2020**, *9*, 716.
https://doi.org/10.3390/electronics9050716
