# Data Collection Framework for Energy Efficient Privacy Preservation in Wireless Sensor Networks Having Many-to-Many Structures

## Abstract

## 1. Introduction

## 2. Motivation and Background

#### 2.1. Privacy Preserving Data Publishing vs. Privacy Preserving Data Mining

#### 2.2. Privacy Preserving Data Collection Models

… R_{1}, R_{2}, ···, R_{n}), but they are required to share privacy-preserved data with data recipients. Data owners do not trust the data recipients; however, they must fully trust the data publishers to carry out the privacy-preserving operations. Data owners must also be sure that their data is not used for illegal purposes, maliciously or unintentionally, by the staff of the data publishers.

… R′_{1}, R′_{2}, ···, R′_{n}) are collected by data collectors. Privacy-preserving data mining methods use this model.

#### 2.3. Threat and Network Model

… the i^{th} sink has a privacy level p_{i}, where each level requires sharing k_{i}-anonymous data with the i^{th} sink, and the inequality k_{1} < k_{2} < ... < k_{n} holds.

… k_{n}-anonymity property. By eavesdropping on this data, attackers can obtain only k_{n}-anonymous data.

#### 2.4. Our Contribution

## 3. Proposed Anonymization Method, Iterative k-Anonymous Clustering Method (Ik-ACM)

#### 3.1. Data Representation

… T_{ij} represents the j^{th} attribute of the i^{th} record, where 1 ≤ i ≤ r and 1 ≤ j ≤ m. Table T is represented by a set of bit strings B, where B_{ij} is the bit-string representation of the j^{th} attribute of the i^{th} record. The k^{th} bit of B_{ij} is denoted B_{ij}(k). Suppose that the j^{th} attribute of the table is categorical and has d_{j} distinct values. These values are indexed by k and denoted V_{j}(k), where 1 ≤ k ≤ d_{j}. The bit string of this categorical attribute has size d_{j} and is formed as follows:

… d_{j} intervals. Each interval is indexed by k. The bit-string representation of this numeric attribute has size d_{j} and is formed as follows:
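As an illustration of both encodings, the Python sketch below assumes that bit k is set when a categorical value equals V_{j}(k) or when a numeric value falls in the k^{th} interval, and that generalization sets several bits at once. These encoding rules, the helper names, and the sample domain and cut points are assumptions made for illustration, not the authors' definitions.

```python
from bisect import bisect_right


def encode_categorical(value, domain):
    """One bit per distinct value V_j(k); only the bit of `value` is set (assumed rule)."""
    return "".join("1" if v == value else "0" for v in domain)


def encode_numeric(value, cut_points):
    """One bit per interval; `cut_points` are the sorted lower bounds of intervals 2..d_j."""
    k = bisect_right(cut_points, value)  # zero-based index of the interval holding value
    return "".join("1" if i == k else "0" for i in range(len(cut_points) + 1))


def generalize(a, b):
    """Bitwise OR of two equal-length bit strings: the union of the values they cover."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


# Hypothetical attribute domain and interval cut points.
diseases = ["flu", "cold", "asthma", "diabetes", "none"]
print(encode_categorical("diabetes", diseases))  # 00010
print(encode_numeric(25, [20, 40, 60, 80]))      # 01000
print(generalize("01000", "00100"))              # 01100
```

Under these assumptions, a raw record always has exactly one bit set per attribute, and every additional set bit records one more value that the generalized record may stand for.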

#### 3.2. Information Loss Metric

… _{ij}) = 0 hold, are excluded from the summation. The random variable C takes values from the set {1, ..., z}. B̄ is calculated in order to find the value of this random variable.

… p(B_{ij}), is calculated as $p({B}_{\mathit{ij}})=\frac{1}{m\cdot r}$. Equation 1 can be rewritten as follows:

… B_{ij} is F_{ij}. The number of elements of B̄_{ij}(k) that have the value $\frac{1}{{F}_{\mathit{ij}}}$ is equal to F_{ij}, and the rest are zero. Therefore, the second summation of Equation 3 yields the value $\text{log}\frac{1}{{F}_{\mathit{ij}}}$. The simplest form of the information loss of data table T, IL(T), can be calculated as follows:
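The reasoning above can be checked numerically. The sketch below assumes that Equation 3 has the usual entropy form with a leading minus sign, so that substituting $p({B}_{\mathit{ij}})=\frac{1}{m\cdot r}$ and the inner sum $\text{log}\frac{1}{{F}_{\mathit{ij}}}$ gives $IL(T)=\frac{1}{m\cdot r}\sum_{i}\sum_{j}\text{log}\,{F}_{\mathit{ij}}$. It also assumes base-2 logarithms and takes F_{ij} to be the number of 1 bits in B_{ij}; the function name is hypothetical.

```python
import math


def information_loss(table):
    """IL(T) = (1 / (m * r)) * sum of log2(F_ij), with F_ij the number of 1 bits in B_ij."""
    r = len(table)      # number of records
    m = len(table[0])   # number of attributes
    total = sum(math.log2(bits.count("1")) for row in table for bits in row)
    return total / (m * r)


# The two example records T_1 and T_2 from the bit-string table of this article.
table = [
    ["00010", "01000", "10000"],  # F = 1, 1, 1 (no generalization)
    ["01100", "11100", "01111"],  # F = 2, 3, 4
]
print(round(information_loss(table), 3))  # 0.764
```

A table in which every bit string has a single 1 bit yields a loss of zero under this reading, which matches the intuition that ungeneralized data loses no information.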

#### 3.3. Iterative Anonymization Model

… the l^{th} sensor shares a key with the p^{th} sink, which is labelled ${e}_{p}^{l}$. The i^{th} sink holds the list of the keys of all sensors, ${e}_{i}^{l},{e}_{i+1}^{l},\dots ,{e}_{n-1}^{l}$, where 0 < l < m.

… l^{th} sensor, which is selected as a group head sensor during WSN operation. In the first step, the input data is k_{1}-anonymized using only the generalization operation. In the second step, the k_{1}-anonymized data is k_{2}-anonymized by encrypting the chosen data parts with ${e}_{1}^{l}$. In each subsequent i^{th} step, up to the n^{th} step, anonymization is done by encryption with the key ${e}_{i-1}^{l}$. The output of the n^{th} step is multicast to all sinks.

… l^{th} group head sensor, each sink decrypts the data with its own keys. The data resulting from decryption has exactly the level of privacy required for that sink. The i^{th} sink can decrypt only the data encrypted after the i^{th} iteration, because it holds the corresponding keys. Data parts encrypted with the keys ${e}_{1}^{l},{e}_{2}^{l},\dots ,{e}_{i-1}^{l}$ cannot be decrypted, so they can be regarded as suppression operations for that sink. The 1st sink, which receives data with the lowest privacy criterion, can decrypt all the encrypted parts, and the resulting data is k_{1}-anonymized. The n^{th} sink, on the other hand, has no key and receives the data as k_{n}-anonymized.
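The relation between key lists and privacy levels can be sketched in a few lines of Python. The sketch only tracks key indices (no real cryptography) and assumes, as stated above, that step 1 uses no key, that step i ≥ 2 encrypts with the key of index i − 1, and that the i^{th} sink holds the keys with indices i to n − 1. All function names are hypothetical.

```python
def keys_held_by_sink(i, n):
    """Indices p of the keys e_p held by the i-th sink: e_i, ..., e_{n-1}."""
    return list(range(i, n))


def key_used_in_step(step):
    """Step 1 is generalization only (no key); step i >= 2 encrypts with e_{i-1}."""
    return None if step == 1 else step - 1


def view_of_sink(i, n):
    """Describe how the output of each anonymization step appears to the i-th sink."""
    held = set(keys_held_by_sink(i, n))
    view = {}
    for step in range(1, n + 1):
        key = key_used_in_step(step)
        if key is None:
            view[step] = "generalized"   # visible to every sink
        elif key in held:
            view[step] = "decryptable"   # the sink can undo this step
        else:
            view[step] = "suppressed"    # stays encrypted for this sink
    return view


for sink in (1, 2, 3):
    print(sink, view_of_sink(sink, 3))
```

With three sinks, the first sink can undo steps 2 and 3, the second sink only step 3, and the third sink neither, so each sink is left with data at its own anonymity level.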

#### 3.4. Bottom-Up Hierarchical Clustering Process

… i^{th} input vector is T_{i}. In each iteration h, each cluster is denoted ${L}_{j}^{h}$, where j is the index of the cluster. The input vector set of cluster ${L}_{j}^{h}$ is ${V}_{j}^{h}$, the number of input vectors belonging to that cluster is $\left|{V}_{j}^{h}\right|$, and its representative vector is ${R}_{j}^{h}$. The k^{th} data item of the representative vector is denoted ${R}_{j}^{h}[k]$. The representative vector is the anonymized output of the input vectors belonging to that cluster, formed by generalization and encryption of some data parts of the vectors. D^{h} is the distance matrix of iteration h; it contains the distances between all pairs of clusters.

… ^{th}-anonymization step. In each iteration, the distances between clusters are calculated using the information loss metric described in Section 3.2. The distance between two clusters is equal to the information loss that would occur if the two clusters were merged. The two clusters with the smallest distance, say ${L}_{s}^{h}$ and ${L}_{t}^{h}$, are chosen for merging. A new, larger cluster ${L}_{u}^{h+1}$, which contains the vectors of both clusters, is formed and the two old clusters are deleted. $\left|{V}_{u}^{h+1}\right|$ is equal to the sum of $\left|{V}_{s}^{h}\right|$ and $\left|{V}_{t}^{h}\right|$. In the first step of the clustering operations (the k_{1}-anonymization stage), the anonymity operation is generalization: ${R}_{u}^{h+1}[k]$ is the bitwise OR of ${R}_{s}^{h}[k]$ and ${R}_{t}^{h}[k]$, as in step 4 of the ClusterCombination function. Encryption is used as the anonymity operation in all anonymization steps except k_{1}-anonymization. Assume that two clusters, ${L}_{h}^{s}$ and ${L}_{h}^{t}$, are chosen as the closest cluster pair at the h^{th} iteration of Ik-ACM and that this iteration corresponds to the (i + 1)^{th}-anonymization step. The newly created cluster is labelled ${L}_{h}^{\mathit{st}}$, and $E_{e_{i-1}}(x)$ denotes the encrypted output of input x with key e_{i−1}. The formation of the representative vector of the newly created cluster, ${R}_{h}^{\mathit{st}}$, is given in Algorithm 2.

… k_{i}-anonymized output, clustering operations continue until the data is k_{i+1}-anonymized. In the first iteration, raw data is k_{1}-anonymized by generalization operations. In the second and all subsequent iterations, the data is anonymized to a higher level by encryption operations, with a different key used in each iteration.

… ^{th} steps. The formation of the output is given in the fifth and sixth phases of the main function in Algorithm 1.

**Function ClusterCombination**

Input: parameter k; distance matrix D^{h}; key ${e}_{\mathit{step}\_\mathit{no}-1}^{l}$; anonymization step step_no

Output: new cluster ${L}_{u}^{h+1}$; updated distance matrix D^{h+1}

1. Find the clusters ${L}_{s}^{h}$ and ${L}_{t}^{h}$ that have the minimum distance in the distance matrix D^{h}; create a new cluster ${L}_{u}^{h+1}$

2. ${V}_{u}^{h+1}\leftarrow {V}_{s}^{h}\cup {V}_{t}^{h}$

3. $\left|{V}_{u}^{h+1}\right|=\left|{V}_{s}^{h}\right|+\left|{V}_{t}^{h}\right|$

4. If step_no == 1

For each z^{th} bit string of the representative vector, ${R}_{u}^{h+1}[z]={R}_{s}^{h}[z]$ OR ${R}_{t}^{h}[z]$

else

${R}_{u}^{h+1}$ ← Function Form_Encrypted_Representative_Vector (${L}_{s}^{h}$, ${L}_{t}^{h}$, ${e}_{\mathit{step}\_\mathit{no}-1}^{l}$)

(Function is given in Algorithm 1.)

5. Remove clusters ${L}_{s}^{h}$ and ${L}_{t}^{h}$

6. Find the distance of ${L}_{u}^{h+1}$ to the other clusters; update D^{h+1}

**Main Function Ik-ACM**

Input: table T; number of records r; number of attributes m; number of sinks n; anonymization parameters k_{1}, k_{2}, ..., k_{n}; index of group head sensor l

Output: anonymized table Ik-ACM(T)

Initialization

1. h = 1

2. For all i where 1 ≤ i ≤ r

2.1. Create cluster ${L}_{i}^{1}$ in the cluster array

2.2. Add record T_{i} to ${V}_{i}^{1}$

2.3. Set the initial size of the cluster, $\left|{V}_{i}^{1}\right|=1$

2.4. Initialize the representative vector, ${R}_{i}^{1}\leftarrow {T}_{i}$

2.5. Initialize the distance matrix D^{1} by using Equation 4

Iterative steps for multilevel k-anonymization

3. step_no = 1

4. While step_no ≤ n

4.1. While some cluster has $\left|{V}_{i}^{h}\right|<{k}_{\mathit{step}\_\mathit{no}}$

4.1.1. Call Function ClusterCombination (k, D^{h}, ${e}_{\mathit{step}\_\mathit{no}-1}^{l}$, step_no)

4.1.2. h = h + 1

4.2. step_no = step_no + 1

Form the output of Ik-ACM

5. Initialize Ik-ACM(T) to the empty set

6. For each cluster ${L}_{s}^{h}$ in L^{h}, where 1 ≤ s ≤ |L^{h}|

6.1. Append ${R}_{s}^{h}$ and $\left|{V}_{s}^{h}\right|$ to Ik-ACM(T)

**Function Form_Encrypted_Representative_Vector**

Input: representative vectors ${R}_{h}^{s}$ and ${R}_{h}^{t}$; key ${e}_{\mathit{step}\_\mathit{no}-1}^{l}$

Output: representative vector of the new cluster, ${R}_{h+1}^{u}$

For each attribute j, where 1 ≤ j ≤ m

if ${R}_{h}^{s}[j]={R}_{h}^{t}[j]$ then

${R}_{h+1}^{u}[j]={R}_{h}^{t}[j]$

else

${R}_{h+1}^{u}[j]={E}_{{e}_{\mathit{step}\_\mathit{no}-1}^{l}}({R}_{h}^{s}[j]||{R}_{h}^{t}[j])$
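The first anonymization stage of the functions above can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it covers only the generalization step (no encryption), searches all cluster pairs directly instead of maintaining a distance matrix or heap, and uses as its distance the merged cluster size times the sum of log2 of the set-bit counts, which is an assumed stand-in for Equation 4. Bit strings are held as Python integers, and every record is assumed to have at least one bit set per attribute.

```python
import math
from itertools import combinations


def merge_vectors(a, b):
    """Generalize two representative vectors by bitwise OR per attribute."""
    return tuple(x | y for x, y in zip(a, b))


def vector_loss(vec):
    """Sum of log2(F) over attributes, F being the number of set bits."""
    return sum(math.log2(bin(bits).count("1")) for bits in vec)


def merge_cost(c1, c2):
    """Assumed distance: loss of the merged vector weighted by merged size."""
    (r1, n1), (r2, n2) = c1, c2
    return (n1 + n2) * vector_loss(merge_vectors(r1, r2))


def k1_anonymize(records, k):
    """Merge the closest cluster pair until every cluster holds at least k records."""
    clusters = [(tuple(rec), 1) for rec in records]  # (representative vector, size)
    while len(clusters) > 1 and any(size < k for _, size in clusters):
        s, t = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]),
        )
        merged = (
            merge_vectors(clusters[s][0], clusters[t][0]),
            clusters[s][1] + clusters[t][1],
        )
        clusters = [c for i, c in enumerate(clusters) if i not in (s, t)]
        clusters.append(merged)
    return clusters


# Hypothetical records with two attributes of five bits each.
records = [
    (0b00010, 0b01000),
    (0b00010, 0b00100),
    (0b10000, 0b00001),
    (0b10000, 0b00001),
]
for rep, size in k1_anonymize(records, 2):
    print([format(bits, "05b") for bits in rep], size)
```

On this input the two identical records merge first at zero cost, and the remaining two records merge into a cluster whose second attribute is generalized to 01100.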

#### 3.5. Complexity Analysis of Ik-ACM

… O(n^{2}·m·V). Initially there are n clusters and, at the end of the k_{n}-anonymization phase, the minimum number of clusters is n/k_{n}. Therefore, at most n − n/k_{n} cluster combination operations occur. A cluster combination consists of finding the minimum distance in the distance matrix and reorganizing the matrix so that the distance values of the new cluster are added and those of the two merged clusters are removed. If a binary heap is used for finding the minimum distance, building the initial min-heap with n^{2} elements is O(n^{2}). Finding the minimum in a heap is O(1). However, removing the distances of the merged clusters from the heap and adding the distances of the new cluster require 2n deletions and n insertions, which cost O(n log n). The distance matrix can be reorganized in O(n·m·V) time while the heap is maintained. As a result, the cost of each cluster combination operation is O(n log n + n·m·V). Since the maximum number of cluster combination operations is n − n/k_{n}, the algorithm reaches the end of the k_{n}-anonymization phase in O(n^{2} log n + n^{2}·m·V). Forming the output of Ik-ACM takes O((n/k_{n})·m), which does not change the overall running time. In total, Ik-ACM takes O(n^{2} log n + 2·n^{2}·m·V). Because m and V generally have small values, they can be treated as constant factors, and the running time simplifies to O(n^{2} log n).
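The bound can be retraced term by term. The derivation below is a sketch that assumes the leading O(n^{2}·m·V) term is the cost of computing the initial distance matrix; the names $T_{\text{merge}}$ and $T_{\text{total}}$ are introduced here only for illustration.

```latex
\begin{aligned}
T_{\text{merge}} &= \left(n - \tfrac{n}{k_n}\right)\cdot O(n\log n + n\,m\,V)
                  = O(n^{2}\log n + n^{2}\,m\,V),\\
T_{\text{total}} &= O(n^{2}\,m\,V) + T_{\text{merge}} + O\!\left(\tfrac{n}{k_n}\,m\right)
                  = O(n^{2}\log n + 2\,n^{2}\,m\,V),\\
T_{\text{total}} &= O(n^{2}\log n) \quad \text{when } m \text{ and } V \text{ are treated as constants.}
\end{aligned}
```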

#### 3.6. Multicasting and Energy Saving

… $l_{k_1}$ and $l_{k_2}$, which are obtained by k-anonymization with only the generalization operation to produce k_{1}-anonymized and k_{2}-anonymized data, respectively. The length of the anonymous data generated by Ik-ACM is denoted $l_{\mathit{IkACM}}$. The numbers of hops on the shortest routes from the group head sensor, G, to Sink1 and Sink2 are denoted $h_{G,\mathit{Sink}_1}$ and $h_{G,\mathit{Sink}_2}$, respectively. Also assume that the hop distance between G and the multicast point, M, is $h_{G,M}$, and that the distances from M to Sink1 and Sink2 are $h_{M,\mathit{Sink}_1}$ and $h_{M,\mathit{Sink}_2}$, respectively. The single anonymized data set is sent to M if a suitable node that satisfies Inequality 5 exists in the network. Among the candidate nodes, the one that minimizes $h_{G,M} + h_{M,\mathit{Sink}_1} + h_{M,\mathit{Sink}_2}$ is chosen.
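The selection of a multicast point can be sketched as follows. The sketch assumes that transmission cost is proportional to data length times hop count, so that multicasting is worthwhile when the Ik-ACM output sent over $h_{G,M} + h_{M,\mathit{Sink}_1} + h_{M,\mathit{Sink}_2}$ hops costs less than sending the two separately anonymized outputs along their shortest routes. This cost model is an assumed reading of Inequality 5, and all function names and numbers are hypothetical.

```python
def multipath_cost(l_k1, l_k2, h_g_s1, h_g_s2):
    """Cost of sending separately anonymized data to each sink (length x hops)."""
    return l_k1 * h_g_s1 + l_k2 * h_g_s2


def multicast_cost(l_ikacm, h_g_m, h_m_s1, h_m_s2):
    """Cost of sending one Ik-ACM output via multicast point M."""
    return l_ikacm * (h_g_m + h_m_s1 + h_m_s2)


def choose_multicast_point(candidates, l_ikacm, l_k1, l_k2, h_g_s1, h_g_s2):
    """Return the feasible node with the fewest total hops, or None to fall back to multipath.

    `candidates` maps a node id to the tuple (h_G_M, h_M_Sink1, h_M_Sink2).
    """
    baseline = multipath_cost(l_k1, l_k2, h_g_s1, h_g_s2)
    feasible = {
        node: hops
        for node, hops in candidates.items()
        if multicast_cost(l_ikacm, *hops) < baseline
    }
    if not feasible:
        return None
    return min(feasible, key=lambda node: sum(feasible[node]))


# Hypothetical hop counts and data lengths.
candidates = {"a": (2, 3, 3), "b": (4, 1, 1), "c": (5, 5, 5)}
print(choose_multicast_point(candidates, 120, 100, 80, 5, 5))  # b
```

When no candidate beats the multipath baseline, the function returns None, which corresponds to sending separately anonymized data along the two shortest routes.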

## 4. Performance Evaluation of Ik-ACM

… E_{multipath}. The energy consumption of the method that uses multicasting when appropriate is denoted E_{hybrid}. The energy saving ratio, ES, is computed as follows:
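One plausible reading of this ratio, stated here as an assumption rather than as the authors' exact formula, is the relative reduction in energy with respect to the multipath-only method, expressed as a percentage. The function name and the sample energy values are hypothetical.

```python
def energy_saving_ratio(e_multipath, e_hybrid):
    """Assumed definition: ES (%) = 100 * (E_multipath - E_hybrid) / E_multipath."""
    if e_multipath <= 0:
        raise ValueError("e_multipath must be positive")
    return 100.0 * (e_multipath - e_hybrid) / e_multipath


print(energy_saving_ratio(200.0, 136.0))  # 32.0
```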

## 5. Related Work

## 6. Conclusions

| Records | B_{i1} | B_{i2} | B_{i3} |
|---|---|---|---|
| T_{1} | 00010 | 01000 | 10000 |
| T_{2} | 01100 | 11100 | 01111 |

| Records | $\overline{{B}_{i1}}$ | $\overline{{B}_{i2}}$ | $\overline{{B}_{i3}}$ |
|---|---|---|---|
| T_{1} | 0 0 0 1 0 | 0 1 0 0 0 | 1 0 0 0 0 |
| T_{2} | 0 $\frac{1}{2}$ $\frac{1}{2}$ 0 0 | $\frac{1}{3}$ $\frac{1}{3}$ $\frac{1}{3}$ 0 0 | 0 $\frac{1}{4}$ $\frac{1}{4}$ $\frac{1}{4}$ $\frac{1}{4}$ |

| Energy Consumption Ratios | Ratio Value |
|---|---|
| Transmission/Reception | 1.5 |
| Transmission/Encryption | 2333.34 |
| Encryption/Decryption | 1 |

| Location of Sinks (coordinates) | Total Info Loss For (bits) | Number of Group Head Nodes Using Multicasting Method | Number of Group Head Nodes Using Multipathing Method | ES (%) |
|---|---|---|---|---|
| (0,0),(500,500) | 0.73 | 716 | 1784 | 3 |
| (0,0),(500,0) | 0.79 | 1287 | 1213 | 6 |
| (100,0),(400,0) | 0.84 | 1813 | 687 | 14 |
| (150,0),(350,0) | 0.87 | 2137 | 363 | 22 |
| (200,0),(300,0) | 0.90 | 2406 | 94 | 32 |

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Bahsi, H.; Levi, A. Data Collection Framework for Energy Efficient Privacy Preservation in Wireless Sensor Networks Having Many-to-Many Structures. *Sensors* **2010**, *10*, 8375-8397. https://doi.org/10.3390/s100908375