# Hiding Sensitive Itemsets Using Sibling Itemset Constraints

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Preliminaries

- The database is modified as little as possible, so that the sanitized database stays as close to the original as possible.
- All sensitive itemsets are hidden, i.e., they can no longer be mined from the sanitized database.
- Supersets of sensitive itemsets are also hidden and cannot be mined from the sanitized database. By the Apriori property (every superset of an infrequent itemset is itself infrequent), this goal follows automatically once the previous goal is achieved.
- All non-sensitive frequent itemsets still appear in the sanitized database. A non-sensitive frequent itemset that no longer appears in the sanitized database is called a lost itemset.
- No new itemset appears in the sanitized database. Such itemsets are called ghost itemsets. Approaches that only delete items from the dataset achieve this goal by construction, because deleting items can never increase the support of an itemset.
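The goals above can be checked mechanically on a small dataset. The following is a minimal sketch, not the authors' implementation: it uses the ten-transaction example dataset and its sanitized form from the tables of this article, and it assumes that the sensitive itemset is {C, D} and that the minimum support count is 2. Neither value is stated in this text; both are inferred from the example tables. The helper names `support`, `frequent_itemsets`, and `side_effects` are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Support count: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets by brute force (tiny examples only)."""
    items = sorted(set().union(*transactions))
    return {
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_support
    }


def side_effects(original, sanitized, sensitive, min_support):
    """Report sensitive itemsets that survived, lost itemsets, and ghost itemsets."""
    f_orig = frequent_itemsets(original, min_support)
    f_san = frequent_itemsets(sanitized, min_support)
    # Frequent sensitive itemsets together with their frequent supersets.
    to_hide = {x for x in f_orig if any(s <= x for s in sensitive)}
    return {
        "not_hidden": to_hide & f_san,
        "lost": (f_orig - to_hide) - f_san,
        "ghost": f_san - f_orig,
    }


# Example dataset and sanitized dataset, transcribed from the article's tables.
original = [frozenset(t) for t in
            ("AC", "ACDE", "CD", "BE", "ACDE", "DE", "C", "AB", "AC", "CD")]
sanitized = [frozenset(t) for t in
             ("AC", "ACDE", "D", "BE", "ACE", "DE", "C", "AB", "AC", "D")]

# Assumptions: sensitive itemset {C, D}, minimum support count 2.
report = side_effects(original, sanitized,
                      sensitive=[frozenset("CD")], min_support=2)
for name, itemsets in report.items():
    print(name, sorted("".join(sorted(x)) for x in itemsets))
```

Under these assumptions, the sketch reports no surviving sensitive itemset and no ghost itemset, and it lists AD and ADE as lost itemsets. A different threshold or sensitive set would change this report.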

#### CSP Formulation

## 4. Itemset Hiding Using Sibling Itemsets Constraints

#### Illustrative Example

## 5. Experimental Analysis

#### 5.1. Evaluation Metrics of Itemset Hiding

#### 5.1.1. Hiding Failure

#### 5.1.2. Artifactual Patterns

#### 5.1.3. Dissimilarity

#### 5.1.4. Misses Cost

#### 5.2. Comparison

#### 5.3. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

Notation | Description |
---|---|
$D$ | Dataset |
$\tilde{D}$ | Intermediate form of dataset |
${D}^{s}$ | Sanitized dataset |
${T}_{i}$ | i-th transaction of dataset |
$I$ | Set of items |
$\sigma \left(X\right)$ | Support count of itemset X in dataset |
${\sigma}^{s}\left(X\right)$ | Support count of itemset X in sanitized dataset |
${\sigma}_{min}$ | Minimum support count threshold |
$S$ | Set of sensitive itemsets |
$Ss$ | Set of supersets of sensitive itemsets |
$F$ | Set of frequent itemsets in dataset |
${F}^{n}$ | Set of non-sensitive frequent itemsets in dataset |
${F}^{s}$ | Set of frequent itemsets in sanitized dataset |
${d}_{ij}$ | Item of dataset in bitmap notation at i-th row j-th column |
$\tilde{d}_{ij}$ | Item of intermediate form of dataset in bitmap notation at i-th row j-th column |
${d}_{ij}^{s}$ | Item of sanitized dataset in bitmap notation at i-th row j-th column |
${u}_{ij}$ | Binary variable of intermediate form of dataset in bitmap notation at i-th row j-th column |
$SI$ | Set of sibling itemsets |
$SI\left(X\right)$ | Set of sibling itemsets of itemset X |
${r}_{Y}$ | Binary variable of constraint defined for itemset Y |

TID | Items |
---|---|
$T_{1}$ | AC |
$T_{2}$ | ACDE |
$T_{3}$ | CD |
$T_{4}$ | BE |
$T_{5}$ | ACDE |
$T_{6}$ | DE |
$T_{7}$ | C |
$T_{8}$ | AB |
$T_{9}$ | AC |
$T_{10}$ | CD |

A | B | C | D | E |
---|---|---|---|---|
1 | 0 | 1 | 0 | 0 |
1 | 0 | 1 | 1 | 1 |
0 | 0 | 1 | 1 | 0 |
0 | 1 | 0 | 0 | 1 |
1 | 0 | 1 | 1 | 1 |
0 | 0 | 0 | 1 | 1 |
0 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 0 |
1 | 0 | 1 | 0 | 0 |
0 | 0 | 1 | 1 | 0 |
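The bitmap notation above is a direct re-encoding of the transaction table: row i has a 1 in column j exactly when transaction $T_{i}$ contains item j. As a minimal sketch (the function names `to_bitmap` and `from_bitmap` are hypothetical and not taken from the article), the conversion in both directions can be written as:

```python
def to_bitmap(transactions, items):
    """Return one 0/1 row per transaction, with one column per item."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def from_bitmap(bitmap, items):
    """Rebuild each transaction from the columns that are set to 1."""
    return ["".join(item for item, bit in zip(items, row) if bit)
            for row in bitmap]


items = "ABCDE"
transactions = ["AC", "ACDE", "CD", "BE", "ACDE", "DE", "C", "AB", "AC", "CD"]

bitmap = to_bitmap(transactions, items)
for tid, row in enumerate(bitmap, start=1):
    print(f"T{tid}", row)

# The column sums are the support counts of the single items A..E.
column_support = [sum(column) for column in zip(*bitmap)]
print(dict(zip(items, column_support)))
```

In this representation, the support count of an itemset is the number of rows in which all of its columns are 1, which is what makes the per-cell binary variables of the intermediate form convenient to state as constraints.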

A | B | C | D | E |
---|---|---|---|---|
1 | 0 | 1 | 0 | 0 |
1 | 0 | $u_{2,3}$ | $u_{2,4}$ | 1 |
0 | 0 | $u_{3,3}$ | $u_{3,4}$ | 0 |
0 | 1 | 0 | 0 | 1 |
1 | 0 | $u_{5,3}$ | $u_{5,4}$ | 1 |
0 | 0 | 0 | 1 | 1 |
0 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 0 |
1 | 0 | 1 | 0 | 0 |
0 | 0 | $u_{10,3}$ | $u_{10,4}$ | 0 |

A | B | C | D | E |
---|---|---|---|---|
1 | 0 | 1 | 0 | 0 |
1 | 0 | 1 | 1 | 1 |
0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 |
1 | 0 | 1 | 0 | 1 |
0 | 0 | 0 | 1 | 1 |
0 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 0 |
1 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
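To illustrate how the binary variables of the intermediate form lead to a sanitized dataset, the sketch below enumerates every 0/1 assignment of the eight variables $u_{ij}$ on the example dataset. It is an exhaustive search for illustration only and makes several assumptions: it is neither the CSP model of this article nor its sibling itemset constraints, the sensitive itemset {C, D} and the minimum support count of 2 are inferred from the example tables rather than stated in this text, and the objective used here (fewest removed items first, then fewest lost itemsets) is a stand-in chosen for simplicity.

```python
from itertools import combinations, product

ITEMS = "ABCDE"
ORIGINAL = ["AC", "ACDE", "CD", "BE", "ACDE", "DE", "C", "AB", "AC", "CD"]
SENSITIVE = frozenset("CD")  # assumption: inferred from the example tables
MIN_SUPPORT = 2              # assumption: not stated in this text


def support(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def frequent(db, min_support):
    """All frequent itemsets over ITEMS, found by brute force."""
    return {frozenset(c)
            for k in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, k)
            if support(frozenset(c), db) >= min_support}


db = [frozenset(t) for t in ORIGINAL]

# One binary variable per sensitive item in each supporting transaction
# (0-based transaction index here; the tables use 1-based indices).
cells = [(i, item) for i, t in enumerate(db) if SENSITIVE <= t
         for item in sorted(SENSITIVE)]

# Non-sensitive frequent itemsets that should ideally be preserved.
keep_frequent = {x for x in frequent(db, MIN_SUPPORT) if not SENSITIVE <= x}


def sanitize(assignment):
    """Delete every cell whose variable is 0 and return the new dataset."""
    removed = {cell for cell, bit in zip(cells, assignment) if bit == 0}
    return [frozenset(x for x in t if (i, x) not in removed)
            for i, t in enumerate(db)]


solutions = []
for assignment in product((0, 1), repeat=len(cells)):
    candidate = sanitize(assignment)
    if support(SENSITIVE, candidate) >= MIN_SUPPORT:
        continue  # the sensitive itemset is still frequent
    n_removed = assignment.count(0)
    n_lost = len(keep_frequent - frequent(candidate, MIN_SUPPORT))
    solutions.append((n_removed, n_lost, candidate))

best_removed, best_lost, _ = min(solutions, key=lambda s: (s[0], s[1]))
optimal = [s[2] for s in solutions if (s[0], s[1]) == (best_removed, best_lost)]

# Sanitized dataset transcribed from the table above.
published = [frozenset(t) for t in
             ("AC", "ACDE", "D", "BE", "ACE", "DE", "C", "AB", "AC", "D")]
print(best_removed, best_lost, published in optimal)
```

Under these assumptions, the search finds that at least three items must be removed and that at least two non-sensitive itemsets are then lost; the sanitized dataset shown in the table above is one of the assignments that reaches this optimum. Exhaustive enumeration grows exponentially with the number of variables, so it is only feasible for toy examples such as this one.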

Dataset Name | Number of Transactions | Average Transaction Length | Number of Items | Minimum Support Count | Number of Frequent Itemsets | Runtime (Seconds) |
---|---|---|---|---|---|---|
T10I4D100K | 100,000 | 10.10 | 870 | 500 (0.5%) | 1073 | 9.15 |
T40I10D100K | 100,000 | 39.60 | 942 | 500 (0.5%) | 1,286,037 | 392.96 |
Mushroom | 8124 | 23.00 | 119 | 406 (5%) | 3,755,704 | 9.77 |
retail | 88,162 | 10.30 | 16,470 | 440 (0.5%) | 581 | 2.00 |
BMS1 | 59,602 | 2.51 | 497 | 60 (0.1%) | 3991 | 0.88 |
BMS2 | 77,512 | 4.62 | 3340 | 77 (0.1%) | 24,143 | 5.22 |

Hiding Scenario | Number of Lost Itemsets (IPA/HISB) | Algorithm IPA (Seconds) | Algorithm HISB (Seconds) |
---|---|---|---|
HS_2.1 | 0/0 | 5.72 | 4.04 |
HS_2.2 | 0/0 | 6.75 | 5.2 |
HS_2.3 | 0/0 | 7.62 | 6.03 |
HS_3.1 | 0/0 | 6.78 | 5.54 |
HS_3.2 | 0/0 | 9.51 | 10.31 |
HS_4.1 | 0/0 | 9.01 | 10.5 |

Hiding Scenario | Number of Lost Itemsets (IPA/HISB) | Algorithm IPA (Seconds) | Algorithm HISB (Seconds) |
---|---|---|---|
HS_2.1 | 0/1 | 13.64 | 13.31 |
HS_2.2 | 0/1 | 27.59 | 14.59 |
HS_2.3 | 0/2 | 62.49 | 22.88 |
HS_3.1 | 0/0 | 95.61 | 18.92 |
HS_3.2 | 0/0 | 1110.97 | 31.32 |
HS_4.1 | 0/1 | 408.48 | 24.02 |

Hiding Scenario | Number of Lost Itemsets (IPA/HISB) | Algorithm IPA (Seconds) | Algorithm HISB (Seconds) |
---|---|---|---|
HS_2.1 | 0/0 | 8.98 | 1.56 |
HS_2.2 | 0/0 | 18.45 | 2.6 |
HS_2.3 | 0/0 | 22.45 | 4.23 |
HS_3.1 | 0/0 | 23 | 2.67 |
HS_3.2 | 0/0 | 25.78 | 4.96 |
HS_4.1 | 0/0 | 17.44 | 6.11 |

Hiding Scenario | Number of Lost Itemsets (IPA/HISB) | Algorithm IPA (Seconds) | Algorithm HISB (Seconds) |
---|---|---|---|
HS_2.1 | 0/0 | 1.07 | 1.06 |
HS_2.2 | 0/0 | 1.14 | 1.17 |
HS_2.3 | 0/1 | 1.18 | 1.18 |
HS_3.1 | 0/0 | 1.2 | 1.36 |
HS_3.2 | 0/1 | 1.47 | 1.39 |
HS_4.1 | 0/2 | 2.96 | 1.57 |

Hiding Scenario | Number of Lost Itemsets (IPA/HISB) | Algorithm IPA (Seconds) | Algorithm HISB (Seconds) |
---|---|---|---|
HS_2.1 | 0/0 | 1.07 | 1.06 |
HS_2.2 | 0/0 | 1.14 | 1.17 |
HS_2.3 | 0/1 | 1.18 | 1.18 |
HS_3.1 | 0/0 | 1.2 | 1.36 |
HS_3.2 | 0/1 | 1.47 | 1.39 |
HS_4.1 | 0/2 | 2.96 | 1.57 |

Hiding Scenario | Number of Lost Itemsets (IPA/HISB) | Algorithm IPA (Seconds) | Algorithm HISB (Seconds) |
---|---|---|---|
HS_2.1 | 0/0 | 36.21 | 33.21 |
HS_2.2 | 0/0 | 38.59 | 38.54 |
HS_2.3 | 0/0 | 43.36 | 52.48 |
HS_3.1 | 0/0 | 39.15 | 34.89 |
HS_3.2 | 0/0 | 43.92 | 42.98 |
HS_4.1 | 0/0 | 45.11 | 44.52 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yildiz, B.; Kut, A.; Yilmaz, R. Hiding Sensitive Itemsets Using Sibling Itemset Constraints. *Symmetry* **2022**, *14*, 1453. https://doi.org/10.3390/sym14071453
