Privacy-Preserving Hierarchical Top-k Nearest Keyword Search on Graphs
Abstract
1. Introduction
1.1. Background
1.2. Our Contribution
- We propose a hierarchical privacy-preserving top-k nearest keyword search scheme (PH-kNK) on graphs that achieves both hierarchical access control and privacy preservation.
- The scheme provides fine-grained access control: a user with a given security level can access only vertices with a lower level.
- We analyze the proposed scheme and conduct experiments on real-world datasets. The experimental results show that the proposed scheme is more efficient than existing solutions.
2. Related Works
3. Preliminaries
3.1. Pruned 2-Hop Labeling
3.2. Order-Preserving Encryption
3.3. Proxy Re-Encryption
4. Model and Definition
4.1. System Construction
4.1.1. Definition of Hierarchical Labeled Graph
4.1.2. Graph Encryption Scheme
4.2. Graph Encryption Scheme
- KeyGen: This algorithm takes a security parameter as input and generates a set of secret keys. The result is a key set K.
- BuildIndex: Given a graph G and its edges E, this algorithm constructs an index set I.
- EncryptIndex: This algorithm encrypts the index I produced by BuildIndex using the key set K and outputs the encrypted search index.
- KeyAssign: This algorithm takes the data owner's public key K as input and produces a unique user key.
- QueryToken: This algorithm generates a query token T from the user's secret key and the query information S.
- kNKSearch: This algorithm takes the query token T and the encrypted index as input and returns an encrypted query result.
- Decrypt: This algorithm decrypts the encrypted query result using the user's key, producing the final result R.
4.3. Security Model
5. Algorithm Construction
5.1. Building Blocks
5.1.1. Pruned Landmark Labeling (PLL)
- PLL_indexgen() → search index: Generates an efficient search index for the graph.
- PLL_search(, , ) → dis: Calculates the shortest distance between two vertices in the graph.
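The two PLL operations above can be illustrated with a minimal sketch for unweighted, undirected graphs. It assumes an adjacency-list input and a degree-based vertex ordering (a common heuristic); the function names `pll_indexgen` and `pll_search` mirror the list above, but the argument names are assumptions rather than the interface used in the scheme.

```python
from collections import deque


def pll_search(labels, s, t):
    """Shortest distance between s and t via a common landmark (inf if none)."""
    ls, lt = labels[s], labels[t]
    if len(ls) > len(lt):
        ls, lt = lt, ls  # iterate over the smaller label set
    return min((d + lt[h] for h, d in ls.items() if h in lt),
               default=float("inf"))


def pll_indexgen(adj):
    """Build 2-hop labels: vertex -> {landmark: distance}.

    adj maps each vertex to a list of neighbours (assumed input format).
    """
    # Assumption: process high-degree vertices first.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: earlier landmarks already certify a distance <= d.
            if pll_search(labels, root, u) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels


# Small example graph: path 0-1-2-3 with an extra leaf 4 attached to 1.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
labels = pll_indexgen(adj)
print(pll_search(labels, 0, 3))  # 3
```

The pruning test is what keeps the labels small: a vertex is labeled with a landmark only if the labels built so far cannot already answer the distance.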
5.1.2. Order-Preserving Encryption (OPE)
- OPE_Enc(K, d) → c: Encrypts the plaintext value d using the secret key K.
- OPE_Dec(K, c) → d: Decrypts the encrypted value c using the secret key K to retrieve the plaintext.
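The property that matters for the scheme is that ciphertext order matches plaintext order. The sketch below is a toy keyed monotone mapping built from HMAC-derived positive gaps. It is not the construction of Boldyreva et al. and not the pyope API, and it is not secure; it only illustrates the interface and the order-preserving property. The names `ope_enc` and `ope_dec` are hypothetical.

```python
import hashlib
import hmac


def _gap(key: bytes, i: int) -> int:
    """Keyed positive gap between the ciphertexts of i-1 and i."""
    digest = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:2], "big")


def ope_enc(key: bytes, d: int) -> int:
    """Toy order-preserving encryption of a small non-negative integer d."""
    if d < 0:
        raise ValueError("plaintext must be non-negative")
    return sum(_gap(key, i) for i in range(d + 1))


def ope_dec(key: bytes, c: int) -> int:
    """Invert ope_enc by walking the keyed gaps until c is reached."""
    total, d = 0, 0
    while total < c:
        total += _gap(key, d)
        if total == c:
            return d
        d += 1
    raise ValueError("not a valid ciphertext under this key")


key = b"k" * 16
c2, c5 = ope_enc(key, 2), ope_enc(key, 5)
print(c2 < c5, ope_dec(key, c5))  # True 5
```

Because every gap is at least 1, the mapping is strictly increasing, so a server can compare encrypted levels without learning the plaintext values.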
5.1.3. Proxy Re-Encryption (PRE)
- PRE_KeyGen() → keypair (pk, sk): Generates a pair of public and private keys.
- PRE_ReKeyGen(skA, pkB, N, t) → N fragments of the re-encryption key: Generates fragments of the re-encryption key, allowing re-encryption from Alice to Bob.
- PRE_Encapsulate(pkA) → K, capsule: Encapsulates a symmetric key and generates a capsule using Alice’s public key.
- PRE_Decapsulate(skA, capsule) → K or ⊥: Decrypts the capsule using Alice’s private key to retrieve the symmetric key.
- PRE_ReEncapsulate(kFrag, capsule) → cFrag: Re-encapsulates a fragment of the capsule using the re-encryption key fragment.
- PRE_DecapsulateFrags(skB, , capsule) → K or ⊥: Uses Bob’s private key to decrypt the capsule fragments and retrieve the symmetric key.
- PRE_Encrypt(K, M) → C: Encrypts a message M using the symmetric key K.
- PRE_Decrypt(K, C) → M or ⊥: Decrypts the ciphertext using the symmetric key and retrieves the original message.
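To make the re-encryption idea concrete, the following sketch swaps in a much simpler technique than Umbral: a toy ElGamal-style proxy re-encryption in the style of Blaze, Bleumer and Strauss, over a deliberately tiny group. Unlike Umbral, it has no capsule, no key fragments or threshold, and its re-encryption key needs both secret keys. All function names and parameters here are hypothetical, and the group size is far too small for real use.

```python
import secrets

# Toy parameters (assumption): safe prime P = 2*Q + 1, G generates the order-Q subgroup.
P = 1019
Q = 509
G = 4


def pre_keygen():
    """Return (pk, sk) with pk = G^sk mod P."""
    sk = secrets.randbelow(Q - 1) + 1  # sk in [1, Q-1]
    return pow(G, sk, P), sk


def pre_rekeygen(sk_a: int, sk_b: int) -> int:
    """Re-encryption key b/a mod Q (requires Python 3.8+ for pow(x, -1, m))."""
    return sk_b * pow(sk_a, -1, Q) % Q


def pre_encrypt(pk_a: int, m: int):
    """Encrypt m in [1, P-1] as (m * G^r, pk_a^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return m * pow(G, r, P) % P, pow(pk_a, r, P)


def pre_reencrypt(rk: int, ct):
    """Proxy step: turn G^(a*r) into G^(b*r) without seeing m."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


def pre_decrypt(sk: int, ct) -> int:
    """Recover G^r from the second component, then unmask m."""
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)
    return c1 * pow(g_r, -1, P) % P


pk_a, sk_a = pre_keygen()
pk_b, sk_b = pre_keygen()
ct_a = pre_encrypt(pk_a, 42)
ct_b = pre_reencrypt(pre_rekeygen(sk_a, sk_b), ct_a)
print(pre_decrypt(sk_a, ct_a), pre_decrypt(sk_b, ct_b))  # 42 42
```

The proxy only exponentiates the second ciphertext component, so it transforms a ciphertext for Alice into one for Bob without learning the message.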
5.2. KeyGen Algorithm
Algorithm 1 KeyGen
Input: A security parameter
Output: A set of secret keys K
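A minimal sketch of key generation, assuming the key set holds the two symmetric keys listed in the notation table (one for hashing and symmetric encryption, one for order-preserving encryption). The dictionary field names are hypothetical, and the proxy re-encryption key pair is left out.

```python
import secrets


def key_gen(security_parameter_bits: int = 128) -> dict:
    """Sample independent random keys of the requested bit length."""
    n_bytes = security_parameter_bits // 8
    return {
        # Hypothetical field names, not taken from the paper.
        "k_hash_se": secrets.token_bytes(n_bytes),  # hashing / symmetric encryption
        "k_ope": secrets.token_bytes(n_bytes),      # order-preserving encryption
    }


K = key_gen(128)
print(len(K["k_hash_se"]), len(K["k_ope"]))  # 16 16
```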
5.3. BuildIndex Algorithm
- Initialization of the word index: The word index is initialized by traversing the keyword set W. Each keyword w in W serves as a key, and the associated value is an array of tuples (v, l), where v is a vertex containing the keyword w and l is the level of that vertex.
- Initialization of the entry index: The entry index is initialized for each keyword by creating an array over the levels from 0 up to l, associating each level with the entry position v in the word index that corresponds to that level. To secure this structure, the entry index applies an HMAC to a constant c and the entry v, which restricts backward decryption and prevents direct access to subsequent array elements.
- Initialization of the query index: The query index is constructed with the pruned 2-hop labeling algorithm so that the shortest distance between two vertices in the graph can be computed. Each vertex in the graph is a key, and its values are pairs consisting of a reachable vertex and the shortest distance d to that vertex. This structure allows efficient querying of shortest distances in the graph.
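The word index and entry index described above can be sketched in plaintext form as follows. This is an illustration under two assumptions that are not stated explicitly in the text: entries for a keyword are sorted by level in descending order, and a user at level l may read entries whose level is at most l. The function names are hypothetical, and the HMAC and encryption layers are omitted.

```python
from collections import defaultdict


def build_word_index(vertices):
    """vertices: iterable of (vertex_id, keyword, level) triples.

    Returns keyword -> list of (vertex_id, level), highest level first.
    """
    word_index = defaultdict(list)
    for v, w, level in vertices:
        word_index[w].append((v, level))
    for entries in word_index.values():
        entries.sort(key=lambda e: e[1], reverse=True)
    return dict(word_index)


def build_entry_index(word_index, max_level):
    """keyword -> list whose slot l is the first position with level <= l."""
    entry_index = {}
    for w, entries in word_index.items():
        starts = []
        for level in range(max_level + 1):
            pos = next((i for i, (_, l) in enumerate(entries) if l <= level),
                       len(entries))  # len(entries) means nothing is visible
            starts.append(pos)
        entry_index[w] = starts
    return entry_index


vertices = [(1, "cafe", 3), (2, "cafe", 1), (3, "cafe", 2), (4, "bar", 0)]
word_index = build_word_index(vertices)
entry_index = build_entry_index(word_index, max_level=3)
print(word_index["cafe"])   # [(1, 3), (3, 2), (2, 1)]
print(entry_index["cafe"])  # [3, 2, 1, 0]
```

With this layout, everything a user may see for a keyword is a suffix of the keyword's list, so a single start position per level is enough to enforce the hierarchy.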
Algorithm 2 BuildIndex
Input: A graph G
Output: A set of indexes I
5.4. EncryptIndex Algorithm
Algorithm 3 EncryptIndex
5.5. KeyAssign Algorithm
Algorithm 4 KeyAssign
Input: A data owner secret key and a user public key K
Output: A user key set
5.6. Query Token Algorithm
Algorithm 5 QueryToken
5.7. kNKSearch Algorithm
Algorithm 6 kNKSearch
Input: A query token T and a set of encrypted indexes
Output: A set of encrypted search results
5.8. Decrypt Algorithm
Algorithm 7 Decrypt
Input: A user key set, the owner's public key, and a query result
Output: A decrypted set of query results R
- Step 1: The user generates a query token T, which consists of a value computed with F, an SE.Enc ciphertext, and an OPE.Enc ciphertext, and sends it to the server.
- Step 2: The server uses the F value as a key to look up four OPE.Enc values in the entry index. It then compares the encrypted user level from the query token with these encrypted entries. Leveraging the order-preserving property of OPE, it identifies the position where the previous entry is greater and the next entry is smaller than the user level. It retrieves the matching tuple and extracts the value 1, which is the starting position in the word index. In other words, the server skips the first vertex, whose level is higher than the user level, and retrieves the values from index 1 to the end of the list.
- Step 3: Finally, the retrieved entries are combined with the corresponding component of the query token and the query index as inputs for further query processing. The server obtains the distance between the query vertex and each candidate vertex from the query index. In this example, the query index shows that the first label distance is 1 and that the remaining label distances are also 1. Therefore, the distance between the query vertex and each candidate is 1 + 1 = 2, which yields the final top-2 nearest result. The user then decrypts the result to obtain the final answer.
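The three steps can be traced on plaintext data with the sketch below. It omits all encryption, and the vertex names, levels, and labels are hypothetical values chosen to reproduce the shape of the example (start position 1, two label distances of 1 each). It assumes a word-index list sorted by descending level and a per-level start position, as one possible reading of the index layout.

```python
import heapq


def two_hop_distance(labels, s, t):
    """Distance through the best common landmark in the 2-hop labels."""
    common = labels[s].keys() & labels[t].keys()
    return min((labels[s][h] + labels[t][h] for h in common),
               default=float("inf"))


def knk_search(word_entries, entry_starts, labels, query_vertex, user_level, k):
    """Return the k nearest (distance, vertex) pairs visible at user_level."""
    start = entry_starts[user_level]   # Step 2: where the readable suffix begins
    candidates = word_entries[start:]  # entries at or below the user level
    scored = [(two_hop_distance(labels, query_vertex, v), v)
              for v, _ in candidates]  # Step 3: distances from the query index
    return heapq.nsmallest(k, scored)


# Hypothetical data: one keyword list, sorted by level (highest first).
word_entries = [("v5", 3), ("v3", 2), ("v4", 1)]
entry_starts = [3, 2, 1, 0]            # start position for user levels 0..3
labels = {
    "v1": {"v2": 1, "v1": 0},
    "v3": {"v2": 1, "v3": 0},
    "v4": {"v2": 1, "v4": 0},
    "v5": {"v2": 2, "v5": 0},
}

result = knk_search(word_entries, entry_starts, labels, "v1", user_level=2, k=2)
print(result)  # [(2, 'v3'), (2, 'v4')]
```

Here the level-3 vertex is skipped because the start position for user level 2 is 1, and both remaining candidates are reached through the shared landmark at distance 1 + 1 = 2.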
6. Analysis and Experiment
6.1. Security Analysis
- The adversary selects two graphs, G_0 and G_1, that satisfy identical conditions: (1) the same number of vertices; (2) the same number of edges; (3) the same number of keywords and the same number of vertices containing each keyword; (4) the same number of security levels and the same number of vertices at each level.
- The challenger picks a random bit b, encrypts G_b, and sends the encrypted index to the adversary.
- The adversary adaptively issues queries and receives the corresponding tokens and results.
- The adversary outputs a guess b'. The experiment succeeds if b' = b.
- The simulator uses the leakage functions to generate a simulated index and simulated tokens.
- The simulator replaces the outputs of the cryptographic primitives with randomly sampled values.
6.2. Theoretical Analysis
6.3. Experimental Evaluation
- RAM: 16 GB
- Operating system: Windows 11
- CPU: 11th Gen Intel(R) Core(TM) i5-11400H @ 2.70 GHz
- Language: Python
- Libraries: (1) the hashlib library, used to construct a pseudorandom function; (2) the pyope library, used for order-preserving encryption (OPE); (3) the umbral library, based on OpenSSL, used for proxy re-encryption; (4) the Pruned Landmark Labeling (PLL) algorithm, used to obtain the shortest distances in the graph.
- Experiment parameters: (1) a 128-bit security parameter; (2) 1 to 10 security levels; (3) keyword frequencies from 10 to 10,000.
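One common way to build a pseudorandom function on top of hashlib is HMAC-SHA256 from the Python standard library. The sketch below assumes that construction; the text does not say which hash function or mode was actually used, and the name `prf` and the sample keyword are hypothetical.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 used as a keyed pseudorandom function (32-byte output)."""
    return hmac.new(key, message, hashlib.sha256).digest()


key = secrets.token_bytes(16)  # 128-bit key, matching the stated security parameter
token = prf(key, b"cafe")      # e.g. a deterministic lookup key for a keyword
print(len(token), token == prf(key, b"cafe"))  # 32 True
```

The output is deterministic for a fixed key, so the same keyword always maps to the same index key, while the mapping is not computable without the key.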
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
k-NK | MVSSE | PPkNK | Aton | PH-kNK | |
---|---|---|---|---|---|
Privacy protection | ✓ | ✓ | ✓ | ✓ | |
labeled graphs | ✓ | ✓ | ✓ | ✓ | |
accurate search | ✓ | ✓ | ✓ | ||
hierarchical search | ✓ | ✓ |
Notations | Denotations |
---|---|
G | A graph |
n | The number of vertices in the graph G |
 | A vertex in the graph |
 | The keyword that a vertex contains |
 | The level of a vertex |
V | A tuple of |
E | Edges in the graph G |
 | A tuple having , , in |
 | The shortest distance between two vertices |
 | A secret key for hashing and symmetric encryption |
 | A secret key for order-preserving encryption |
 | An index generated for search, sorted by keywords |
 | An index indicating where to start the search in the keyword-sorted index |
 | A search index of the PLL algorithm, used to obtain shortest distances |
 | The set of encrypted indexes above |
PLL | A pruned landmark labeling scheme |
OPE | An order-preserving encryption scheme |
PRE | A proxy re-encryption scheme (Umbral) |
SE | A symmetric encryption scheme |
HMAC | A hash-based message authentication code |
g | A pseudorandom function |
KeyGen | - | - | - | - | ||
KeyAssign | O(1) | - | - | |||
BuildIndex | - | - | - | - | ||
EncryptIndex | - | - | - | - | ||
QueryToken | - | - | - | |||
KnkSearch | - | - | - | - | ||
Decrypt | - | - | - | - |
Dataset | |V| | |E| | |W| |
---|---|---|---|
ego-Facebook | 4039 | 88,234 | 1311 |
Facebook LPPN | 22,470 | 171,002 | 4714 |
Metric | Dataset | KeyGen | KeyAssign | BuildIndex | EncryptIndex | kNKSearch |
---|---|---|---|---|---|---|
Memory peak (KB) | ego-Facebook | 7.47 | 92.78 | 11,652.75 | 16,697.89 | 16,868.49 |
Memory peak (KB) | Facebook LPPN | 7.52 | 93.21 | 115,566.41 | 161,575.56 | 166,739.00 |
Time cost (ms) | ego-Facebook | 0.09 | 50.91 | 1899.33 | 18,176.5 | 198.55 |
Time cost (ms) | Facebook LPPN | 0.09 | 51.24 | 61,031.88 | 1,477,650.34 | 277.83 |
Scheme | Dataset | k = 10 | k = 20 | k = 30 | k = 40 | k = 50 |
---|---|---|---|---|---|---|
PH-kNK | ego-Facebook | 198 | 217 | 232 | 240 | 262 |
PH-kNK | Facebook LPPN | 277 | 286 | 305 | 316 | 338 |
Aton | ego-Facebook | 243 | 259 | 270 | 285 | 302 |
Aton | Facebook LPPN | 303 | 326 | 340 | 361 | 373 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhu, X.; Xu, Z.; Hu, C.; Lin, J. Privacy-Preserving Hierarchical Top-k Nearest Keyword Search on Graphs. Electronics 2025, 14, 736. https://doi.org/10.3390/electronics14040736