Verifiable and Searchable Symmetric Encryption Scheme Based on the Public Key Cryptosystem

Abstract: With the rapid development of Internet of Things technology and cloud computing technology, all industries need to outsource massive data to third-party clouds for storage in order to reduce storage and computing costs. Verifiable and dynamic searchable symmetric encryption is an important cloud security technology: it supports the dynamic update of private data and allows users to perform search operations on the cloud server and verify the legitimacy of the returned results. How to realize the dynamic search of encrypted cloud data and the effective verification of the results returned by the cloud server is therefore a key problem. To solve this problem, we propose a verifiable dynamic encryption scheme (v-PADSSE) based on the public key cryptosystem. To achieve efficient and correct data updating, the scheme designs verification information (VI) for each keyword and constructs a verification list (VL) to store it. When dynamic update operations are performed on the cloud data, the security index can be updated quickly by obtaining the latest verification information from the VL. The security and performance evaluation of the v-PADSSE scheme shows that the scheme is secure and effective.


Introduction
With the development of IoT technology, more and more IoT devices are being connected in pursuit of industrial informatization, and the volume of data keeps growing. To save on storage and computing costs, enterprises choose to outsource massive amounts of data to cloud servers. However, while they enjoy the convenience brought by the cloud, the security of the data becomes crucial. To protect data privacy, sensitive private data need to be encrypted before being outsourced to the IoT cloud [1]. Song et al. [2] proposed searchable symmetric encryption (SSE). SSE is an encryption scheme that allows users to store encrypted data on a third-party cloud server and search the encrypted cloud data through a trapdoor generated from the keywords. However, most SSE schemes only consider keyword search over static encrypted cloud data, which is inconsistent with the real-time and dynamic update requirements of enterprise data. Moreover, some studies have shown that conventional SSE schemes are vulnerable to leakage abuse attacks [3], file injection attacks [4], and technical attacks [5]. To support dynamic update (add, delete, or modify) operations on encrypted data stored on cloud servers, several dynamic SSE (DSSE) schemes have been proposed [6][7][8][9]. Kamara et al. [6] proposed an SSE scheme that supports dynamic data updating. The scheme realizes a sublinear search by extending an inverted index and uses a search array and a delete array, combined with other storage space, to realize the dynamic update of the data. Subsequently, they proposed another method based on a keyword red-black tree index structure [7] to support parallel keyword search and parallel data insertion and deletion. Guo et al. [8] proposed a dynamic SSE scheme based on an inverted index. The scheme records the keyword positions in the inverted index and realizes data dynamics by updating the index. Xia et al. [9] proposed a dynamic keyword search scheme for encrypted cloud data based on a tree index structure that supports multi-keyword ranking.
The above DSSE schemes do not consider verifying the correctness and integrity of the matching results returned by the cloud server. In practice, the cloud server may return un-updated or incorrect matching results to the user in order to save computing resources. Therefore, users need to verify the results returned by the cloud server to ensure their correctness and integrity. Some schemes [10][11][12] use the timestamp function of the RSA accumulator to verify the search results: accumulator values are generated for all files and indexes and can be saved by the data owner. If the cloud server returns an un-updated result, the user can check it against the latest accumulator. The RSA accumulator [13] can compress a large amount of data into a fixed-size value to achieve membership authentication, which effectively reduces communication overhead. The RSA accumulator is applied to the compressed prefix tree structure to realize both efficient retrieval and result verification. However, the RSA accumulator is based on asymmetric cryptosystems, and its computational and verification costs are high. Some research teams have proposed verifiable schemes based on message authentication codes (MACs) [14][15][16], but in DSSE application scenarios, a MAC cannot verify whether the results returned by the cloud server are the latest; that is, it cannot resist replay attacks [17]. Ge et al. [18] proposed a verifiable DSSE scheme based on accumulative authentication tags (AATs), which generates authentication tags for keywords and verifies the returned results by recording the number of global updates and the number of updates of each single file containing those keywords. Each update operation consumes only one tag, which is highly efficient. However, a pseudo-random permutation and a pseudo-random function are used in this scheme to replace and encrypt the keywords and global update counts, which leads to key management problems. Relational authentication tags (RALs) [19] are used to verify the relationship of query keywords in documents, and audit certificates can be generated without exposing sensitive information. However, that scheme requires third-party auditors to be involved in the search process. Therefore, how to effectively verify the correctness and integrity of the returned results is an urgent problem to be solved.
The above review shows that the verification of search results in most schemes is not comprehensive and that these schemes also suffer from key management problems. Therefore, this paper focuses on two problems: how to effectively verify the correctness and integrity of search results, and how to manage encryption keys securely.
In this paper, we explore how to use the public key cryptosystem in the DSSE scheme to verify the correctness and integrity of the result returned by the cloud server and to manage the encryption key effectively and securely.
The contributions of this paper can be summarized as follows:
(1) To efficiently realize index update and index lookup, we construct a bitmap index to store the relationship between the keywords and the encrypted files. A verification list (VL) stores the latest verification information of the files containing each keyword, so that the latest verification information can be obtained quickly from the VL when the secure index is updated.
(2) To support the effective verification of dynamic data, we design public-key-based cumulative verification information (VI), which is stored in the bitmap index. When the encrypted cloud data are dynamically updated, the verification information can be easily updated. In addition, the verification information contains information about the corresponding keyword, so the verification information of different keywords differs. Moreover, the VI resists replay attacks by allowing the user to verify whether the returned result is up to date.
(3) To achieve forward security, the scheme places the node information in the bitmap index to avoid statistical attacks. When private data are searched or updated, the cloud server returns or changes the data of the whole column, so that a malicious cloud server cannot learn the relationship between the keywords and the index.
(4) Based on the above, we design a verifiable DSSE scheme based on the public key cryptosystem. The security, verification efficiency, and updating efficiency of the scheme are analyzed and evaluated. The results show that the scheme is secure and effective.
Organization: The rest of the paper is organized as follows. We summarize the related work in Section 2. In Section 3, the formulas and algorithms involved in the scheme are defined, including model construction, design objectives, etc. In Section 4, we describe the construction of the scheme and the execution of the algorithms. The security analysis of the v-PADSSE scheme is given in Section 5. In Section 6, the implementation efficiency and updating efficiency of the scheme are analyzed and evaluated.

Related Work
With the development and application of the Internet of Things and cloud computing technology, many industries have chosen to outsource data to third-party clouds for storage. While cloud storage brings convenience to enterprises, it also brings new security challenges. Users cannot directly control the data stored in the cloud, so they cannot determine whether the stored data are complete and correct. To solve the problem of data verification, the research community has proposed cloud storage verification schemes [20][21][22] to audit and verify data in the cloud. In addition, before uploading private data to the cloud for storage, users need to encrypt them to prevent direct access by cloud providers. In this case, however, how users perform search operations on encrypted cloud data is also an important problem. To solve it, the research community proposed searchable symmetric encryption (SSE), which allows users to perform search operations directly on the ciphertext. Compared with searchable encryption schemes built on public key encryption [23,24], the efficiency of SSE schemes has received more attention from the industry.
Dynamic SSE. Searchable encryption can be divided into two categories: symmetric key encryption [25] and public key encryption [26]. Song et al. [2] first proposed a searchable encryption scheme that encrypts each keyword by constructing a special two-layer encryption structure. Some static SSE schemes, such as semantic search schemes [27] and ranked keyword search schemes [28,29], have also been proposed. In practice, however, industrial data are dynamically updated in real time, and static SSE schemes do not support the dynamic update of encrypted cloud data, so they cannot meet current requirements for cloud storage data encryption. To support the dynamic update of encrypted data, Kamara et al. [6] proposed a dynamic SSE scheme that constructs an extended inverted index to achieve sublinear search efficiency and CKA-2 security. The scheme in [30] is a dynamic SSE scheme that constructs a blind storage system on the cloud server, which allows data owners to store private files without the cloud server learning the number of files. Guo et al. [9] proposed a DSSE scheme based on the inverted index, which allows data users to search multiple phrases in one query request and supports the ranking of search results. In recent years, a number of cloud-assisted schemes have been proposed for searchable encryption [31]. Scheme [32] utilized the searchable encryption technologies of keyword range search and multi-keyword search. Since the cloud is untrustworthy, scheme [32] used Bloom filters and message authentication codes to classify health information, filter out fake data, and check data integrity. To verify whether the cloud faithfully performs the search operation, a multi-user verifiable searchable symmetric encryption scheme is proposed in [25]. Authorized users can search the data and verify the authenticity of the search results, which improves the accuracy of the search results. However, since the access rights of authorized users remain valid indefinitely, the scheme is not secure. To automatically revoke a user's access, a time key was introduced in [33]. At the beginning of encryption, the key is encapsulated in the ciphertext, which means that all users, including the data owner, are bound by the time period. Later, Yang et al. [34] proposed a conjunctive keyword search with a designated tester and timing-enabled proxy re-encryption. It utilizes a time server to generate time tokens for users. In addition, it implements time-controlled access revocation to prevent authorized users from accessing future electronic health records (EHRs). Scheme [35] proposed timed-release computational secret sharing and threshold encryption, which uses a time-release function instead of a time server to reduce overhead. Scheme [36] proposed 0-encoding and 1-encoding to generate the time key. However, the retrieval efficiency of this work is low. To improve search efficiency, scheme [37] with hidden data structures was proposed. Users expect to find more ciphertexts in one step; however, scheme [37] reduces the number of computation-intensive operations but cannot find two or more matching ciphertexts in a single step. This work therefore can neither meet the need for a quick search nor prevent authorized users from accessing future data. While all of the above work enables cloud-based search, a challenge remains: the cloud is not a fully trusted entity and can collude with other entities to gain access to users' private information.
Verifiable SSE. In practice, cloud servers are semi-trusted entities [38] that may return incorrect or un-updated results to the data user in order to save on computing overhead. Miao et al. [39] constructed the verifiable SE framework (VSEF), which can withstand internal keyword guessing attacks (KGA) and achieve verifiable searchability. Wu et al. [40] proposed a new authentication data structure based on homomorphic encryption and showed how to apply it to verify the correctness and integrity of search results. However, the verification proof in their scheme is generated by the cloud server, which can forge the proof to pass the verification when the cloud server is seen as an adversary. To avoid this, Chai et al. [41] first proposed a verifiable keyword search scheme for encrypted cloud data, using hash functions to generate proofs of document identity. Jiang et al. [14] proposed a verifiable multi-keyword ranked search scheme over encrypted cloud data, which realizes an efficient keyword search by constructing the special data structure QSet. Yang et al. [42] designed a forward-private VDSSE scheme with Bloom filters and message authentication codes to allow verification and support dynamic updates of outsourced data. Zhang et al. [43] proposed a verifiable data structure based on a multi-set hash function, which guarantees forward security and realizes effective verifiable data updates. Gao et al. [19] used relational authentication tags (RALs) to verify the relationship of the query keywords in the document, which can generate audit certificates without exposing sensitive information. However, that scheme requires third-party auditors to be involved in the search process. Merkle hash trees [44] are used to validate data elements in large databases. The data elements are placed in the leaf nodes, and the tree is constructed layer by layer from the leaf nodes up to a unique root node. A change to any element in the data set changes the root node. A Merkle Patricia tree is proposed in GSSE [45] to reduce the storage overhead of index structures in schemes based on Merkle hash trees. It reduces storage space by reducing the depth of the tree. However, in both of the above approaches, the proof provided by the cloud server to the data user (DU) is large, which brings more communication overhead. Chen et al. [46] extended the Merkle hash tree [47] to a searchable index tree to achieve efficient result verification, where search time grows sublinearly with the size of the data set, and verification is more efficient than with the accumulator structure. In addition, verifiable DSSE has been implemented by schemes [45,48], but they either support only single-keyword match search or use two rounds of communication in a single-user setup to achieve result verification. The RSA accumulator [13] can aggregate a large amount of data into a fixed-size value to achieve membership verification, which effectively reduces communication overhead. The RSA accumulator is applied to the compressed prefix tree structure to combine efficient retrieval with result verification. Schemes [10,12,13] all use an RSA accumulator to realize result verification for dynamic data. Most of the above VDSSE schemes are based on asymmetric key cryptography, and the results returned by the cloud server are verified using a public key signature.
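The Merkle hash tree construction described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited schemes: the choice of SHA3-256, the domain-separation prefixes, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    """SHA3-256 digest (hash choice is an assumption of this sketch)."""
    return hashlib.sha3_256(data).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Build the tree layer by layer from the leaves and return the root."""
    if not items:
        raise ValueError("at least one data element is required")
    # Leaves and inner nodes use different prefixes (illustrative convention).
    level = [sha3(b"\x00" + item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            sha3(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


files = [b"file-1", b"file-2", b"file-3"]
root = merkle_root(files)
# Changing any single element changes the root.
tampered_root = merkle_root([b"file-1", b"file-X", b"file-3"])
```

Comparing `root` with `tampered_root` shows why a verifier only needs to keep the root: any modification to a leaf propagates up to it.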
Forward secure SSE. Forward privacy requires that update operations (insert or delete) performed by the data owner cannot be associated with previously performed search operations. Because the secret key is used for the deterministic encryption of private data in DSSE schemes, it is easy for untrusted servers to observe repeated queries and other information, which leads to information leakage (such as the number of queries for a keyword). Introducing ORAM into the scheme avoids such problems, but its communication and computation costs are high, so in practice DSSE schemes accept the leakage of some information in exchange for computational and execution efficiency. However, such leakage can be exploited by different attacks [49,50]. Bost et al. [51] proposed using a one-way trapdoor permutation to eliminate the correlation between the latest trapdoor and the previous trapdoor; that is, the latest trapdoor can search all encrypted documents, but the previous trapdoor cannot match the latest encrypted document. Cao et al. [52] used the KNN method to construct the security index and trapdoor. This method encodes indexes and trapdoors so that the encoding differs even if the keyword is the same. In this way, the cloud server cannot obtain the number of keyword queries or the association between keywords and encrypted data from the query operations of data users, which protects forward privacy. Li et al. [53] used partitioning and pointer hiding technology to partition the secure index and extracted sub-keywords from the original keywords as the keywords for partition search, then encrypted and hid the index blocks, so that only the index table header identification and the encryption key need to be saved locally. Since the search token information is calculated using sub-keywords, it is difficult for subsequent query keywords to be directly associated with a newly added encrypted document.

Security Model
The scheme involves four entities: the data owner (DO), the data user (DU), the cloud server (CS), and the key distribution center (KGC). The system security architecture is shown in Figure 1.

• Data owner: This entity encrypts private files and secure indexes with a symmetric key and encrypts verification information with the data user's public key. After the encryption is finished, the ciphertext and index are uploaded to the cloud server. When the data owner wants to update the private data, the update token is generated locally and then sent to the cloud server for data updating. Upon receiving a VI request from the data user, the data owner returns the number N of files that contain the keyword w and the total number V of updates to those files.
• Data user: This entity shares the symmetric encryption key with the data owner. To search for a keyword, the data user generates a trapdoor locally, sends the trapdoor to the cloud server for searching, and asks the data owner for the latest verification information of the keyword. Here, w denotes the keyword for which the user wants to perform the search operation. Upon receiving the returned results from the cloud server, the data user verifies their correctness and integrity according to the verification information.
• Cloud server: This entity stores the ciphertext and security index information uploaded by the data owner. When it receives a search request, it performs the search operation on the security index and returns the corresponding matching results and verification information. When it receives an update request, it performs an update operation on the security index and the corresponding ciphertext.
• Key Distribution Center (KGC): This entity is primarily used to generate keys. Upon receiving a key request from the data user, the entity returns a key pair (PK, SK) to the corresponding user.
In the system model, both the data owner and the data user are trusted entities. The data owner honestly encrypts the private files and builds the secure index. The data user honestly generates trapdoors for the desired keywords and sends them to the cloud server. The cloud server is an untrusted entity: it may use search operations to record the correspondence between search keywords and encrypted files, and it may return incorrect or un-updated results to the data user in order to save computing overhead. The key distribution center is a trusted entity that honestly generates the key pair requested by the data user and sends it to the requesting user.

Design Goals
Based on the above model architecture, a verifiable DSSE scheme needs to meet the following objectives:
• Support keyword search over the encrypted cloud data: The scheme needs to match all ciphertexts containing the corresponding keywords according to the search token and demonstrate high query efficiency.
• Support efficient dynamic data updates: The scheme needs to support the dynamic update of encrypted data and secure indexes, such as dynamic addition, dynamic deletion, and dynamic modification.
• Support search result verification: The scheme needs to support data users in efficiently verifying the correctness and integrity of the matching results returned by the cloud server, and the verification does not involve any complex operations.
• Privacy protection: Because the scheme is based on the public key cryptosystem, the public key cannot be used to encrypt private information directly. The private information is encrypted using the symmetric key, and the asymmetric key is used to encrypt the verification information. In addition, the scheme should hide the number of keywords contained in an encrypted file and the keyword search frequency.
• Replay attack resistance: To save on computing or storage overhead, the cloud server may directly return un-updated results to the data user. The scheme should enable data users to verify whether the returned results are up to date.

Algorithm Definition
The algorithms in the v-PADSSE scheme are KeyGen, PSKeyGen, IndexBuild, Building VL, GenToken, Search, Verify, Decrypt, UpdateToken, and Update. These algorithms are defined as follows:
• K ← KeyGen(λ_1). The data owner takes the security parameter λ_1 as input and outputs the symmetric key K.
• (PK, SK) ← PSKeyGen(λ_2). The KGC takes the security parameter λ_2 as input and outputs the key pair (PK, SK).
• (I, C) ← IndexBuild(K, PK, F, W, N, V, v_i, flag). When building a secure index, the file F and the keywords W are encrypted with K. The public key PK is used to encrypt the number of files N and the total number of updates V of files containing the keyword, the number of updates v_i of each file, and the flag bit flag that indicates whether the file contains the keyword. The random number generator function rand() is used to generate the flag (randA for odd numbers, randB for even numbers): if the file contains the keyword, flag is odd; if not, flag is even. The file F is hashed with the SHA-3 hash algorithm (denoted H hereafter). Finally, the security index I and the ciphertext C are output.
• T_w ← GenToken(K, w). Take the key K and the keyword w as inputs, and output the trapdoor T_w.
• (VI, C(w)) ← Search(T_w, I, C). Take T_w, I, and C as inputs, and output the verification information VI and the matching ciphertext set C(w). The VI contains the update count v_i of each file matching the trapdoor, the flag indicating whether the ciphertext contains the keyword, and the result H(F) of hashing the plaintext file.
• (Y, N) ← Verify(VI, SK, T_w, C(w)). Take VI, PK(N, V), SK, T_w, and C(w) as inputs, and output the verification result (Y or N).
• F(w) ← Decrypt(K, C(w)). Take K and C(w) as inputs, and output the plaintext files F(w).
• τ ← UpdateToken(K, PK, F, {w, flag}, v_i). The update token includes the update operation type, the newly updated file F, the document identifier F_id, the number of updates v_i of the file, H(F) (the hash of F), the set of keywords w contained in the file, and flag. For an add operation, the VL is matched against the keyword set contained in file F; V = V + 1 and N = N + 1 are calculated in each matched node, and the document identifier F_id and the update count v_i = 1 of file F are added to the node. The update token then contains the added F_id, v_i, the keyword set w_i contained in file F, and H(F). If F contains the keyword, flag = randA; otherwise, flag = randB. For a delete operation, the VL is matched against the keyword set contained in F; N = N − 1 and V = V − v_i are calculated in each matched node, and F_id and its update count v_i are deleted from the node, so that v_i = 0 and flag = randB. For a modify operation, the VL is matched against the keyword set contained in F; in each matched node, V = V + 1 and v_i = v_i + 1 are performed, N and flag remain unchanged, and the new file is hashed to H(F). After the verification information has been modified, the public key is used to encrypt all of it except H(F).
• (I', C') ← Update(τ). The cloud server executes the update algorithm. It matches on the F_id contained in τ and replaces the node values in the corresponding column. According to the update token τ, the cloud server generates a new index item I' and ciphertext C'.
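The bookkeeping that UpdateToken performs on the verification list can be sketched as follows. This is a plaintext illustration under stated assumptions, not the paper's implementation: the names `VLNode`, `vl_add`, `vl_modify`, and `vl_delete` are hypothetical, and public key encryption of the counters is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class VLNode:
    """One VL entry per keyword (field names are illustrative)."""
    N: int = 0                                   # files containing the keyword
    V: int = 0                                   # total updates of those files
    updates: dict = field(default_factory=dict)  # F_id -> v_i


def vl_add(vl, keywords, file_id):
    """Add: N = N + 1, V = V + 1, and register (F_id, v_i = 1)."""
    for w in keywords:
        node = vl.setdefault(w, VLNode())
        node.N += 1
        node.V += 1
        node.updates[file_id] = 1


def vl_modify(vl, keywords, file_id):
    """Modify: V = V + 1 and v_i = v_i + 1; N is unchanged."""
    for w in keywords:
        node = vl[w]
        node.V += 1
        node.updates[file_id] += 1


def vl_delete(vl, keywords, file_id):
    """Delete: N = N - 1, V = V - v_i, and drop (F_id, v_i)."""
    for w in keywords:
        node = vl[w]
        v_i = node.updates.pop(file_id)
        node.N -= 1
        node.V -= v_i


vl = {}
vl_add(vl, {"cloud", "iot"}, "f1")
vl_add(vl, {"cloud"}, "f2")
vl_modify(vl, {"cloud", "iot"}, "f1")
after_modify = (vl["cloud"].N, vl["cloud"].V)  # N = 2, V = 3
vl_delete(vl, {"cloud", "iot"}, "f1")
```

In this sketch, V always equals the sum of the per-file counters v_i stored in the node, which is the invariant a verifier can later check against.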

Security Definition
• Update reliability: A verifiable DSSE scheme first needs to ensure that the cloud server performs update operations reliably, that is, that the scheme resists replay attacks. Since the cloud server is not trusted, after it receives an update request from the data owner it may not perform the corresponding update operation according to the update token; that is, it may not update the security index and ciphertext collection. When it then receives a search request from the data user, it returns the un-updated data, so the data user should verify that the returned results are up to date. If the adversary obtains the latest verification information VI and a valid ciphertext set C(w), and its forged information passes the verification algorithm, the adversary wins.
• Verifiability: If the probability that the adversary successfully forges search results is negligible, the v-PADSSE scheme is considered verifiable. Because the cloud server is unreliable, it may return incorrect or incomplete results to the data user. Data users should be able to detect the improper behavior of the cloud server using the verification algorithm to ensure the correctness and integrity of the returned results. If the adversary obtains the latest verification information VI and the valid ciphertext set C(w) and can forge verification information that passes the verification algorithm, the adversary wins.

v-PADSSE Scheme Construction and Algorithm Description
We have summarized some common symbols used in the design of the v-PADSSE scheme, as shown in Table 1.

Overview of the v-PADSSE Scheme
To solve the problem of verifying the correctness and integrity of the results returned by the cloud server, this paper designs a DSSE scheme based on public key verification (v-PADSSE). In this scheme, verification information is added to the security index and encrypted using the public key of the data user, so that the user can verify the returned results. Below, the construction of the v-PADSSE scheme is described in detail.
When constructing VI, the v-PADSSE scheme needs to include v_i, flag, and H(F). The VI needs to be encrypted with the user's public key; let PK(·) denote this encryption. To prevent the cloud server from collecting statistics on the correlation between keywords and updated files, all index nodes in the column that represents the updated file must be updated, so that the update operation hides the correlation between the ciphertext and the keywords. Therefore, the verification information is VI = PK(v_i) + PK(flag) + H(F). In addition, the data owner creates a verification list (VL) locally, which stores the latest verification information of each keyword: the number N of files containing the keyword, the total number V of updates of files containing the keyword, and the set of document identifiers id of the files containing the keyword together with each file's update count v_i, so that the latest update token can be generated directly when an update operation is performed. Different update operations (add, modify, and delete) change VI in different ways. VI is calculated as follows:
• Add new file F'. When the data owner obtains the latest VI and performs the add operation, the number of updates of the single file is initialized to v_i = 1. If the VL already has a node for a keyword w contained in the newly added file F', that node executes N = N + 1 and V = V + 1, and the document id of F' and its update count v_i are added to the node. If it does not, a new node is added to the VL, where N = 1, V = 1, v_i = 1, and VI = PK(v_i) + PK(flag = randA) + H(F').
• Modify file F to F'. When file F needs to be updated to a new file F' (both F and F' contain the keyword w), the data owner updates the node of keyword w in the VL with the latest verification information and executes V = V + 1 and v_i = v_i + 1.
• Delete file F. When file F contains keyword w, the data owner updates N = N − 1, V = V − v_i, and v_i = 0 in the VL node where keyword w resides. In this case, the latest verification information is VI = VI − PK(N) + PK(N − 1) − PK(v_i) + PK(v_i = 0) − PK(flag = randA) + PK(flag = randB). Additionally, the document id and v_i of file F are removed from the VL.
After the data owner performs different update operations, the corresponding update token τ is generated and sent to the cloud server, which updates the security index and ciphertext according to the update token information.
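The structure of a single VI entry, VI = PK(v_i) + PK(flag) + H(F), with an odd flag (randA) for "contains the keyword" and an even flag (randB) otherwise, can be sketched as below. The helper names, the 32-bit flag width, and the tuple layout are assumptions for illustration. The `encrypt` argument stands in for public key encryption under the data user's PK; the identity function used in the example is a placeholder and provides no confidentiality.

```python
import hashlib
import secrets


def rand_a() -> int:
    """Random odd number: the file contains the keyword."""
    return secrets.randbits(32) | 1


def rand_b() -> int:
    """Random even number: the file does not contain the keyword."""
    return secrets.randbits(32) & ~1


def contains_keyword(flag: int) -> bool:
    """Only the parity of the flag carries information."""
    return flag % 2 == 1


def build_vi(v_i: int, has_keyword: bool, plaintext: bytes, encrypt):
    """Return (PK(v_i), PK(flag), H(F)); H(F) is left unencrypted."""
    flag = rand_a() if has_keyword else rand_b()
    return (encrypt(v_i), encrypt(flag), hashlib.sha3_256(plaintext).hexdigest())


# Placeholder only: a real deployment would pass a public key encryption routine.
identity = lambda m: m
vi = build_vi(1, True, b"example file", identity)
```

Because the flag is a fresh random number of the right parity on every update, two nodes with the same meaning need not carry the same flag value before encryption.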

Secure Index Structure
In v-PADSSE, the data owner constructs the security index using a bitmap index and constructs the VL locally. The data owner generates the symmetric encryption key by executing the KeyGen algorithm, and the data user's key pair (PK, SK) is generated by the KGC executing the PSKeyGen algorithm.
First, the data user publishes the public key. After obtaining the user's public key, the data owner uses the symmetric key K to encrypt the private files and keywords and uses PK to encrypt the verification information corresponding to the keywords. When the security index is first constructed, the data owner initializes VI by setting N to the number of files containing the keyword w, v_i = 1 (1 ≤ i ≤ N), and V = ∑_{i=1}^{N} v_i. In the security index, the header column contains the keywords w_i, and the header row contains the document identifiers docId. Each middle node holds v_i (the number of updates to the file containing the keyword w_i), flag (indicating whether the document contains the keyword), and H(F) (the hash of the plaintext file). The security index structure is shown in Figure 2. The number of rows in the secure index is determined by the number of keywords, and each row is associated with a keyword. The number of columns is determined by the number of private files. When the data owner needs to perform an update operation, PK(v_i), PK(flag), and H(F) in all nodes of the whole column are modified according to the updated document identifier.
The data owner needs to obtain the latest verification information of the corresponding keyword when generating the update token.Therefore, VL is designed in the scheme and is owned by the data owner.The latest V I of keywords contained in each privacy file must be recorded in VL.When updating, the data owner modifies the V I in VL to ensure that the VL and V I in the security index are updated simultaneously.The structure of the VL is shown in Figure 3, where N indicates the number of files containing the keyword, V indicates the total number of updates to files containing the keyword, and {id, v i } indicates the set composed of the document identification of the file containing the keyword and the number of updates to the file.
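The index layout described in this section can be illustrated with a small sketch. It is a simplification under stated assumptions: `build_index` and `rand_flag` are hypothetical names, SHA-256 stands in for $H$, keyword and public-key encryption are omitted so the node values appear in plaintext, the parity encoding of randA and randB (odd means the file contains the keyword) is assumed, and $v_i = 0$ for files that do not contain the keyword is also an assumption.

```python
import hashlib
import secrets


def rand_flag(contains: bool) -> int:
    """Random flag whose parity encodes membership (assumed encoding)."""
    even = secrets.randbelow(2**31) * 2       # random even number
    return even + 1 if contains else even     # odd: contains, even: does not


def build_index(files: dict, keywords: list) -> dict:
    """Build a keyword-by-document grid.

    files maps docId -> (plaintext bytes, set of keywords in that file).
    The result maps keyword -> {docId -> node}, i.e. one row per keyword
    and one column per document.
    """
    index = {}
    for w in keywords:
        row = {}
        for doc_id, (content, file_keywords) in files.items():
            contains = w in file_keywords
            row[doc_id] = {
                "v": 1 if contains else 0,                     # v_i (0 is an assumption)
                "flag": rand_flag(contains),                   # flag
                "hash": hashlib.sha256(content).hexdigest(),   # H(F)
            }
        index[w] = row
    return index
```

With $m$ keywords and $n$ files the grid holds $m \times n$ nodes, which matches the row and column counts given above.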

Algorithm Description
In this section, we give the execution steps of the core algorithms of the v-PADSSE scheme and explain the related functions in detail.
In v-PADSSE, the core algorithms are IndexBuild (Algorithm 1), Building VL (Algorithm 2), GenToken, Search (Algorithm 3), Verify (Algorithm 4), UpdateToken (Algorithm 5), and Update (Algorithm 6). The IndexBuild algorithm (Algorithm 1) is used by the data owner to construct the secure index as a bitmap, in which the header nodes store the keywords and document identifiers and the middle nodes store the keyword-related verification information. After constructing the secure index, the data owner uploads the secure index and the encrypted files to the cloud server. Data users use the GenToken algorithm to generate a trapdoor and send it to the cloud server. The cloud server executes the Search algorithm (Algorithm 3), performs the matching query on the security index according to the trapdoor, and returns the matching results and verification information to the data user. After receiving the results, the user executes the Verify algorithm (Algorithm 4) to verify their correctness and integrity. If verification fails, the results are rejected. To update a privacy file, the data owner obtains the verification information related to the keywords contained in the file from the VL, runs the UpdateToken algorithm (Algorithm 5) to generate the corresponding update token, and sends it to the cloud server. The cloud server executes the Update algorithm (Algorithm 6) based on the update token to update the security index and the related ciphertext. Below, we explain the execution of these core algorithms in detail.
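The trapdoor and search steps of this flow can be sketched as follows. This is only an illustration under assumptions: HMAC-SHA256 stands in for the symmetric keyword encryption used by GenToken, the index is a dictionary keyed by trapdoor, and each node carries a plaintext `match` field so that the server-side selection is visible (in the scheme the flag is encrypted under $PK$). The names `gen_token` and `search` are hypothetical.

```python
import hashlib
import hmac


def gen_token(key: bytes, keyword: str) -> bytes:
    """Trapdoor T_w: a keyed, deterministic transform of the keyword (HMAC stand-in)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def search(trapdoor: bytes, index: dict, ciphertexts: dict):
    """Cloud-side lookup: return (verifyInfo, C(w)), or None if no row matches."""
    row = index.get(trapdoor)
    if row is None:
        return None
    # Verification information of every node in the matching row.
    verify_info = {d: (node["v"], node["flag"], node["hash"]) for d, node in row.items()}
    # Ciphertexts of the documents marked as matching (illustrative plaintext marker).
    matched = {d: ciphertexts[d] for d, node in row.items() if node["match"]}
    return verify_info, matched
```

Because the trapdoor is deterministic for a fixed key, the data user and the data owner derive the same $T_w$ for the same keyword, which is what lets the owner look up the latest verification information for it.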

• Initialization parameters:
(1) Obtain the keywords contained in the plaintext files and save them in the keyword set $W = \{w_1, w_2, \ldots, w_n\}$, where $n$ is the number of keywords.
(2) Obtain the number of files containing keyword $w$, $N = F(w).num$; the total number of updates to files containing keyword $w$, $V = N$; the set $\{id, v_i = 1\}$ consisting of the document ID and the number of updates per file; and the flag indicating whether the file contains the keyword. Encrypt these values using the public key: $(PK(v[N]), PK(\mathit{flag})) \leftarrow Encrypt_{PK}(v[N], \mathit{flag})$.
(3) Encrypt keyword $w$ and plaintext file $F$ using the symmetric key algorithm, computing $(K_w, C) \leftarrow Encrypt_K(w, F)$, and hash file $F$ to obtain $H(F)$.
• Building secure index: For each privacy file, the document identifier docId and the keyword set $\{w_i\} \in W$ $(1 \le i \le n)$ are used to construct the bitmap index. The functions and parameters required to construct the security index are described as follows:
(1) Create the header node BuildHeadNode($K_w$, docId), where $K_w$ is the keyword set encrypted with the symmetric key $K$, and docId is the set of privacy file identifiers.
(2) Create an intermediate node BuildMiddleNode($K_w$, docId, $PK(v_i)$, $PK(\mathit{flag})$, $H(F)$), where $K_w$ and docId indicate where the verification information is stored in the bitmap index, $PK(v_i)$ is the encrypted number of updates to the file containing the keyword $w$, and $PK(\mathit{flag})$ indicates whether the privacy file corresponding to docId contains the keyword $w$. If yes, flag = randA; if no, flag = randB. $H(F)$ is the result of hashing the privacy file $F$.

11: end for
12: end for
13: //The above process of creating head nodes and middle nodes forms the bitmap index, assigns the flag bit according to whether the keyword is contained in the privacy file, and generates the security index I.

• Building verification list: The verification list is constructed and stored locally by the data owner. The VL is a singly linked list, and each list node contains the verification information of one keyword. The creation process is as follows.
The VL header node does not store any information other than the address of the first keyword.
BuildHeadNode(*firstKeyWord): *firstKeyWord indicates the address of the first keyword.
BuildListNode($w_i$, $N_i$, $V_i$, $\{id, v_i\}$, *nextKeyWord): Each middle node of the linked list stores the keyword $w_i$, the number of files $N_i$ containing $w_i$, and the total number of updates $V_i$ to files containing $w_i$. It also stores the set of pairs consisting of the document identifier of each privacy file containing $w_i$ and the number of updates $v_i$ to that file, together with a pointer *nextKeyWord to the next keyword node.
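The linked-list layout of the VL can be sketched as below. The class and method names (`ListNode`, `VerificationList`, `append`, `find`) are illustrative assumptions rather than the paper's code; the fields mirror the arguments of BuildListNode, and the head holds only the reference to the first keyword node.

```python
class ListNode:
    """One VL node: (w_i, N_i, V_i, {id, v_i}, *nextKeyWord)."""

    def __init__(self, keyword, n, v, updates, next_node=None):
        self.keyword = keyword    # w_i
        self.n = n                # N_i: number of files containing w_i
        self.v = v                # V_i: total number of updates to those files
        self.updates = updates    # {docId: v_i}
        self.next = next_node     # pointer to the next keyword node


class VerificationList:
    """Singly linked list whose head stores only the first-node reference."""

    def __init__(self):
        self.first = None         # *firstKeyWord

    def append(self, keyword, doc_ids):
        """Add a keyword node with v_i = 1 for every file, so V_i = N_i."""
        updates = {doc_id: 1 for doc_id in doc_ids}
        node = ListNode(keyword, len(updates), sum(updates.values()), updates)
        if self.first is None:
            self.first = node
            return
        cur = self.first
        while cur.next is not None:
            cur = cur.next
        cur.next = node

    def find(self, keyword):
        """Linear scan for the node of a keyword; None if it is absent."""
        cur = self.first
        while cur is not None:
            if cur.keyword == keyword:
                return cur
            cur = cur.next
        return None
```

A lookup walks the list node by node, so its cost grows with the number of keywords stored in the VL.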

Algorithm 2 Building VL
1: //Because there are n keywords, the creation of the intermediate list node needs to be executed n times.
2: for i = 0; i < n; i++ do
3: BuildListNode(w_i, N_i, V_i, {id, v_i}, *nextKeyWord); //Obtain keyword verification information.

The Verify algorithm process is as follows:
1: DU:
2: //The user uses the private key SK to decrypt the verification information.
3: (v_i, flag) ← Dec_SK(PK(v_i), PK(flag));
4: //Decrypt the latest verification information returned by the data owner.
5: getNewVerifyInfo(T_w, PK(N_w), PK(V_w));
6: Dec_SK(PK(N_w), PK(V_w));
7: //Check whether the keyword is contained in the privacy file according to the flag. If yes, proceed with the execution. If not, the privacy file will not be decrypted.
8: if flag % 2 = 0 then

16: end if
17: end if
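The acceptance condition of the Verify algorithm ($N_w$ equals the number of returned results, $V_w = \sum v_i$, and the hash of each decrypted file equals $H(F)$) can be sketched as follows. This is an illustration under assumptions: all values are taken as already decrypted, SHA-256 stands in for $H$, odd flags are assumed to mark files that contain the keyword, and the function name `verify` is hypothetical.

```python
import hashlib


def verify(n_w: int, v_w: int, returned: list) -> str:
    """Return 'Y' if the returned results match the owner's latest (N_w, V_w).

    returned is a list of (v_i, flag, h_f, plaintext) tuples, one per node,
    where plaintext is the decrypted file and h_f is the stored H(F).
    """
    # Keep only the entries whose flag marks the keyword as present (assumed: odd).
    hits = [entry for entry in returned if entry[1] % 2 == 1]
    if len(hits) != n_w:                                  # N_w = C(w).num
        return "N"
    if sum(v_i for v_i, _, _, _ in hits) != v_w:          # V_w = sum of v_i
        return "N"
    for _, _, h_f, plaintext in hits:                     # H(Decrypt(K, C(w))) = H(F)
        if hashlib.sha256(plaintext).hexdigest() != h_f:
            return "N"
    return "Y"
```

A stale update counter, a missing result, or a tampered file each makes one of the three checks fail, so the function returns 'N' in those cases.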
The UpdateToken algorithm process is as follows:
Algorithm 5 τ ← UpdateToken(K, PK, F, {w, flag}, v_i)
1: DO:
2: Add:
3: //To add the privacy file F, the data owner first obtains the keyword set {w} and the document identifier docIdF of F and then encrypts F and the keywords {w} with K; v_i is the number of updates corresponding to the privacy file.
4: ({K_w}, C) ← Encrypt_K({w}, F);

Comparison
In this section, we compare our scheme with Σoφoς [51], Ge's scheme [18], Gao's scheme [19], and Zhang's scheme [43]. All of these schemes can ensure the verifiability of search results. Assume there are n files and m keywords in total. For simplicity, we assume that each search returns n files. We neglect the communication costs and compare only the computation overhead in the different phases of these schemes. Table 2 shows the results of the comparison.
As can be seen from Table 2, the efficiency of our scheme is close to that of Zhang's scheme [43], Gao's scheme [19], and Ge's scheme [18] in terms of search and verify operations. However, in the update process, our scheme and Ge's scheme [18] outperform the others; the time complexity of both schemes is O(n).

Security Analysis
In this section, we will analyze the security of the v-PADSSE scheme in two aspects: update reliability and verifiability.

Update Reliability Analysis
Because the cloud server in v-PADSSE is untrusted, it may skip updating the security index and ciphertext after receiving an update request from the data owner in order to save computing or storage resources. We prove that the Verify algorithm outputs 'N' when the cloud server returns un-updated results.
Assume that the result returned by the cloud server is $(VI', C'(w))$ and that the correct result and verification information are $(VI, C(w))$. In the returned result, the number of files containing the keyword $w$ is $N'$, and the total number of updates to files containing the keyword $w$ is $V'$. In addition, when the data user performs a query, it requests from the data owner the latest verification information $PK(N)$, $PK(V)$ for the keyword. We consider the following three scenarios to prove the reliability of v-PADSSE.
(1) $VI = VI'$, $C(w) \ne C'(w)$. If the cloud server updates only the security index but not the ciphertext, the returned result is $(VI, C'(w))$.
However, the returned verification information contains $H(F)$. We decrypt the returned ciphertext $C'(w)$ to obtain $F'$. To pass verification, $H(F) = H(F')$ must hold, which by the collision resistance of the hash function means $F = F'$. If the plaintexts are the same, encryption with the same key gives $C(w) = C'(w)$, which contradicts the assumption that $C(w) \ne C'(w)$. Therefore, if only the security index is updated and the ciphertext is not, the Verify algorithm (Algorithm 4) cannot output 'Y' when the data user performs verification.
(2) $VI \ne VI'$, $C(w) = C'(w)$. In this case, the cloud server updates only the ciphertext but not the verification information in the security index.
Since the verification information in the security index is not updated, if $PK(v_i) \ne PK(v_i')$ and the rest are equal, then the total number of updates satisfies $V \ne \sum v'[1, \ldots, N']$, and the Verify algorithm (Algorithm 4) outputs 'N' when verifying the returned results. If $PK(\mathit{flag}) \ne PK(\mathit{flag}')$ and the rest are the same, the number of returned results is not equal to $N$, and the Verify algorithm outputs 'N'. If the number of entries with $\mathit{flag}' = randA$ is the same as the number of entries with $\mathit{flag} = randA$ in the returned $VI$, the corresponding $v_i$ must also be the same, which means that the attacker would need to obtain the verification information from the data owner; however, the VL is private to the data owner, and the probability of information leakage is negligible. If $H(F) \ne H(F')$ and the rest are the same, then $F \ne F'$, and the Verify algorithm outputs 'N'.
(3) $VI \ne VI'$, $C(w) \ne C'(w)$. Assume that the cloud server updates neither the security index nor the ciphertext after receiving the update token from the data owner. If the data user performs a Search operation (Algorithm 3), the cloud server returns the un-updated results to the user. In the system model architecture (Figure 1), before performing the Search operation, the data user sends a request to the data owner for the latest $VI$ of the searched keyword. The data owner searches the VL and returns the latest $VI$ of the keyword to the data user. After receiving the $VI$, the data user uses the Verify algorithm (Algorithm 4) to compare the un-updated $VI$ with the latest one. If any inconsistency is found, the Verify algorithm directly outputs 'N'.
The above shows that if the cloud server does not update the security index and ciphertext in order to save computing or storage resources, our scheme detects the un-updated state in time by checking the verification information and the returned results with the verification algorithm. Therefore, the proposed v-PADSSE scheme satisfies update reliability.

Verifiability Analysis
In this section, we assume that the attacker can forge $(C'(w), VI')$ so that the returned results pass the Verify algorithm (Algorithm 4). Assuming that the correct results and verification information are $(C(w), VI)$, we compare the forged information with the real information to prove that the probability of the attacker passing the Verify algorithm by forging verification information is negligible.
We will consider the following three scenarios to demonstrate the verifiability of the v-PADSSE.
(1) $VI = VI'$, $C(w) \ne C'(w)$. The attacker forges $VI' = PK(\mathit{flag}) + PK(VI) + H(F')$, while the proper verification information is $VI = PK(\mathit{flag}) + PK(VI) + H(F)$. To pass verification, this requires $VI' = VI$. In this case, because $N$, $V$, flag, and $v_i$ are encrypted using the user's public key, the cost of forgery is relatively small. At this point, we can consider $H(F') = H(F)$. According to the properties of the hash function, $F = F'$ is certain. In this case, $C'(w) = C(w)$, which is not consistent with the assumption. Therefore, the probability of an attacker passing the Verify algorithm (Algorithm 4) in this way is negligible.
(2) $VI \ne VI'$, $C(w) = C'(w)$. The attacker forges the ciphertext $C'(w)$ to be consistent with the correct ciphertext. According to the design of v-PADSSE, the verification information contains $H(F)$. At this point, we can consider $PK(v_i') + PK(\mathit{flag}') \ne PK(v_i) + PK(\mathit{flag})$. In this situation, $PK(V) = PK(\sum v_i) \ne PK(\sum v_i')$. When $PK(\mathit{flag}') \ne PK(\mathit{flag})$, the number of entries with $\mathit{flag}' = randA$ is not equal to the number of entries with $\mathit{flag} = randA$, given that the probability of information leakage from the VL can be ignored. In this case, the probability that the number of returned results is $N$ is negligible, and the Verify algorithm (Algorithm 4) outputs 'N'. Therefore, the probability of passing the Verify algorithm in this situation can also be ignored.
(3) $VI \ne VI'$, $C(w) \ne C'(w)$. In this case, the data user spends one additional communication, sending the trapdoor to the data owner to request the latest verification information $N$, $V$ of the keyword. Therefore, under this assumption, if any of the verification information differs or the returned encrypted files differ, $H(F)$ is inconsistent, which makes the Verify algorithm (Algorithm 4) output 'N'. The probability of the attacker passing the verification under this condition can therefore be ignored.
The above three scenarios show that if a malicious attacker forges ciphertext or verification information, our scheme can also determine which information is forged and give feedback.Therefore, our scheme satisfies verifiability.

Performance and Experiments
In this section, we analyze the performance of the proposed v-PADSSE scheme. The basic logic of the experiment was written in C++, and the running environment was Windows 10 on a 2.40 GHz 12th Gen Intel(R) Core(TM) i7 CPU with 4.0 GB RAM.
Index construction efficiency. We evaluated the construction efficiency of the bitmap index and the verification list proposed in the scheme. Figure 4 shows the time spent building the security index and verification list when the number of keywords is set to 10,000 and the number of privacy files varies from 1000 to 10,000. In the scheme, the security index takes the form of a bitmap index in which the number of rows is the number of keywords and the number of columns is the number of document identifiers of privacy files. When the number of rows in the bitmap index is fixed and the number of privacy files increases, the number of columns also increases, and so does the time cost of building the secure index. Figure 5 shows the time spent constructing the security index and verification list when the number of privacy files is 10,000 and the number of keywords contained in the privacy files varies from 1000 to 10,000. When the number of columns is fixed, an increase in the number of keywords leads to an increase in the number of rows in the bitmap index, and the time cost of building the secure index (Algorithm 1) also increases. During secure index construction, the number of nodes is determined by the number of keywords and the number of privacy files, so the construction time grows with either quantity. Since the number of nodes in the VL depends only on the number of keywords contained in the privacy files, when the number of keywords increases, the number of nodes in the VL increases, and the time cost of building the VL (Algorithm 2) increases accordingly.
Update token generation efficiency. Figure 6 shows the time cost of generating update tokens (Algorithm 5) (modify token, delete token, and add token) in the scheme. Generating an update token involves the document identifier and the keywords contained in the privacy file; once the document identifier of the modified file is determined, the latest update count of the file must be obtained from the VL. Therefore, the generation efficiency of update tokens is linearly related to the number of nodes in the VL: the more nodes, the longer the token generation time. Because an add token may also increase the number of VL nodes, its generation time is slightly longer.
Search efficiency analysis. The data owner obtains the latest verification information from the VL and generates an update token. After receiving the update token, the cloud server looks up the corresponding entries in the security index. Figure 7 shows the time cost of performing a search operation in the security index when the number of keywords is 10,000 and the number of privacy files varies from 1000 to 10,000. When the number of rows in the security index is fixed, that is, the number of keywords is fixed, the time cost of searching the index increases linearly with the number of columns, that is, the number of privacy files. Figure 8 shows the time cost of searching (Algorithm 3) the VL and the secure index when the number of privacy files is 10,000 and the number of keywords varies from 1000 to 10,000. Since the number of nodes in the VL is equal to the number of keywords, the search time also increases linearly with the number of keywords. When the number of columns in the security index is fixed and the number of rows in the bitmap index increases with the number of keywords, the search time cost increases.
In the remainder of this section, we compare our scheme with what is generally regarded as the most typical verifiable SSE scheme [54] in terms of verification efficiency and update efficiency.
Verify efficiency analysis. We compared the verification (Algorithm 4) time cost of our scheme with that of scheme [54]. As can be seen from Figure 9, the verification time cost of our scheme is lower than that of scheme [54]. Scheme [54] uses a bilinear mapping accumulator to verify search results, which is based on asymmetric key encryption. Our scheme is based on a VL that stores the number of file updates, which is more efficient than the accumulator. As shown in Figure 9, when the number of privacy files containing search keywords is 200, the verification time cost of scheme [54] is roughly 5 ms, and the verification time cost of our scheme is 0.3725 ms. When the number of search keywords is 2000, the verification time cost of scheme [54] is about 48 ms, and the verification time cost of our scheme is about 9.2437 ms. As can be seen from Figure 10, the number of CPU clock cycles of our scheme is lower than that of scheme [54]. Therefore, the verification efficiency of our scheme is higher than that of scheme [54].
Update efficiency analysis. After receiving the update token, the cloud server performs the corresponding Update operation (Algorithm 6) on the security index. As can be seen from Figures 11-13, the add and modify operations require increasing the number of columns or rows in the security index, so their time cost is slightly larger than that of the delete operation. For the delete operation, the cloud server deletes the corresponding column in the security index, and the time cost is lower. Figures 11-13 also show that the update efficiency of our scheme is better than that of scheme [54].

Discussion
In this section, we analyze the advantages and disadvantages of schemes [18,19,43,51], as shown in Table 3. Σoφoς [51], Zhang's scheme [43], and Gao's scheme [19] realize the dynamic update and searchability of data by constructing an inverted index. If the updated file contains many keywords, the update efficiency is relatively low. None of the above four schemes addresses secure key management. Σoφoς [51] uses a one-way trapdoor to realize forward security, but the calculation cost is high. Zhang's scheme [43] improves on Σoφoς [51], using random states to achieve forward security and improve efficiency. However, the correctness of the returned results is not verified; that is, the update reliability defined in this paper is not satisfied. Gao's scheme [19] requires a third-party TPA to verify the integrity of search results. This requires the TPA to execute the verification algorithm honestly, and the algorithm needs a trapdoor, data block number, RAL, and authenticator to perform the related calculations, which is relatively complex. Both Ge's scheme [18] and our scheme use a bitmap to construct the security index, which gives high update efficiency. However, the accumulated authentication tags (AATs) in Ge's scheme [18] contain ciphertext data blocks, which consume additional storage resources. In our scheme, the verification process does not involve complex operations and the verification information is simple, which makes verification efficient, but the scheme requires one additional round of communication. In conclusion, compared with the above schemes, our scheme is relatively efficient in searching, updating, and verifying.

| Schemes | Advantages | Disadvantages |
|---|---|---|
| Σoφoς [51] | Effectively implements forward security. | The calculation of trapdoor replacement is costly. |
| Zhang's scheme [43] | The multi-level hash function is used to replace the one-way permutation function, and the update efficiency is improved significantly. | The correctness of the returned result is not verified. |
| Gao's scheme [19] | The verification process effectively hides the relation between the keywords and the encrypted files to avoid statistical attacks. | The verification process is complex, and the TPA must perform the verification process honestly. |
| Ge's scheme [18] | The verification of the returned results consumes only one AAT resource, which has high efficiency. | The AAT contains ciphertext data blocks, which consume additional storage resources. |
| Our scheme | The verification procedure is simple and does not involve complex operations, which has high efficiency. | The verification process consumes an additional communication resource. |

Conclusions
In this paper, we first reviewed the research status of DSSE schemes and analyzed the advantages and disadvantages of different schemes. Since most schemes do not involve key management, we proposed a verifiable DSSE scheme based on the public key cryptosystem that can realize secure key management. In Section 3, we defined the security model, design goals, core algorithms, and security analysis of the v-PADSSE scheme. In Section 4, we described the bitmap index and verification list construction of the scheme in detail and explained the steps of the core algorithms. We also compared the time complexity with schemes [18,19,43,51] to show that the implementation efficiency of our scheme is high. In Section 5, a security analysis was carried out on the update reliability and verifiability of our scheme to prove that it meets the security requirements. In Section 6, we tested the efficiency of the core algorithms and compared our scheme with scheme [54] in security index construction, verification list construction, searching, search result verification, and updating. The results show that our scheme has high efficiency and strong feasibility. In Section 7, we analyzed the advantages and disadvantages of schemes [18,19,43,51].
Compared with previous schemes, the functional design of this scheme is more comprehensive. The verification process does not involve complex operations, the verification information structure is simple, and the execution efficiency is high. Our scheme addresses dynamic searchability, forward security, integrity and correctness verification of search results, and key management. In future work, given the rapid development of quantum computers and verifiable DSSE schemes, how to deal with quantum attacks effectively will become a key direction of our research.
Algorithm 1 (steps 14-15)
14: //The data owner sends the generated security index I and ciphertext C to the cloud server.
15: Send to CS(I, C);

Algorithm 2 (step 4)
4: end for

The Search algorithm process is as follows:
Algorithm 3 (verifyInfo, C(w)) ← Search(T_w, I, C)
1: DU:
2: //The data user executes the trapdoor generation algorithm GenToken, uses the symmetric key K to encrypt the keyword information, generates the trapdoor T_w, and sends it to the cloud server. The trapdoor is also sent to the data owner through the channel to obtain the latest verification information of the keyword.
3: T_w ← GenToken(K, w);
4: Send (T_w) to CS and DO;
5: CS:
6: if IndexSearch(T_w) = null then
10: verifyInfo = GetVerifyInfo(PK(v_i), PK(flag), H(F));
11: C(w) = search(docId);
12: end if
13: //Return the search results and VI to the data user.
14: Return (verifyInfo, C(w));

Algorithm 4 (steps 10-12)
10: //The verification information returned by the data owner is compared with that returned by the cloud server. If the verification information is correct, the ciphertext is accepted and decrypted. If not, the ciphertext is rejected.
11: else
12: if N_w = C(w).num and V_w = ∑ v_i and H(Decrypt(K, C(w))) = H(F) then

Figure 4. Security index and VL construction time cost.

Figure 5. Security index and VL construction time cost.

Figure 6. The update token generation time cost.

Figure 7. Search secure index time cost.

Figure 8. Search secure index and VL time cost.

Figure 10. CPU clock cycles of per data comparison.

Figure 11. The comparison of add operation time cost.

Figure 12. The comparison of modify operation time cost.

Figure 13. The comparison of delete operation time cost.

Table 1. Common symbols and descriptions.