An Efficient Search Algorithm for Large Encrypted Data by Homomorphic Encryption

Abstract: The purpose of this study is to provide an efficient search function over a large amount of encrypted data, where the bit length of each item is several tens of bits. For this purpose, we have improved the existing hybrid homomorphic encryption by enabling longer data items to be stored using multiple encrypted databases and by suggesting an improved search method that works on top of the multiple instances of the database. Further, we found the optimal number of databases needed when 40-bit information, such as a social security number, is stored after encryption. Through experiments, we were able to check the existence of a given (Korean) social security number of 13 decimal digits in approximately 12 s from a database that has 10 million encrypted social security numbers in a typical personal computer environment. The outcome of this research can be used to build a large-scale, practical encrypted database that supports the search operation. In addition, it is expected to be used as a method for providing both security and practicality to industries dealing with credit information evaluation and personal data requiring privacy.


Introduction
In recent times, personalized services utilizing information technology have become common. For personalized services, the use of personal information, such as personal identification numbers, is inevitable. Unfortunately, such use has led to a sharp increase in the number of incidents involving the leakage of personal information [1]. In particular, the Korean residence registration numbers (RRNs), which play a role similar to that of social security numbers in the U.S., have been widely used as personal identification numbers for a long time; owing to the progress of the information society, they are increasingly collected along with personal information, recorded, and managed in multiple repositories. An RRN has a strong personal identification value and can be used as an identity verification tool in all fields of activity; hence, if it is exposed, it can be used not only to impersonate another person, but also for fraud, illegal collection, and the marketing of various targeted advertisements. Therefore, an unprotected RRN can cause serious privacy infringement [2][3][4].
The hybrid HE [18,19] technique, which provides an efficient matching search, has recently been proposed as an HE method. It aims to quickly determine the presence or absence of a specific sequence in a set of encoded sequence information. We improve this method and propose an efficient search method on an encrypted RRN database. The problem with the existing hybrid HE method is that the size of the data item to be searched is limited to 11 bits; hence, there is not enough space to store RRNs, which require more than 40 bits each. In addition, the existing hybrid HE methods target searches over approximately tens of thousands of data items. Thus, it seems infeasible to use these methods for real-time processing of approximately 10 million data items.
In this paper, we focus on this problem and propose a new method that provides a search function for ten million encrypted data items in practical time. The proposed method is based on the existing hybrid HE method [18,19]. However, by extending the maximum bit length of plaintexts that can be encrypted, it can search for longer data than the previous method in a short time. In the basic database constituting the existing hybrid HE method [18], the sizes of the search keyword and of the data item to be searched are limited to 11 bits each, and a method of increasing the keyword to 22 bits using two databases is proposed. We increase the storage efficiency by expanding not only the size of the search keyword, but also the size of the data item to be searched, to 22 bits. We also propose a method that uses three or more encrypted databases and enables a parallel search on them, which can improve the search performance of the proposed method. In addition, when a keyword search is performed on part of an RRN, it is necessary to process data items that share the same keyword, unlike in the existing hybrid HE method. Therefore, the proposed method also handles this case in the process of expanding the database.
We demonstrate the efficacy of the proposed method by randomly generating tens of millions of RRNs and then measuring the elapsed times for encryption, database creation, and search. In a conventional personal computer (PC) environment, the proposed method takes less than 10 s to search for one item among 10 M data items. This result is approximately 43.9 times faster than the Paillier encryption-based search method [9], which, as far as we know, is one of the most widely used efficient HE schemes.
In addition, it is known that the Ring LWE problem [20], which is the security assumption of the encryption system used in the hybrid HE method [18,19] that is the basis of the proposed method, is safe from quantum computing attacks. Therefore, the framework that is described in this paper is expected to be robust against quantum computer attacks in the future.
The remainder of this paper is organized as follows. Section 2 describes the preliminaries and related work. Section 3 provides the proposed method for efficient search of large-scale encrypted data. Section 4 describes the results of the performance evaluation. Section 5 presents a complementary discussion on the issues that are related to the implementation of the proposed work. Finally, Section 6 presents the conclusions, limitations, and directions for future research.

Preliminaries and Related Work
This section presents the preliminaries needed to understand this work and reviews related work. In the first subsection, we present the basic concept of HE and explain somewhat HE (SWHE), the class to which the hybrid HE belongs. Subsequently, we introduce the hybrid HE. The second subsection covers related work. Finally, the third subsection briefly describes the structure of the RRNs to be stored in the database in this study.

Preliminaries
Homomorphic Encryption (HE): HE is an encryption technique that enables computation on the plaintexts hidden in their corresponding ciphertexts. The result of a computation in HE is another ciphertext that contains the correct result of the computation on the plaintexts. In conventional encryption algorithms, once a plaintext is encrypted, the plaintext hidden in the ciphertext cannot be manipulated, because the ciphertext completely hides the plaintext and the algorithms support nothing except the encryption and decryption operations. Therefore, to perform operations on the plaintexts embedded in ciphertexts, the ciphertexts must first be decrypted. In addition, to create a ciphertext containing the operation result, the plaintext of the operation result must be encrypted again. In contrast, HE can perform operations without decrypting the ciphertexts and obtain a ciphertext containing the result of the computation. The concept of HE was first proposed by Rivest and Adleman, who were among the proponents of the RSA encryption method that is named after them [10]. They proposed "privacy homomorphism" as a way to process sensitive customer information held by companies and suggested five encryption methods [10]. However, these five schemes have security problems. Almost all of the HEs proposed until the mid-1990s have been shown to have security problems [11][12][13][14]. Nevertheless, the concept of "operation in the encrypted state" presented in [10] became the basic idea of HE and, since then, some secure HE algorithms have been proposed [9,15,16,17]. Unfortunately, they support limited functionality [15]: specifically, some algorithms support only the 1-bit XOR operation [16], some support only multiplication, whereas others [9,17] support only addition on the underlying plaintexts hidden in the ciphertexts.
Somewhat HE (SWHE): SWHE [7,8,21] refers to homomorphic encryption methods that support a limited number of consecutive multiplication operations starting from a fresh ciphertext, that is, a ciphertext produced directly by encryption, while the number of consecutive addition operations allowed is practically unlimited. When the security parameters are generated, the maximum number of consecutive multiplication operations that can be performed on a ciphertext, that is, the multiplicative depth of the circuit to be evaluated, is determined. The SWHE scheme is defined by the algorithms in Figure 1.
KeyGen(params) → (pk, evk, sk): a key generation algorithm that takes the parameter params and outputs the public key pk for encryption, the evaluation key evk for computation over encrypted data, and the secret key sk for decryption.
Enc_pk(m) → c: an encryption algorithm that takes a plaintext and a public key and outputs a ciphertext that contains the plaintext. The ciphertext carries the maximum number of multiplication operations that can still be performed with it. Whenever a multiplication is performed with it, the maximum possible number of multiplications of the resultant ciphertext is one less than that of the input ciphertext.
Dec_sk(c) → m: a decryption algorithm that takes a ciphertext and a secret key and outputs the plaintext in the ciphertext if the secret key is valid.
Operation_params,evk(c_1, c_2) → c or ⊥: a binary operation that is performed on the input ciphertexts and outputs a resultant ciphertext. If no more multiplications can be done with either c_1 or c_2, it outputs ⊥. The supported operations depend on the scheme; normally, they include addition and multiplication.
Hybrid HE [18,19] provides a method of encoding a large amount of data on top of an existing SWHE scheme and a method of quickly searching for given data among the encoded data; it easily finds the data associated with the searched data. Hybrid HE methods were proposed in order to quickly find specific genome sequences in large amounts of genome sequence information. In [18,19], the genome sequencing data are stored in files of variant call format (VCF). Figure 2 shows the method used to create an encrypted database from the genome sequence data.
The base sequence information includes all genotype information, such as the chromosome number and the nucleotide sequence position, and it is stored in a VCF file. A set of base sequence information is retrieved from the VCF files to create a database. Subsequently, the retrieved base sequence information is encoded into an element of a polynomial ring, which is treated as the unit of storage of the database. Table 1 shows the format of a VCF file that stores the base sequence information. In the VCF file shown in Table 1, each line consists of (ch_i, pos_i, SNPs_i). ch_i is the identifier of a chromosome and takes one of the values 1-22, X, and Y. pos_i indicates the position information and is a positive integer. SNPs_i contains the REF and ALT base sequences. The REF/ALT base sequences stored in SNPs_i consist of a number of SNPs; an SNP refers to one base, and each base is one of A, T, G, and C.
Each base is represented using two bits according to the following rule: A → 00, T → 01, G → 10, C → 11. m_SNP refers to the maximum number of REF/ALT bases that can constitute a key in a unit of data stored in the database and, thus, be compared with query data. If m_SNP is two, then the key part is made with two bases from REF, ALT, or both. Therefore, as m_SNP increases, the number of alleles used as a key in the REF and ALT bases increases.

Table 1. Format of a VCF file that stores the base sequence information.
ch   pos         REF    ALT
1    161235657   G      T
1    161235981   G      A
1    161237503   -      TTTTTTGT
1    161239028   AG     -
1    161239142   A      G
1    161239346   G      T
1    161239470   C      T
1    161239788   -      AA
1    16123978    C      T
1    161240641   TGAT   -

We now explain how to encode each line of the VCF file shown in Table 1. Each line in the VCF file is converted to a (d_i, α_i) pair. Each d_i is calculated using the ch_i and pos_i values (Equation (1)), and α_i is calculated using the SNPs_i value (Equation (2)), for i = 1, ..., N, where N is the size of the database. Each (d_i, α_i) pair obtained from Equations (1) and (2) is embedded as a term of an element in a polynomial ring. Thus, if we suppose that l_SNP/2 ≤ m_SNP, we can represent a database as an element in a polynomial ring if the number of values inserted is less than the maximum possible degree of the element, as follows:
$$DB(x) = \sum_{i=1}^{N} \alpha_i x^{d_i} \qquad (3)$$
Let us explain how a query is created and how the search operation is performed. To search for a (d, α) pair in DB(x), we first generate a query polynomial Q(x) = x^{-d}. Subsequently, this polynomial is multiplied with DB(x) over the defined polynomial ring. The constant term of the multiplication result is then extracted. Let this be β. Subsequently, we compare β with α. If they are the same, (d, α) is in DB(x); otherwise, it is not.
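The search described above can be illustrated on plaintext polynomials, without any encryption. The following is a minimal sketch, assuming that DB(x) stores each pair (d_i, α_i) as the term α_i x^{d_i} and that the ring is Z_t[x]/(x^n + 1), a common choice for Ring LWE-based schemes; the ring, the toy parameters, and all function names are assumptions made for illustration and are not taken from [18].

```python
def poly_mul_negacyclic(a, b, n, t):
    """Multiply two polynomials in Z_t[x]/(x^n + 1), given as coefficient lists of length n."""
    result = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if bj == 0:
                continue
            k = i + j
            if k < n:
                result[k] = (result[k] + ai * bj) % t
            else:
                # x^n = -1 in this ring, so wrapped-around terms change sign
                result[k - n] = (result[k - n] - ai * bj) % t
    return result


def build_db(pairs, n, t):
    """Embed each (d, alpha) pair as the coefficient alpha of x^d."""
    db = [0] * n
    for d, alpha in pairs:
        assert 0 <= d < n and 0 < alpha < t
        assert db[d] == 0, "each exponent d can be used once per polynomial"
        db[d] = alpha
    return db


def make_query(d, n, t):
    """Q(x) = x^(-d); in this ring x^(-d) equals -x^(n-d) for d > 0 and 1 for d = 0."""
    q = [0] * n
    if d == 0:
        q[0] = 1
    else:
        q[n - d] = t - 1  # -1 modulo t
    return q


def search(db, d, alpha, n, t):
    """Return True if the constant term beta of DB(x) * Q(x) equals alpha."""
    beta = poly_mul_negacyclic(db, make_query(d, n, t), n, t)[0]
    return beta == alpha


N_RING, T_MOD = 16, 2048  # toy parameters, far smaller than in a real scheme
db = build_db([(3, 1200), (7, 45), (12, 999)], N_RING, T_MOD)
print(search(db, 7, 45, N_RING, T_MOD))  # True: the pair (7, 45) is stored
print(search(db, 7, 46, N_RING, T_MOD))  # False: d matches, alpha differs
print(search(db, 5, 45, N_RING, T_MOD))  # False: no term of degree 5
```

In this sketch, a missing d yields a constant term of 0, which is why a stored α must be nonzero to be distinguishable from an empty slot.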
Owing to [18], we can perform such a search operation with an encrypted database and an encrypted query. After encrypting DB(x) to Enc(DB(x)) and Q(x) to Enc(Q(x)), it is possible to perform multiplication with both ciphertexts in order to obtain the ciphertext of DB(x) × Q(x). Thus, we can obtain DB(x) × Q(x), which can be easily transformed to the search result after decrypting the ciphertext.
Unfortunately, hybrid HE cannot be applied directly to our purpose, for the following reasons. The first is the problem of data size. Hybrid HE deals with data of size less than 30 bits [18]; it does not describe how to deal with data of longer bit length, such as RRNs. Second, as the number of data items to be searched increases, we need to use more than a single polynomial DB(x) because of the limit on the maximum degree of polynomials in the ring. The database of multiple encrypted polynomials must use space efficiently, and the polynomials should be organized so that search operations are efficient. However, [18] does not consider extending the database to deal with longer data items and a larger number of them.

Related Work
In this subsection, we examine previous research on encrypted data retrieval using HE that is based on their comparison operations. In general, in HE, comparing whether the encrypted data are identical to the queried data is known to be inefficient, because it requires a high circuit depth [21][22][23].
Togan and Pleşca [24] designed a model for comparing the binary-string plaintexts hidden in FHE (Fully Homomorphic Encryption)-based encrypted data and implemented it using the HElib [25][26][27] library. In Reference [24], the method in Equation (4) was applied to compare two ciphertexts containing n-bit values.
A comparison operation using Equation (4) is performed as follows. To find the larger of the two input ciphertexts, X and Y, the values are compared in order from the most significant bit to the least significant bit. The result of the comparison operation for each bit is stored as a ciphertext and, as a last step, the comparison result of the x_0 and y_0 bits and the stored result ciphertexts are combined to calculate the final comparison result. When this model is implemented with FHE and a comparison operation is performed on an 8-bit value, an operation to test the equality of the data takes approximately 5 s and the comparison operation takes approximately 10 s in a conventional PC setting. In addition, it was confirmed that, as the bit length of the encrypted data increases, the time required for the operation also increases linearly.
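To make the bit-by-bit comparison concrete, the following is a minimal plaintext sketch of a generic comparison circuit that uses only addition and multiplication modulo 2 (XOR and AND), the operations that a bitwise FHE scheme evaluates on encrypted bits. It is a textbook-style construction written for illustration; it is not claimed to be identical to Equation (4) of [24], and the function names are hypothetical.

```python
def to_bits(value, n):
    """Return the n-bit representation of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def equal_circuit(x_bits, y_bits):
    """Return 1 if x == y, using only XOR (+ mod 2) and AND (* mod 2)."""
    result = 1
    for x, y in zip(x_bits, y_bits):
        bit_equal = (1 + x + y) % 2  # XNOR of the two bits
        result = (result * bit_equal) % 2
    return result


def greater_circuit(x_bits, y_bits):
    """Return 1 if x > y, scanning from the most to the least significant bit."""
    greater = 0       # becomes 1 when the first differing bit has x_i = 1, y_i = 0
    prefix_equal = 1  # 1 while all more significant bits have been equal
    for x, y in zip(reversed(x_bits), reversed(y_bits)):
        bit_greater = (x * (1 + y)) % 2  # x_i AND (NOT y_i)
        # At most one position can contribute, so XOR acts as OR here.
        greater = (greater + prefix_equal * bit_greater) % 2
        prefix_equal = (prefix_equal * (1 + x + y)) % 2
    return greater


x, y = to_bits(0b10110010, 8), to_bits(0b10101111, 8)
print(equal_circuit(x, y), greater_circuit(x, y))  # 0 1
```

The chained products in `prefix_equal` grow with the number of bits, which is consistent with the observation that the cost of the operation increases with the bit length of the encrypted data.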
Carlton [28] proposed a comparison protocol when integers were encrypted with FHE; he implemented his method using SageMath [29] in Python. In Reference [28], the threshold function was defined and applied, as shown in Equation (5).
In Equation (5), when an addition operation is performed on ciphertexts m_1 and m_2, the result becomes 0 if it exceeds the threshold t. Therefore, if the operation result of m_1 + m_2 is 0, then it can be seen that the actual value is t or more. The protocol proposed in [28] provides a comparison function for integer data, not binary numbers, and takes approximately 0.2 s to execute. However, the protocol is difficult to use in HE, which makes it difficult to apply the threshold.
Bonte and Iliashenko worked to increase the efficiency of these comparisons in their pattern matching study [30]. They approximate the terms of the OR gate that make up the comparison operation of pattern matching using a low-degree multilinear polynomial based on the Razborov-Smolensky method [31,32]. This approach is efficient because the comparison circuit can have a multiplicative depth that is independent of the length of the pattern to be matched. However, we did not consider applying this technique, because the RRN to be searched in this study is relatively short.
In the work of Laine [33], a method for determining whether encrypted strings match was proposed and implemented based on SEAL [34]. It employs cuckoo hashing [35] to compare strings of arbitrary length. This method takes approximately 225 s to compare 10,000 bits of data. However, because of the limitations of SEAL, it has the disadvantage that the number of operations for checking whether data are identical is limited.
In addition, studies have been proposed to modify the comparison operation according to the purpose of the application or to efficiently perform the comparison operation on multiple data [36][37][38]. The study [36] proposed an application of FHE for expressing floating-point numbers, such as IEEE 754, and designed a specialized comparison operation while considering the sign, exponent, and fraction of real numbers in floating-point format. In the study [37], a method for efficiently finding MAX values among multiple data was proposed through parallel processing. For data, each comparison operation with other values is performed in parallel and logical operations are performed on the results of the comparisons to determine which value is MAX. In the study of [38], comparison operation is performed to determine the case/special character of ASCII code and check the string length, and an encoding method that is suitable for this was developed. It extends the ASCII encoding to the slot inside the ciphertext in a form that is easy to compare.

The Structure of the Korean Residence Registration Numbers (RRNs)
This subsection describes RRNs to aid in understanding this paper. All Korean citizens must get an RRN immediately after birth, and every citizen has a unique 13-digit RRN. The inventors of the RRN in Korea used the social security number system of the United States as a reference. Because every citizen has a unique RRN, the RRN is used by many Korean websites and smartphone applications as a weak secret for identification and authentication. Therefore, when an individual's RRN is exposed, the damage can be greater than that of other sensitive information, and methods for actually collecting RRNs have also been studied [3,4].
The structure of the RRN to be described in this subsection follows the notation of Choi, D. et al. [3]. RRN consists of 13 decimal digits, such as "ABCDEF-GHIJKLM". The meaning of the numbers is as follows:

Search Algorithm for Encrypted Data
In this work, we propose a method for efficiently storing and searching a large number of encrypted RRNs. For this, we modify the hybrid HE and design it to store the RRN data efficiently.

Electronics 2021, 10, 484

Overview of the Proposed Work
Figure 3 presents an overview of the proposed scheme. There are two actors in the system. One is the user, who generates the private, evaluation, and public keys using hybrid HE [18,19]. The public and evaluation keys are sent to the other actor, the server, which manages the encrypted database and performs query operations with the query ciphertexts received from users. The proposed scheme supports two operations: the database setup and the query operation.
Figure 3a shows the database setup operation. It consists of three steps. The first step is the formatting of the data. Each RRN is formatted into a number of (d, α) pairs, depending on the number of databases (nDB) used. For simplicity, we assume here that an RRN is formatted into a single (d, α) pair. Subsequently, each (d, α) pair is embedded into a polynomial by setting the coefficient of the dth-degree term to α. It is possible that two different RRNs have the same d, but different αs. In this case, a new polynomial is generated, and the later RRN is embedded into the newly generated polynomial. The final step is encryption, where each polynomial is encrypted using the encryption scheme that is specified in [18]. The encrypted database is constructed as a result of encryption.
Figure 3b shows the query operation. As in the database setup operation, the RRN to be queried is formatted and then encoded into a single-term polynomial. Afterwards, the polynomial is encrypted and sent to the server. Subsequently, the server performs a multiplication operation between the delivered query polynomial and each of the stored polynomials in the database without decrypting them. Because the scheme presented in [18] is a homomorphic encryption scheme, the result of multiplying the ciphertexts is a ciphertext that contains the product of the two plaintext polynomials hidden in the two input ciphertexts. The results of the multiplications are sent back to the user, who then decrypts the received ciphertexts and checks whether α' (one chunk from the input RRN being queried) is among the decryption results. If so, the queried RRN is in the database; otherwise, it is not.
Unfortunately, the overview does not explain the proposed scheme exactly. Because of the limitation shown in [18], an RRN cannot be represented with a single (d, α) pair. A single pair can only contain 22 bits; thus, we propose a method to encode an RRN with multiple (d, α) pairs and make the database setup and the query operation efficient.
We use two approaches to perform this operation. The first approach is to use multiple pairs (d, α_0), (d, α_1), ..., (d, α_{n−1}), where d is fixed but the α_i values differ. Because a polynomial can contain only one α_i in its dth-degree term, we need at least n polynomials to store one RRN. In addition, if two RRNs need to be stored but they have the same d, the number of polynomials needed increases by n. The other approach is to use multiple ds. In this case, the number of polynomials needed can be smaller than in the first approach, because both the d-part and the α-part can be used to store the RRN. However, the number of ciphertexts needed for a query increases, because more than one d is used. In the next subsection, we provide the details of each step of the two approaches. We denote the number of ds used as nDB, the number of databases, and the number of αs used as nCol, the number of columns.
From a privacy point of view, in step (4) of the query process in Figure 3b, the result for the data items being searched is stored in the constant terms α_i of the result polynomials, and the user has to check these constant terms. Therefore, there are risks of leakage of server database information, as follows. First, some of the internal information of the server database might be included in the coefficients of the computation result of (4). Second, the user can obtain, as the query result, the data α_i of another user who shares the same value as his/her d'.
The solutions to these problems are covered in [18], and the same approach can be applied to our proposed method. First, to extract the constant term of the search result, we apply the technique [39] of converting the polynomial RLWE encryption to the constant LWE encryption, which is used in the bootstrapping process of FHE. The server can extract the constant terms in encrypted form by applying this conversion procedure to each c_i, and thereby prevent the leakage of unqueried information in the coefficients. Next, a one-way hash function can be applied for safe comparison in (6) of Figure 3b. If the hashed values of the data items are stored in the server database, the user compares hashed results in (6), so that the user can only check equality without learning the information stored in the database. Additionally, the exposure of indirect information through the number of c_i values in the query result cRet can be prevented by having the server include encrypted arbitrary values in cRet.

Data Formatting for RRN
We provide details of how to format an RRN into a set of (d, α) pairs in this subsection. We can choose different formatting strategies depending on the nDB value, as shown in Figure 4. Figure 4a shows the case where nDB is one. In this case, three αs are required to represent one RRN. Note that both d and α can contain up to 11 bits. However, the most significant bit of α is used as a flag bit that indicates whether a value is present. Thus, each α covers three digits of an RRN. In addition, the 3rd-6th digits of an RRN represent the birthday of the holder, so this number is less than 2047 and can be contained in d. Figure 4b shows the case of nDB = 2. In this case, there are two ds; hence, they contain up to 22 bits. Because the first digit in the second part of the RRN can only be one of 1-4, all seven digits can be contained in two ds. Figure 4c represents the case where nDB = 4. In this case, all the digits in an RRN can be contained in the four ds, and we do not have to put any digit in the α part. However, the most significant bit of α is set to one to indicate that a given d value is used. Figure 4d depicts the relation between nDB and the number of α values needed to represent an RRN. From the table, we can see that the maximum nDB that we need to consider is four.
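The nDB = 1 case can be sketched as follows. The source fixes only the overall shape (one d holding the 3rd-6th digits, three αs of three digits each, and a flag in the most significant bit of an 11-bit α); the order in which the remaining nine digits are assigned to the three αs, the function name, and the sample number are assumptions made for illustration and may differ from Figure 4a.

```python
FLAG_BIT = 1 << 10  # most significant bit of an 11-bit alpha


def format_rrn_single_db(rrn):
    """Format a 13-digit RRN into one d and three alphas (assumed layout for nDB = 1)."""
    digits = rrn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("an RRN must contain exactly 13 decimal digits")
    d = int(digits[2:6])  # 3rd-6th digits (month and day of birth)
    if d >= 2048:
        raise ValueError("d does not fit into 11 bits")
    # Assumed order: the remaining nine digits are taken left to right.
    rest = digits[0:2] + digits[6:]
    # Three decimal digits are at most 999 < 1024, so they fit below the flag bit.
    alphas = [FLAG_BIT | int(rest[i:i + 3]) for i in range(0, 9, 3)]
    return d, alphas


# A made-up number used only to show the layout.
d, alphas = format_rrn_single_db("900101-1234567")
print(d, alphas)  # 101 [1925, 1258, 1591]
```

Because the flag bit is always set, every stored α is nonzero, so an empty coefficient (0) can be told apart from a stored chunk such as 000.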

Data Encoding and Building Databases
After formatting, the data are encoded to organize databases or to make a query. We focus on making databases with a large number of formatted RRNs, because making a query amounts to encoding a single formatted RRN. The encoding comprises two steps. The first step is embedding the formatted RRNs into polynomials. Figure 5 depicts this step. It shows that each (d, α) pair constitutes a single term in a polynomial. In addition, when d_0 = d_1, the two formatted pairs (d_0, α_0) and (d_1, α_1) are stored in two separate polynomials. Figure 5a explains the case when a single d is used with multiple αs. In this case, the number of polynomials that are newly created when two RRNs collide with regard to their d values depends on the number of αs used. Figure 5b deals with the case where two ds are used to format an RRN value. As fewer αs are mapped to a single d, the number of polynomials created in a collision, that is, the event that any d of two RRNs matches, decreases.
After the polynomials are generated, they are encrypted using the scheme presented in [18]. The encrypted polynomials are stored in the server as databases. It is to be noted that, even if they are encrypted, we can perform search operations with them if we have a ciphertext to be queried.
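The embedding step (before encryption) can be sketched as follows. Each polynomial is represented as a mapping from degree d to coefficient α, and a pair whose d is already occupied goes into another polynomial. The first-fit placement and the name `embed` are assumptions made for illustration; the paper does not specify the placement policy.

```python
def embed(pairs):
    """Embed (d, alpha) pairs as terms alpha * x^d, one dict per polynomial."""
    polys = []
    for d, alpha in pairs:
        for poly in polys:
            if d not in poly:       # degree still free in this polynomial
                poly[d] = alpha
                break
        else:                       # collision in every polynomial so far
            polys.append({d: alpha})
    return polys


# Two pairs share d = 215, so they end up in separate polynomials.
polys = embed([(215, 1925), (215, 1300), (731, 1258)])
print(polys)
```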

Search Protocol
We provide the details of the search protocol shown in Figure 6. As shown in step (2) of Figure 6, the query polynomial is encrypted with the same public key that was used to encrypt the databases on the server. The server performs the operations in step (4) and returns the results to the user. Owing to [18], the multiplication result can be much shorter than a ciphertext in the database; thus, the query result is small enough to be processed online. After the query result cRet is delivered, the user performs step (6). If the decryption result matches the αs generated from the queried RRN, the user concludes that the queried RRN exists in the database.
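To give an intuition for why a polynomial multiplication can act as a lookup, the following plaintext sketch shows one common construction. It is an assumption for illustration, not necessarily the operation used in [18]: in the ring Z[x]/(x^N + 1), multiplying by -x^(N-d) moves the coefficient of x^d to the constant term. The ring degree N = 2048 and all function names are hypothetical.

```python
N = 2048  # assumed ring degree, so that d is an 11-bit exponent


def mul_monomial(poly, k, sign=1):
    """Multiply a sparse polynomial {degree: coeff} by sign * x^k modulo x^N + 1."""
    out = {}
    for deg, coeff in poly.items():
        e, c = deg + k, sign * coeff
        if e >= N:                  # x^N = -1: wrap around and flip the sign
            e, c = e - N, -c
        out[e] = out.get(e, 0) + c
    return out


def lookup(db_poly, d):
    """Return the alpha stored at degree d, or 0 if no term has that degree."""
    return mul_monomial(db_poly, N - d, sign=-1).get(0, 0)


db_poly = {215: 1925, 731: 1258}
print(lookup(db_poly, 215), lookup(db_poly, 100))
```

In the encrypted setting, the same multiplication would be carried out on ciphertexts by the server, and only the user could read the constant term after decryption.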

Performance Evaluation
In this section, we evaluate the performance of the proposed scheme with real data and compare it with that of other methods. To this end, we implement the search operation for all of the methods and run the experiments in the same environment as the proposed method. The compared methods are implemented with open-source libraries, namely HElib [24][25][26], HEAAN [40,41], nuFHE [42], and Python-Paillier [43]. We measure the execution times of the operations, the size of the database needed to store the same amount of data, the size of the search query, and the size of the resultant ciphertexts of the search operation. In the proposed method, each record is encoded as a pair consisting of d, the search keyword, and α, the data item to be searched. The other encryption algorithms, however, cannot build a database from a polynomial that uses α as a coefficient and d as a degree; therefore, for them, d and α are stored in bit encoding, except for the Paillier scheme, which uses integer encoding. Consequently, the parallel processing of data that share a keyword and the configuration of multiple DBs, both of which rely on the polynomial encoding, were not considered for the compared methods.

Search Algorithms for Other Methods Compared
For the performance comparison, we implemented the FHE-based encrypted-data search algorithm described in Figure 7 using the homomorphic encryption libraries HElib, HEAAN, and nuFHE, and compared their performance with that of the proposed method. The search subtracts the queried data from each data item stored in the database and multiplies all of the resulting values. Consequently, if the queried data exist in the database, a ciphertext encrypting 0 is returned; otherwise, a ciphertext encrypting a non-zero value is returned.
Here, cQ denotes the ciphertext that encrypts the search query Q, that is, cQ ← Enc(Q).
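The subtract-and-multiply search can be sketched on plaintext values. This shows only the arithmetic that the homomorphic operations would evaluate on ciphertexts; the prime modulus `T` and the name `search_product` are assumptions for illustration and do not correspond to any of the libraries' APIs.

```python
from functools import reduce

T = 65537  # assumed prime plaintext modulus; stored values must be below T


def search_product(db, query):
    """Return the product of (x - query) over the database, modulo T."""
    return reduce(lambda acc, x: acc * ((x - query) % T) % T, db, 1)


db = [215, 731, 1024]
print(search_product(db, 731))       # a zero product signals a match
print(search_product(db, 100) != 0)  # a non-zero product signals no match
```

Because T is prime, the product is zero only if at least one factor is zero, that is, only if the query equals a stored value.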

We also implemented a search algorithm that works with the Paillier cryptosystem, which is an additively homomorphic encryption algorithm, as shown in Figure 8. To implement it, we used Python-Paillier, a Python implementation of the Paillier cryptosystem [9]. Unlike the FHE schemes, the Paillier scheme does not support the multiplication of two ciphertexts. Therefore, the ciphertexts obtained by subtracting the queried data from the data stored in the database are returned as the result. If the queried data exist in the database, the result contains an encryption of 0; otherwise, it contains only encryptions of non-zero values.
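The Paillier-based search can be sketched with a toy, standard-library-only implementation of the textbook scheme. This is not the Python-Paillier API; the primes are far too small for real use, and all names are assumptions made for illustration.

```python
import math
import random

# Toy parameters: two Mersenne primes, insecure and for illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def encrypt(m):
    """Textbook Paillier with g = N + 1, so g^m = 1 + m*N (mod N^2)."""
    r = random.randrange(1, N)
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def search(enc_db, enc_query):
    """Server side: homomorphically subtract the query from every stored item."""
    inv = pow(enc_query, -1, N2)
    return [c * inv % N2 for c in enc_db]


def contains_match(diffs):
    """User side: a decrypted 0 means that the query is in the database."""
    return any(decrypt(c) == 0 for c in diffs)


enc_db = [encrypt(v) for v in (8001011234567, 9912312345678)]
print(contains_match(search(enc_db, encrypt(9912312345678))))
```

Since no ciphertext multiplication is available, the result contains one ciphertext per stored item, which is consistent with the larger result sizes expected for this approach.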

Evaluation Environment
For the performance evaluation of the search algorithm, Tables 2 and 3 show the hardware environment and the parameter settings, respectively, under which the algorithms were run.

Table 3. Parameters used for each library.

Performance Evaluation of the Proposed Algorithm
In this subsection, we compare the execution time and the ciphertext size of the proposed method, implemented in the environment shown in Table 2, for various nDB settings. The execution times of the individual operations are compared first, followed by the ciphertext sizes.

Comparison of the Execution Time
Encryption Execution Time: Figure 9a shows the time required to encrypt the database for nDB from 1 to 4 when the database contains 100,000, one million, or 10 million data items. For all three database sizes, encryption is fastest when nDB is three.
Query Encryption Execution Time: Figure 9b shows the time required to encrypt the queries for nDB from 1 to 4 for the same three database sizes. One query is generated per database, so the execution time for query encryption increases with nDB.
Search Execution Time: Figure 9c shows the time required to search the database for nDB from 1 to 4 for the same three database sizes. As with the encryption time, the search is fastest when nDB is three for all three database sizes.
Result Decryption Time: Figure 9d shows the time required to decrypt the result ciphertext for nDB from 1 to 4 for the same three database sizes. For all three database sizes, decryption is slightly faster when nDB is three than in the other cases.

Comparison of the Size of the Result Ciphertexts
Size of Ciphertext for Database Representation: Figure 10a shows the size of the encrypted database for nDB from 1 to 4 when the database contains 100,000, one million, or 10 million data items. For all three database sizes, the encrypted database is smallest when nDB is three.
Ciphertext Size for Query Result: Figure 10b shows the size of the result ciphertext for nDB from 1 to 4 for the same three database sizes. As with the database size, the result ciphertext is smallest when nDB is three for all three database sizes.

Performance Comparison with the Search Algorithm to Be Compared
In this subsection, we compare the performance of the proposed algorithm, using the optimal nDB setting found in Section 4.3, with that of the other methods mentioned previously. Owing to the performance limitations of the other methods, the comparison is performed only for a database containing 10,000 data items. We compare the execution time of the search operation and the sizes of the ciphertexts involved.

Comparison of Search Algorithm Execution Time
Search Execution Time: Figure 11 shows the time required for the search when the number of data items is 10,000. Among the compared methods, Python-Paillier is the fastest; however, it is still approximately 43.9 times slower than the proposed algorithm.
Search Result Decryption Time: Figure 12 shows the time required for decryption. Among the compared methods, nuFHE is the fastest; however, it still takes 77.5 times longer than the proposed algorithm.
Size of Ciphertext for Database Representation: Figure 13 shows the size of the encrypted database of each method when the database contains 10,000 data items. nuFHE generates the smallest ciphertexts. The database of the proposed algorithm occupies 2.5 MB, that is, 250 bytes per record, which can be considered practical.
Ciphertext Size for Query Result: Figure 14 shows the size of the ciphertext of the search query result. Again, nuFHE generates the smallest ciphertext. The result ciphertext of the proposed algorithm is 625 kB, which is affordable considering the memory and storage available in current computer systems.

Review of the Evaluation Results
In this subsection, based on the results of Sections 4.3 and 4.4, we revisit the nDB setting of the proposed method and its search performance. Because the performance of the search operation differs according to the nDB setting, we determined the optimal nDB value for our environment. Table 4 summarizes the timings and parameter values of the fastest search for 100,000, one million, and 10 million RRNs. For all three database sizes, the search operation is fastest when nDB is three, and in a general PC environment the search over a database containing 10 million RRNs completes within 12 s. The results also show that the search operation can be performed up to 49,000 times faster than with the previous HE methods [24][25][26][40][41][42][43]. Table 5 summarizes the sizes and parameters of the smallest ciphertexts, obtained by comparing the sizes of the encrypted database and of the search result ciphertext for databases containing 100,000, one million, and 10 million encrypted RRNs. For all three database sizes, both the database and the resulting ciphertexts are smallest when nDB is three. These results can be used to choose an nDB value that reduces both the overall search time and the ciphertext sizes.

Discussion
In this section, we discuss some issues that remain to be resolved or were not clearly addressed in the previous sections. The first issue is the possibility of a false positive when multiple ds are used. To resolve this issue, we use the unused αs to store a hash of the formatted (d, α) pairs of an input RRN. After decrypting the query result, once a candidate match is found, we extract the hash value from the unused αs in the decrypted result and check whether it equals the hash computed from the queried input. Owing to the one-wayness property, the probability of a false positive will be low if we use a hash algorithm whose output length is around 20 bits.
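The hash check can be sketched as follows. The choice of SHA-256 truncated to 20 bits, the byte serialization of the pairs, and the names `tag` and `verify` are all assumptions made for illustration; the paper only states that a hash of about 20 bits is stored in the unused αs.

```python
import hashlib

TAG_BITS = 20  # output length suggested in the text


def tag(pairs):
    """Compute a 20-bit tag over the formatted (d, alpha) pairs."""
    data = b"".join(d.to_bytes(2, "big") + a.to_bytes(2, "big") for d, a in pairs)
    digest = hashlib.sha256(data).digest()
    # Keep the top 20 bits of the first three bytes (24 bits) of the digest.
    return int.from_bytes(digest[:3], "big") >> (24 - TAG_BITS)


def verify(query_pairs, stored_tag):
    """Accept a candidate match only if the recomputed tag equals the stored one."""
    return tag(query_pairs) == stored_tag


pairs = [(215, 1925), (215, 1258), (215, 1591)]
print(verify(pairs, tag(pairs)))
```

If the hash behaves like a uniform random function, a candidate that does not correspond to the queried RRN would pass this check with a probability of about 2^-20.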
Another issue is whether this approach can be extended to store arbitrary types of data. It can: the database in this paper acts as a so-called dictionary structure, in which the data are organized as (key, value) pairs, with ds acting as keys and αs as the corresponding values. Therefore, any binary string can be stored in the proposed database structure.

Conclusions
In this work, we proposed a new search method over encrypted data that complements the existing hybrid HE and helps to handle a large number of RRNs. In the existing hybrid HE, a single database encoded and encrypted in a polynomial ring format can store at most 11 bits each for a key and a value. Therefore, we designed and implemented an extension of the basic hybrid HE that stores RRN data more efficiently by using multiple databases to expand the number of bits that can be stored. Through experiments, we confirmed that the search operation can be performed up to 49,000 times faster than with the previous methods [24][25][26][40][41][42][43].
This work addresses the limitation on the size of encrypted databases that can be searched, which is caused by slow operation speed and is one of the well-known problems of FHE-based solutions for searching encrypted data. Based on this, the proposed method is expected to be practically usable in industrial fields that handle personal data requiring privacy protection, such as credit information evaluation and personal health.

Conflicts of Interest:
The authors declare no conflict of interest.