Article

Privacy-Preserving Feature Selection with Fully Homomorphic Encryption

1 Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi 820-8502, Japan
2 Computer Centre, Gakushuin University, 1-5-1 Mejiro, Toshimaku, Tokyo 171-8588, Japan
* Author to whom correspondence should be addressed.
Algorithms 2022, 15(7), 229; https://doi.org/10.3390/a15070229
Submission received: 30 May 2022 / Revised: 21 June 2022 / Accepted: 26 June 2022 / Published: 30 June 2022
(This article belongs to the Special Issue Privacy Preserving Machine Learning)

Abstract

For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let $D$, $F$, and $C$ be data, feature, and class sets, respectively, where the feature value $x(F_i)$ and the class label $x(C)$ are given for each $x \in D$ and $F_i \in F$. For a triple $(D, F, C)$, the feature selection problem is to find a consistent and minimal subset $F' \subseteq F$, where ‘consistent’ means that, for any $x, y \in D$, $x(C) = y(C)$ if $x(F_i) = y(F_i)$ for all $F_i \in F'$, and ‘minimal’ means that any proper subset of $F'$ is no longer consistent. On distributed datasets, we consider feature selection as a privacy-preserving problem: assume that semi-honest parties A and B have their own private datasets $D_A$ and $D_B$. The goal is to solve the feature selection problem for $D_A \cup D_B$ without sacrificing their privacy. In this paper, we propose a secure and efficient algorithm based on fully homomorphic encryption, and we implement our algorithm to show its effectiveness for various practical data. The proposed algorithm is the first that can directly simulate, on ciphertext, the CWC (Combination of Weakest Components) algorithm, which is one of the best performers for the feature selection problem on plaintext.

1. Introduction

1.1. Motivation

This study proposes a secure feature selection protocol that works effectively as a preprocessor for traditional machine learning (ML). Consider a scenario where different data owners are interested in private ML model training (e.g., logistic regression [1], SVM [2,3], and decision trees [4,5]) on their combined data. Securely training these ML models on distributed data is highly valuable when the data cannot be shared in the clear because of competitive concerns or privacy regulations. Feature selection is the problem of finding a subset of relevant features for model training. Using well-chosen features can lead to more accurate models as well as faster model training [6].
Consider a dataset $D$ associated with a feature set $F$ and a class variable $C$, where all feature values $x(F_i)$ ($F_i \in F$) and the corresponding class label $x(C)$ are defined for each datum $x \in D$. Table 1 shows a concrete example. Given a triple $(D, F, C)$, the feature selection problem is to find a minimal $F' \subseteq F$ that is relevant to the class $C$. The relevance of $F'$ is evaluated, for example, by $I(F'; C)$, the mutual information between $F'$ and $C$. On the other hand, $F'$ is minimal if any proper subset of $F'$ is no longer consistent.
To the best of our knowledge, the most common method for identifying favorable features is to choose features that show high relevance under some statistical measure. Individual feature relevance can be estimated using statistical measures such as mutual information and Bayesian risk. For example, the bottom row of Table 1 lists the mutual information score $I(F_i; C)$ of each feature $F_i$ with respect to the class labels. We can see that $F_1$ is more important than $F_5$ because $I(F_1; C) > I(F_5; C)$. Based on the mutual information score, $F_1$ and $F_2$ of Table 1 would be chosen to explain $C$. However, a closer examination of $D$ reveals that $F_1$ and $F_2$ cannot uniquely determine $C$. In fact, we find $x_2$ and $x_5$ with $x_2(F_1) = x_5(F_1)$ and $x_2(F_2) = x_5(F_2)$ but $x_2(C) \neq x_5(C)$. On the other hand, $F_4$ and $F_5$ uniquely determine $C$ through the formula $C = F_4 \oplus F_5$, even though $I(F_4; C) = I(F_5; C) = 0$. As a result, the traditional method based on individual feature relevance scores misses the correct answer.
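The XOR phenomenon above is easy to reproduce. The following Python sketch uses a small hypothetical dataset (not the data of Table 1, which is not reproduced here) with two binary features and $C = F_4 \oplus F_5$; the helper names `mutual_information` and `determines` are illustrative, not from the paper. It computes the empirical mutual information of each feature and checks whether a feature subset determines $C$.

```python
from collections import Counter
from math import log2

def mutual_information(xs, cs):
    """Empirical mutual information I(X; C) in bits."""
    n = len(xs)
    px, pc, pxc = Counter(xs), Counter(cs), Counter(zip(xs, cs))
    return sum(
        (nxc / n) * log2((nxc / n) / ((px[x] / n) * (pc[c] / n)))
        for (x, c), nxc in pxc.items()
    )

def determines(rows, labels, cols):
    """True if equal values on the columns `cols` always imply equal labels."""
    seen = {}
    for row, c in zip(rows, labels):
        key = tuple(row[j] for j in cols)
        if seen.setdefault(key, c) != c:
            return False
    return True

# Hypothetical data: columns are (F4, F5) and the class is C = F4 XOR F5.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]

f4 = [r[0] for r in rows]
f5 = [r[1] for r in rows]
print(mutual_information(f4, labels), mutual_information(f5, labels))  # 0.0 0.0
print(determines(rows, labels, [0, 1]))  # True
print(determines(rows, labels, [0]))     # False
```

Each feature alone scores zero, yet the pair determines $C$, which is exactly why a ranking by individual relevance can miss it.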
Thus, we concentrate on the concept of consistency: $F' \subseteq F$ is consistent if, for any $x, y \in D$, $x(F_i) = y(F_i)$ for all $F_i \in F'$ implies $x(C) = y(C)$. In machine learning research, consistency-based feature selection has received a lot of attention [8,9,10,11,12]. CWC (Combination of Weakest Components) [8] is the simplest of such consistency-based feature selection algorithms, and even though CWC uses the most rigorous measure, it shows one of the best performances in terms of accuracy as well as computational speed compared to other methods [7]. In our scenario, the parties first run the proposed secure feature selection protocol; throughout it, none of the parties learns the values of the data, as all computations are done over ciphertexts. Next, the parties train an ML model over the preprocessed data using existing privacy-preserving training protocols (e.g., logistic regression training [13] and decision tree training [14]). Finally, they can disclose the trained model for common use.
To design a secure protocol for feature selection, we focus on the framework of homomorphic encryption. Given a public key encryption scheme $E$, let $E[m]$ denote a ciphertext of integer $m$. If $E[m + n]$ can be computed from $E[m]$ and $E[n]$ without decrypting them, then $E$ is said to be additively homomorphic, and if $E[m \cdot n]$ can also be computed, then $E$ is said to be fully homomorphic. Furthermore, modern public key encryption must be probabilistic: when the same message $m$ is encrypted multiple times, the encryption algorithm produces different ciphertexts of $m$.
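As a concrete, deliberately insecure illustration of an additively homomorphic and probabilistic scheme, the sketch below implements textbook Paillier encryption [15] with tiny hard-coded primes (Python 3.9+ for `math.lcm`). The key sizes, the choice $g = n + 1$, and the function names are assumptions made for brevity; this is not the TFHE-based construction used later in this paper.

```python
import math
import secrets

def keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key pair with g = n + 1. Tiny primes: for illustration only."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function lambda(n)
    mu = pow(lam, -1, n)           # with g = n + 1, mu is simply lambda^{-1} mod n
    return n, (lam, mu, n)         # (public key, secret key)

def encrypt(n: int, m: int) -> int:
    """E[m] = (n + 1)^m * r^n mod n^2 with fresh randomness r, so E is probabilistic."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(sk, c: int) -> int:
    lam, mu, n = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n   # L(u) = (u - 1) / n, then multiply by mu

def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: E[m1] * E[m2] mod n^2 decrypts to m1 + m2 mod n."""
    return c1 * c2 % (n * n)

pk, sk = keygen()
c = add(pk, encrypt(pk, 20), encrypt(pk, 22))
print(decrypt(sk, c))                       # 42
print(encrypt(pk, 20) == encrypt(pk, 20))   # False unless the same r is drawn twice
```

Multiplying ciphertexts adds plaintexts, but there is no corresponding way to multiply plaintexts, which is why Paillier is only additively homomorphic.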
Various homomorphic encryption schemes satisfying these properties have been proposed over the last two decades. A widely used additively homomorphic encryption scheme was proposed by Paillier [15]. Somewhat homomorphic encryption schemes, which allow a sufficient number of additions and a limited number of multiplications, have also been proposed [16,17,18], and these cryptosystems can be used to compute more complex functions, such as the inner product of two vectors. Gentry [19] proposed the first fully homomorphic encryption (FHE) scheme, which supports an unlimited number of additions and multiplications; since then, practical FHE libraries have been developed, particularly for bitwise operations and floating-point operations. TFHE [20,21] is known as one of the fastest FHE schemes and is optimized for bitwise operations.
We use TFHE to design and implement our algorithm for the private feature selection problem. We assume two semi-honest parties A and B: each party complies with the protocol but tries to infer as much as possible about the other's secret from the information obtained. The parties have their own private data $D_A$ and $D_B$, and they jointly compute advantageous features for $D_A \cup D_B$ while maintaining their privacy. The goal is to jointly compute the result of the CWC algorithm on $D = D_A \cup D_B$ without revealing any other information.
In this paper, we describe the simplest case, where two data owners perform cooperative secure computation. More generally, there may be many data owners, each encrypting with their own public key. Since homomorphic operations cannot be applied to data encrypted under different public keys, a simple approach would be for the server to re-encrypt them under some common public key. However, there is no guarantee that the server or the new public key can be trusted. To solve this problem, the framework of multi-key homomorphic encryption was proposed, which allows FHE operations on data encrypted under different keys. Because TFHE has this multi-key property, the two-party computation model can be extended to the more general case. This property has been applied, for example, to oblivious neural network inference [22].
Preserving privacy in this way is a realistic requirement if one wants to draw conclusions from data that are privately distributed over more than one party. Multi-party computation (MPC) can provide effective technical solutions to this requirement in many cases. In MPC, computations that essentially rely on the distributed data are performed through cooperation among the parties. In particular, fully homomorphic encryption (FHE) is one of the critical tools of MPC. One of the most significant advantages of FHE-based MPC is that FHE realizes outsourced computation in a simple and straightforward manner: parties encrypt their private data with their public keys and send the encrypted data to a single party with sufficient computational power to perform the required computation. Although the results computed by that party may be incorrect if some malicious parties send incorrect data, honest parties are at least convinced that their private data have not been stolen, as long as the cryptosystem used is secure. In contrast, when a party shares his/her secret with other parties to perform MPC, even with a secure secret sharing scheme, collusion of a sufficient number of compromised parties may reveal the party’s secret. In general, it is difficult to prove the security of MPC protocols when the existence of active malicious parties cannot be ruled out; hence, security is very often proven under the assumption that all parties are at worst semi-honest. In reality, however, even this relaxed assumption may not hold. Thus, the property that a party can protect its private data relying only on its own efforts should be counted as an important advantage of FHE-based MPC.
On the other hand, current implementations of FHE are considered significantly inefficient, and consequently, their range of application is limited in practice. This is true today but may not be true in the future. By analogy, the Goldwasser–Micali (GM) cryptosystem [23] is considered the first scheme with provable security. Unfortunately, because the GM cryptosystem encrypts data bitwise, it turned out to lack sufficient efficiency in time and memory for real-world use. In 2001, however, RSA-OAEP was finally proven to have both provable security and realistic efficiency [24,25], and it is widely used through SSL/TLS. Thus, studying FHE-based MPC does not merely have theoretical meaning; it is also expected to contribute significantly to real-world applications in the future.
In this paper, we propose an MPC protocol that relies on FHE-based outsourced computation as well as mutual cooperation among parties. The target of our protocol is to compute CWC, a feature selection algorithm known to be accurate and efficient, while preserving the privacy of the participating parties. Performing CWC entirely by FHE-based outsourced computation would incur unnecessarily large time costs in the phase that sorts the features. Therefore, we design our scheme so that the two parties cooperate to sort the features efficiently.
Converting CWC into its privacy-preserving version based on different primitives of MPC—for example, based on secret sharing techniques—is not only interesting but also useful both in theory and in practice. We will pursue this direction as well in our future work.

1.2. Our Contribution and Related Work

Table 2 summarizes the complexities of the proposed algorithms in comparison with the original CWC on plaintext. The baseline is a naive algorithm that simulates the original CWC [8] over ciphertext using TFHE operations. The bottleneck of private feature selection lies in the sorting task over ciphertext, as we discuss in the related work below. Our main contribution is the improved algorithm, labeled ‘improved’, which significantly reduces the time complexity caused by the sorting task. We also implement the improved algorithm and demonstrate its efficiency through experiments in comparison with the baseline.
There are two main private computation models, secret sharing-based MPC and public key-based MPC, and secret sharing-based MPC currently has the advantage in speed. We nevertheless focus on the convenience of FHE. Public key-based MPC can establish a simple mechanism for obtaining results while keeping the learning model held by the server and the personal information of many data owners confidential from each other, relying only on cryptographic strength. Secret sharing-based MPC is faster but requires at least two parties that are trusted not to collude with each other, a requirement that is separate from cryptographic strength.
Other concerns about public key-based MPC are its security level and its computational cost. Regarding security, TFHE is known to be IND-CPA secure (see Section 2.4): an attacker cannot obtain meaningful information about a plaintext from its ciphertext within polynomial time.
In this section, we discuss related work on private feature selection as well as the benefits of our method. Rao et al. [26] proposed a homomorphic encryption-based private feature selection algorithm. Their protocol relies only on the additive homomorphic property, which inevitably leaks statistical information about the data. Anaraki and Samet [27] proposed a different method based on rough set theory, but their method suffers from the same limitations as that of Rao et al., and neither method has been implemented. As a different approach to private feature selection, Banerjee et al. [28] and Sheikhalishahi and Martinelli [29] proposed MPC-based algorithms that guarantee security by decomposing the plaintext into shares while achieving cooperative computation. Li et al. [30] improved the MPC protocol to address the aforementioned flaw and demonstrated its effectiveness through experiments.
These methods avoid partial decryption under the assumption that the mean of feature values provides a good criterion for feature selection. This assumption, however, depends heavily on the data. The most important task in general feature selection is sorting based on feature values, and CWC and its variants [7,8,11] have demonstrated the effectiveness of sorting with the consistency measure and its superiority over other methods. This study realizes a sorting-based feature selection algorithm (CWC) on ciphertext.
Another study that employs sorting for private ML is decision tree learning by MPC [31], where sorting is limited to comparing $N$ values of fixed length in $O(N \log^2 N)$ time with a sorting network. In the case of CWC, however, the algorithm must sort $N$ data points, each of variable length up to $M$, so a naive method requires $O(MN \log N + N \log^2 N)$ time. Our algorithm reduces this complexity to $O(MN + N \log^2 N + N \log N \log M)$, which, depending on $M$ and $N$, is significantly smaller than that of the naive algorithm. Through experiments, we confirm this for various data, including real datasets for ML.
Although sorting itself is not ML, a fast sorting algorithm is an important preprocessing step for ML model training. Previous work [7] showed that sorting-based feature selection can classify with higher accuracy than other heuristic methods. Furthermore, preprocessing by sorting has proven to be an important task in decision tree model training [31]. It is also well known that sorting can speed up ML model training. For example, for SVM, which is widely used in text classification and pattern recognition, finding the convex hull of $n$ points in the Euclidean plane can be reduced from $O(n^2)$ to $O(n \log n)$ time by preprocessing the points with an appropriate sorting algorithm.
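The convex hull example can be made concrete with Andrew's monotone chain algorithm, a standard textbook method (not taken from this paper) in which sorting the points by coordinates is the $O(n \log n)$ preprocessing step and the remaining scan is linear. A minimal Python sketch in the plane:

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: O(n log n) sort, then an O(n) scan."""
    pts = sorted(set(points))          # the sorting preprocess dominates the cost
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:                      # left-to-right pass builds the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):            # right-to-left pass builds the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it is the first point of the other one.
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 3)]))
# [(0, 0), (2, 0), (2, 2), (1, 3), (0, 2)]
```

The hull is returned in counter-clockwise order; collinear boundary points are dropped because turns with zero cross product are popped.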

2. Preliminaries

2.1. Consistency Measure

First of all, we review the notion of the consistency measure employed in our problem. A consistency measure $\mu: 2^F \to [0, \infty)$ for a feature set $F$ is a function that represents how far the data deviate from a consistent state; it is required to satisfy determinacy ($\mu(F) = 0$ if and only if $F$ is consistent) and monotonicity ($F \subseteq G$ implies $\mu(F) \geq \mu(G)$). The following consistency measures satisfy these requirements.
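  • $\mu_{\mathrm{bin}}(F) = 0$ if $F$ is consistent; 1 otherwise (binary consistency [11])
  • $\mu_{\mathrm{icr}}(F) = \sum_{x} \left( \Pr(F = x) - \max_{c} \Pr(F = x, C = c) \right)$ (ICR [32])
  • $\mu_{\mathrm{rs}}(F) = 1 - \frac{\left| \bigcup_{c} F_*(D_{C=c}) \right|}{|D|}$, where $F_*(D') = \bigcup \{ D_{F=x} \mid D_{F=x} \subseteq D' \}$ (rough set [33])
  • $\mu_{\mathrm{ie}}(F) = \sum_{x} \sum_{c \neq c'} \frac{|D_{F=x, C=c}| \cdot |D_{F=x, C=c'}|}{|D|^2}$ (inconsistent pair [34])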
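The fully homomorphic encryption used in this study is specialized for bitwise operations. Therefore, among these consistency measures, we employ $\mu_{\mathrm{bin}}$, which requires only equality tests and Boolean operations rather than real-valued probabilities.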
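For intuition, the following plaintext Python sketch evaluates the four measures on a small hypothetical dataset; the helper names and the counting of ordered class pairs in $\mu_{\mathrm{ie}}$ are assumptions of this illustration. The full feature set is consistent, so every measure is 0, while the subset $\{F_a\}$ is inconsistent and yields positive values, in line with determinacy and monotonicity.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Tuple[int, ...], int]  # (values of the chosen features, class label)

def project(data: Sequence[Tuple[Sequence[int], int]], cols: Sequence[int]) -> List[Row]:
    """Restrict every datum to the feature columns in `cols`."""
    return [(tuple(x[j] for j in cols), c) for x, c in data]

def groups(rows: List[Row]) -> Dict[Tuple[int, ...], Counter]:
    """Class counts |D_{F=x, C=c}| for every value pattern x."""
    g: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for x, c in rows:
        g[x][c] += 1
    return g

def mu_bin(rows: List[Row]) -> int:
    return 0 if all(len(cnt) == 1 for cnt in groups(rows).values()) else 1

def mu_icr(rows: List[Row]) -> float:
    # sum_x [Pr(F = x) - max_c Pr(F = x, C = c)]
    return sum(sum(cnt.values()) - max(cnt.values()) for cnt in groups(rows).values()) / len(rows)

def mu_rs(rows: List[Row]) -> float:
    # 1 - |union of lower approximations| / |D|; only class-pure groups D_{F=x} count
    pure = sum(sum(cnt.values()) for cnt in groups(rows).values() if len(cnt) == 1)
    return 1 - pure / len(rows)

def mu_ie(rows: List[Row]) -> float:
    # sum_x sum_{c != c'} |D_{F=x,C=c}| * |D_{F=x,C=c'}| / |D|^2 (ordered class pairs)
    total = 0
    for cnt in groups(rows).values():
        s = sum(cnt.values())
        total += s * s - sum(v * v for v in cnt.values())
    return total / len(rows) ** 2

# Hypothetical data with two binary features and C = F_a OR F_b.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
for cols in ([0, 1], [0]):
    rows = project(data, cols)
    print(cols, mu_bin(rows), mu_icr(rows), mu_rs(rows), mu_ie(rows))
# [0, 1] 0 0.0 0.0 0.0
# [0] 1 0.25 0.5 0.125
```

Only `mu_bin` reduces to equality tests and a single AND over all groups, which is what makes it a natural fit for a bitwise FHE scheme.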

2.2. CWC Algorithm over Plaintext

We generally assume that the dataset $D$ associated with $F$ and $C$ contains no errors, i.e., if $x(F_i) = y(F_i)$ for all $i$, then $x(C) = y(C)$. When $D$ contains such errors, they are removed beforehand, so that $D$ contains at most one $x \in D$ with the same feature values.
Algorithm 1 describes the original algorithm for finding a minimal consistent feature set for two-class data. Given $D$ with features $F_i$ and $C = \{0, 1\}$, a datum $x \in D$ with $x(C) = 1$ is referred to as a positive datum, and $y \in D$ with $y(C) = 0$ is referred to as a negative datum. Let $n$ be the number of positive data and $m = |D| - n$. We consider the two-dimensional bit array $B_i[1..n][1..m]$ such that, for any $1 \leq p \leq n$ and $1 \leq q \leq m$, $B_i[p][q] = 0$ if $x_p(F_i) = y_q(F_i)$ and $B_i[p][q] = 1$ otherwise, where $x_p$ is the $p$-th positive datum ($1 \leq p \leq n$) and $y_q$ is the $q$-th negative datum ($1 \leq q \leq m$). $B_i[p][q] = 0$ means that $F_i$ is not consistent with the pair $(x_p, y_q)$ because $x_p(F_i) = y_q(F_i)$ despite $x_p(C) \neq y_q(C)$. Recall that $F_i$ is said to be consistent if $x(F_i) = y(F_i)$ implies $x(C) = y(C)$ for any $x, y \in D$. We define $\|B_i\|$ as the number of 1s in $B_i$.
For a subset $F' \subseteq F$, $F'$ is said to be consistent if, for any $p \in [1, n]$ and $q \in [1, m]$, there exists $i$ such that $F_i \in F'$ and $B_i[p][q] = 1$. CWC uses this to remove irrelevant features from $F$ in order to build a minimal consistent feature set. We note that finding the smallest consistent feature set is NP-hard. There is a simple reduction from the minimum set cover problem: given $S_1, \ldots, S_k \subseteq S$ with $|S| = n$, regard each $S_i$ as a bit string $B_i$ in CWC with $B_i[j] = 1$ if and only if the $j$-th element of $S$ belongs to $S_i$; then covering every element of $S$ corresponds to the condition that, for any $j \in \{1, \ldots, n\}$, there exists at least one $i$ such that $B_i[j] = 1$.
Since $B_i$ records, for every pair of data from different classes, whether $F_i$ is consistent with the pair, the approach can easily be extended to multi-class data with more than two classes. Although we focus on two-class data for the sake of simplicity, for multi-class data, the $mn$ factors in the complexities are replaced with the number of pairs of data from different classes, which is upper bounded by $|D|(|D| - 1)/2$. Moreover, when extending to multi-class data, it is convenient to consider $B_i$ as an appropriately serialized one-dimensional bit string because it cannot be represented as a dense two-dimensional bit array. Hence, in what follows, we treat $B_i$ as a bit string.
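A minimal plaintext sketch of the serialized $B_i$, assuming binary feature values and 0-based feature indices (both choices are for illustration only): position $m(p-1)+q$ in 1-based terms corresponds to index `p * m + q` with 0-based `p` and `q`. The data and helper names are hypothetical.

```python
from typing import List, Sequence

def bit_strings(pos: Sequence[Sequence[int]], neg: Sequence[Sequence[int]]) -> List[List[int]]:
    """Serialized B_i per feature i: entry for (x_p, y_q) is 1 iff x_p(F_i) != y_q(F_i)."""
    k = len(pos[0])
    # Row-major order: outer loop over positive data, inner loop over negative data.
    return [[int(x[i] != y[i]) for x in pos for y in neg] for i in range(k)]

def weight(b: Sequence[int]) -> int:
    """||B_i||, the number of 1s in B_i."""
    return sum(b)

def is_consistent(B: Sequence[Sequence[int]], subset: Sequence[int]) -> bool:
    """F' is consistent iff every (positive, negative) pair has some F_i in F' with a 1."""
    return all(any(B[i][h] for i in subset) for h in range(len(B[0])))

# Hypothetical two-class data with three binary features (0-based feature indices).
pos = [(1, 1, 0), (1, 1, 1)]
neg = [(0, 1, 0), (1, 0, 1), (0, 0, 1)]
B = bit_strings(pos, neg)
print([weight(b) for b in B])  # [4, 4, 3]
print(is_consistent(B, [0, 1]), is_consistent(B, [0]), is_consistent(B, [1, 2]))  # True False False
```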
Algorithm 1 The algorithm CWC for plaintext
Input: A dataset $D$ associated with features $F = \{F_1, \ldots, F_k\}$ and class $C = \{0, 1\}$.
Output: A minimal consistent subset $S \subseteq F$.
Sort $F_1, \ldots, F_k$ in increasing order of $\|B_i\|$.
Let $\pi$ be the sorted indices of $\{1, \ldots, k\}$.
for $i = 1, \ldots, k$ do
    if $F \setminus \{F_{\pi[i]}\}$ is consistent then
        update $F \leftarrow F \setminus \{F_{\pi[i]}\}$
    end if
end for
return $S = F$
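For reference, Algorithm 1 can be sketched in plain Python over the serialized bit strings. The tie-breaking rule by feature index and the hypothetical input are assumptions; the sketch presumes, as stated above, that the whole feature set is consistent.

```python
from typing import List, Sequence

def cwc(B: Sequence[Sequence[int]]) -> List[int]:
    """Plaintext CWC (Algorithm 1) on serialized bit strings; returns kept 0-based indices."""
    k, length = len(B), len(B[0])

    def consistent(features) -> bool:
        # Every serialized (positive, negative) pair must be separated by some feature.
        return all(any(B[i][h] for i in features) for h in range(length))

    # Increasing order of ||B_i||; ties are broken by feature index (an assumption).
    pi = sorted(range(k), key=lambda i: (sum(B[i]), i))
    selected = set(range(k))
    for i in pi:
        if consistent(selected - {i}):
            selected.remove(i)            # F_pi[i] is redundant: drop it
    return sorted(selected)

# Hypothetical bit strings B_0, B_1, B_2 over six (positive, negative) pairs.
B = [[1, 0, 1, 1, 0, 1], [0, 1, 1, 0, 1, 1], [0, 1, 1, 1, 0, 0]]
print(cwc(B))  # [0, 1]
```

The weakest feature (index 2, weight 3) is examined first and removed; the remaining two cannot be removed without losing consistency.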
Table 3 shows an example of $D$, and Table 4 shows the corresponding $B_i$. Consider the behavior of CWC in this case. All $B_i$ ($1 \leq i \leq 4$) are computed as preprocessing. Then, the features are sorted in the order $\|B_2\| = 5 \leq \|B_4\| = 5 \leq \|B_3\| = 6 \leq \|B_1\| = 8$, giving $\pi = (2, 4, 3, 1)$. Following the order $\pi$, CWC checks whether $F_{\pi[i]}$ can be removed from the current $F$. Using the consistency measure, CWC removes $F_2$ and $F_4$, and the resulting $\{F_1, F_3\}$ is the output. In fact, the class of $x$ can be predicted by a logical operation on $\overline{x(F_1)}$ and $x(F_3)$.

2.3. Security Model

2.3.1. Indistinguishable Random Variables

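Let $\mathbb{N}$ denote the set of natural numbers. A function $\epsilon: \mathbb{N} \to [0, 1]$ is called negligible if $\forall c > 0, \exists k, \forall n \geq k, \epsilon(n) < 1/n^c$. Let $X = \{X_k \mid k \in \mathbb{N}\}$ and $Y = \{Y_k \mid k \in \mathbb{N}\}$ be sequences of random variables such that $X_k$ and $Y_k$ are defined over the same sample space. We say that $X$ and $Y$ are (computationally) indistinguishable, denoted by $X \stackrel{c}{\equiv} Y$, if and only if, for every PPT algorithm $\mathcal{D}$, $\left| \Pr[\mathcal{D}(X_n) = 1] - \Pr[\mathcal{D}(Y_n) = 1] \right|$ is a negligible function of $n$.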

2.3.2. Security of Multi-Party Computation (MPC)

Although the discussion of this section can be extended to MPC schemes which involve more than two parties, merely for simplicity, we focus on the case where only two parties are involved.
A two-party protocol is a pair $\Pi = (P_1, P_2)$ of PPT Turing machines with input and random tapes. Let $x_i$ and $y_i$ be the input and output of $P_i$, respectively.
We assume a semi-honest adversary $\mathcal{A}$ and consider the protocol $(\mathcal{A}, P_2)$, obtained by replacing $P_1$ in $\Pi$ with $\mathcal{A}$, where $\mathcal{A}$ takes $x_1$ as input and apparently follows the protocol. Let $\mathrm{REAL}_{\Pi,\mathcal{A}}(x_1, x_2)$ denote the random variable representing the output $(y_1, y_2)$ of $(\mathcal{A}, P_2)$, and define the class $\mathrm{REAL}_{\Pi,\mathcal{A}} = \{\mathrm{REAL}_{\Pi,\mathcal{A}}(x_1, x_2)\}_{x_1, x_2} = \{(y_1, y_2)\}_{x_1, x_2}$.
On the other hand, let $\mathcal{F}$ denote the functionality that the protocol $\Pi$ is trying to realize, i.e., $\mathcal{F}$ is a PPT machine that simulates the honest $(P_1, P_2)$ so that $\mathcal{F}(x_1, x_2) = (P_1(x_1), P_2(x_2))$. Here, we assume a completely reliable third party that computes $\mathcal{F}$. In this ideal world, for this $\mathcal{F}$ and any adversary $\mathcal{B}$ acting as $P_1$ with input $x_1$ and possibly submitting $x_1' \neq x_1$ to $\mathcal{F}$, we define the random variable $\mathrm{IDEAL}_{\mathcal{F},\mathcal{B}}(x_1, x_2) = (\mathcal{B}(x_1, \mathcal{F}_1(x_1', x_2)), \mathcal{F}_2(x_1', x_2))$, where $\mathcal{F}_i(\cdot, \cdot)$ denotes the $i$-th component of the output of $\mathcal{F}(\cdot, \cdot)$ for $i = 1, 2$. Similarly, we denote the class $\mathrm{IDEAL}_{\mathcal{F},\mathcal{B}} = \{\mathrm{IDEAL}_{\mathcal{F},\mathcal{B}}(x_1, x_2)\}_{x_1, x_2}$.
Using such random variables, we define the security of protocol Π as follows.
Definition 1.
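A protocol $\Pi$ is said to securely realize a functionality $\mathcal{F}$ if, for any attacker $\mathcal{A}$ against $\Pi$, there exists an adversary $\mathcal{B}$ such that $\mathrm{REAL}_{\Pi,\mathcal{A}} \stackrel{c}{\equiv} \mathrm{IDEAL}_{\mathcal{F},\mathcal{B}}$ holds.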
The definitions stated above can be intuitively explained as follows. A semi-honest adversary $\mathcal{A}$ plays the role of $P_1$, conforming exactly to the protocol while trying to learn any secrets. $\mathcal{A}$ can take advantage of the following three information sources:
1. the input tape to $P_1$;
2. the conversation with $P_2$;
3. the execution of the protocol.
While the information that $\mathcal{A}$ can obtain from the first and third sources is exactly $x_1$ and $y_1$, respectively, we call the information from the second source a view and denote it by $\mathrm{View}_{P_1}$.
Since the protocol inevitably requires that $\mathcal{A}$ obtains $x_1$ and $y_1$, the security of the protocol concerns what $\mathcal{A}$ can obtain beyond what can be computationally inferred from $x_1$ and $y_1$. If such information exists, its source must be $\mathrm{View}_{P_1}$.
The security criterion of simulatability requires that $\mathrm{View}_{P_1}$ can be simulated from $x_1$ and $y_1$. In more formal terms, there exists a PPT Turing machine $\mathrm{Sim}$ that, on input $x_1$ and $y_1$, outputs a view that cannot be distinguished from $\mathrm{View}_{P_1}$ by any PPT Turing machine. When $\mathrm{View}_{P_1}$ is simulatable, $\mathcal{A}$ can generate by itself, by running $\mathrm{Sim}$, whatever it could obtain from $\mathrm{View}_{P_1}$. Therefore, $\mathcal{A}$ cannot obtain any information beyond what it can compute from $x_1$ and $y_1$.

2.3.3. IND-CPA

Indistinguishability against chosen plaintext attack (IND-CPA) is an important criterion for the secrecy of a public key cryptosystem. Let $\Pi = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ denote a public key cryptosystem consisting of key generation, encryption, and decryption algorithms. To describe IND-CPA, we introduce the IND-CPA game played between an adversary $\mathcal{A}$ and an oracle $\mathcal{O}$, where $\mathcal{A}$ is a PPT Turing machine and $k$ is the security parameter.
1. $\mathcal{O}$ generates a key pair $(sk, pk) \leftarrow \mathrm{Gen}(1^k)$ and gives $pk$ to $\mathcal{A}$.
2. $\mathcal{A}$ arbitrarily generates two messages $(m_0, m_1)$ of the same length and sends the query $(m_0, m_1)$ to $\mathcal{O}$.
3. On receipt of $(m_0, m_1)$, $\mathcal{O}$ selects $b \in \{0, 1\}$ uniformly at random, computes $c = \mathrm{Enc}(pk, m_b)$, and replies to $\mathcal{A}$ with $c$.
4. $\mathcal{A}$ guesses $b$ by examining $c$ and outputs the guess bit $b'$.
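We view $b$ and $b'$ as random variables whose underlying probability space represents the choices of the key pair, $b$, and $b'$. The advantage of the adversary $\mathcal{A}$ over tossing a fair coin to guess $\mathcal{O}$'s secret $b$ is defined as follows:
$$\mathrm{Adv}_{\mathcal{A}} = 2 \cdot \Pr[b' = b] - 1.$$
When we let
$$\Pr[b' = 0 \mid b = 0] = \tfrac{1}{2} + \alpha_0 \quad \text{and} \quad \Pr[b' = 1 \mid b = 1] = \tfrac{1}{2} + \alpha_1,$$
we have, since $b$ is uniform, $\Pr[b' = b] = \tfrac{1}{2}\left(\tfrac{1}{2} + \alpha_0\right) + \tfrac{1}{2}\left(\tfrac{1}{2} + \alpha_1\right) = \tfrac{1}{2} + \tfrac{\alpha_0 + \alpha_1}{2}$, and hence
$$\mathrm{Adv}_{\mathcal{A}} = \alpha_0 + \alpha_1.$$
This definition of the advantage is consistent with the common definition found in many textbooks:
$$\mathrm{Adv}_{\mathcal{A}} = \Pr[b' = 0 \mid b = 0] - \Pr[b' = 0 \mid b = 1],$$
because $\Pr[b' = 0 \mid b = 1] = 1 - \Pr[b' = 1 \mid b = 1] = \tfrac{1}{2} - \alpha_1$.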
Definition 2.
A public key cryptosystem $\Pi$ is secure in the sense of IND-CPA, or simply IND-CPA secure, if, for every PPT adversary $\mathcal{A}$, $\mathrm{Adv}_{\mathcal{A}}$ as a function of $k$ is negligible.
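Definition 2 is why IND-CPA security forces probabilistic encryption. The Python sketch below plays the IND-CPA game against textbook RSA with a toy key ($n = 61 \cdot 53$, $e = 17$), which is deterministic: the adversary simply encrypts $m_0$ itself with the public key and compares. The harness and class names are hypothetical illustrations, not part of the paper.

```python
import secrets

def ind_cpa_advantage(keygen, enc, adversary, trials=1000):
    """Estimate Adv_A = 2 * Pr[b' = b] - 1 by replaying the IND-CPA game."""
    wins = 0
    for _ in range(trials):
        pk, _sk = keygen()                        # 1. O generates (sk, pk); A learns pk
        m0, m1 = adversary.choose(pk)             # 2. A submits two equal-length messages
        b = secrets.randbelow(2)                  # 3. O picks b uniformly at random ...
        c = enc(pk, (m0, m1)[b])                  #    ... and replies with c = Enc(pk, m_b)
        wins += int(adversary.guess(pk, c) == b)  # 4. A outputs its guess b'
    return 2 * wins / trials - 1

def rsa_keygen():
    """Toy textbook RSA key: n = 61 * 53, e = 17, d = 2753 (deterministic, insecure)."""
    return (3233, 17), 2753

def rsa_enc(pk, m):
    n, e = pk
    return pow(m, e, n)

class ReEncryptingAdversary:
    """Re-encrypts m0 under pk and compares: always wins against a deterministic Enc."""

    def __init__(self, enc):
        self.enc = enc

    def choose(self, pk):
        self.m0, self.m1 = 42, 43
        return self.m0, self.m1

    def guess(self, pk, c):
        return 0 if c == self.enc(pk, self.m0) else 1

print(ind_cpa_advantage(rsa_keygen, rsa_enc, ReEncryptingAdversary(rsa_enc)))  # 1.0
```

Against a probabilistic scheme, the same adversary's estimate would instead hover around 0.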

2.4. TFHE: A Faster Fully Homomorphic Encryption

The proposed private feature selection is based on FHE. We review TFHE [21], one of the fastest libraries for bitwise addition (XOR, ‘⊕’) and bitwise multiplication (AND, ‘·’) over ciphertexts. In TFHE, any integer is encrypted bitwise: for an $\ell$-bit integer $m = (m_1, \ldots, m_\ell)$, we denote its bitwise encryption $(E[m_1], \ldots, E[m_\ell])$ by $E[m]$ for short. The bitwise operations are denoted by $f_\oplus(E[x], E[y]) = E[x \oplus y]$ and $f_\cdot(E[x], E[y]) = E[x \cdot y]$ for $x, y \in \{0, 1\}$ and the ciphertexts $E[x]$ and $E[y]$. The same symbol is used to represent an encrypted array. For example, when $x$ and $y$ are integers of length $\ell$ and $\ell'$, respectively, $E[x, y]$ denotes
$$E[x, y] = (E[x], E[y]) = ((E[x_1], \ldots, E[x_\ell]), (E[y_1], \ldots, E[y_{\ell'}])).$$
TFHE allows all arithmetic and logical operations to be built from the elementary operations $E[x \oplus y]$ and $E[x \cdot y]$. In this section, we consider how to build the adder and the comparison operations. Let $x, y$ be $\ell$-bit integers and $x_i, y_i$ be the $i$-th bits of $x, y$, respectively. Let $c_i$ be the $i$-th carry-in bit and $s_i$ the $i$-th bit of the sum $x + y$. Then, we can obtain $E[x + y]$ by bitwise operations on ciphertexts using $s_i = x_i \oplus y_i \oplus c_i$ and $c_{i+1} = (x_i \oplus c_i) \cdot (y_i \oplus c_i) \oplus c_i$. We can construct other operations, such as subtraction, multiplication, and division, based on the adder. For example, $E[x - y]$ is obtained as $E[x + (\neg y) + 1]$, i.e., by running the adder on $x$ and the bit complement $\neg y$ of $y$ (obtained by $y_i \oplus 1$ for every bit) with the initial carry $c_1 = 1$. Next, we examine comparison. We want to obtain $E[x <^? y]$ without decrypting $x$ and $y$, where $x <^? y = 1$ if $x < y$ and $x <^? y = 0$ otherwise. Over ciphertexts, the bit $x <^? y$ can be obtained as the most significant bit of $x + (\neg y) + 1$ computed on $(\ell+1)$-bit zero-extended operands, i.e., the sign bit of $x - y$. Similarly, for the equality test, we can compute the encrypted bit $E[x =^? y]$, for example, as $\prod_i (x_i \oplus y_i \oplus 1)$.
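The following Python sketch simulates these circuits on plaintext bits, with `f_xor` and `f_and` standing in for the homomorphic gates (the actual TFHE API is not shown). Bits are little-endian, and `sub`, `less_than`, and `equal` are illustrative helper names. Subtraction uses the two's complement identity $x - y = x + (\neg y) + 1 \bmod 2^\ell$, and the comparison reads bit $\ell$ of the difference of $(\ell+1)$-bit zero-extended operands.

```python
from typing import List

Bits = List[int]  # little-endian: bits[0] is the least significant bit

def f_xor(a: int, b: int) -> int:
    """Plaintext stand-in for the homomorphic XOR gate (no encryption here)."""
    return a ^ b

def f_and(a: int, b: int) -> int:
    """Plaintext stand-in for the homomorphic AND gate."""
    return a & b

def add(x: Bits, y: Bits, carry: int = 0) -> Bits:
    """Ripple-carry adder built only from XOR/AND; returns len(x) + 1 bits."""
    out: Bits = []
    for xi, yi in zip(x, y):
        out.append(f_xor(f_xor(xi, yi), carry))                          # s_i
        carry = f_xor(f_and(f_xor(xi, carry), f_xor(yi, carry)), carry)  # c_{i+1}
    return out + [carry]

def complement(y: Bits) -> Bits:
    return [f_xor(b, 1) for b in y]  # y_i XOR 1

def sub(x: Bits, y: Bits) -> Bits:
    """x - y mod 2^l as x + (NOT y) + 1: carry-in c_1 = 1, carry-out dropped."""
    return add(x, complement(y), carry=1)[: len(x)]

def less_than(x: Bits, y: Bits) -> int:
    """x <? y: sign bit of x - y computed on (l + 1)-bit zero-extended operands."""
    return sub(x + [0], y + [0])[len(x)]

def equal(x: Bits, y: Bits) -> int:
    """x =? y: AND over i of (x_i XOR y_i XOR 1)."""
    r = 1
    for xi, yi in zip(x, y):
        r = f_and(r, f_xor(f_xor(xi, yi), 1))
    return r

def to_bits(v: int, l: int) -> Bits:
    return [(v >> i) & 1 for i in range(l)]

def from_bits(bits: Bits) -> int:
    return sum(b << i for i, b in enumerate(bits))

print(from_bits(add(to_bits(6, 4), to_bits(7, 4))))  # 13
print(from_bits(sub(to_bits(6, 4), to_bits(7, 4))))  # 15, i.e. -1 mod 16
print(less_than(to_bits(6, 4), to_bits(7, 4)))       # 1
```

In an encrypted evaluation each call to `f_xor` or `f_and` would become one homomorphic gate, so the adder and the comparator each cost $O(\ell)$ gates.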
Adopting these operations of TFHE, we design a secure multi-party CWC. In this paper, we omit the details of TFHE (see, e.g., [20,21]).
We should note that the secrecy of TFHE directly determines the security of our scheme. In fact, in our two-party feature selection scheme, party B sends his/her inputs in encrypted form to party A, and A performs the computation of feature selection on the encrypted inputs. If the encrypted inputs could easily be broken, any ingenious devices to secure the scheme would be meaningless.
Therefore, in designing our scheme, it was natural to require our FHE cryptosystem to be IND-CPA secure. In fact, TFHE is known to be IND-CPA secure. Regarding this, we should note the following:
  • By definition, encryption with an IND-CPA secure cryptosystem is probabilistic. That is, the result $E[x]$ of encryption differs unpredictably every time the encryption is performed. For this reason, we denote by $E[x \mid t]$ a ciphertext generated at time $t$. In particular, the notation $E[x \mid *]$ means that the ciphertext has been generated at a time different from any other encryption event.
  • When we consider the IND-CPA security of an FHE cryptosystem, we should note that the way in which the oracle $\mathcal{O}$ generates $c$ with $\mathrm{Dec}(c) = m_b$ is not unique. For example, the oracle may compute $c$ from two ciphertexts of additive shares of $m_b$, say $E[r]$ and $E[m_b \oplus r]$, by $c = f_\oplus(E[r], E[m_b \oplus r])$. The IND-CPA security of an FHE cryptosystem should require that $\mathcal{A}$ cannot guess $b$ with non-negligible advantage, no matter how $c$ has been generated. This requirement holds if the outputs of $E[x]$, $f_\oplus(E[x], E[y])$, and $f_\cdot(E[x], E[y])$ are uniformly distributed, and TFHE is known to satisfy this condition.

3. Algorithms

3.1. Baseline Algorithm

We present the baseline algorithm, a privacy-preserving variant of CWC. In this subsection, we consider a two-party protocol in which a party B has its private data and outsources the CWC computation to another party A; this protocol can be extended to more than two data owners using multi-key homomorphic encryption [22]. During the computation, party A should gain no information other than the number $n$ of positive data, the number $m$ of negative data, and the number $k$ of features. Note that party B can hide the actual numbers of data by inserting dummy data and telling A the inflated numbers $n$ and $m$. Dummy data can be distinguished by adding an extra bit that is set to 1 if and only if the datum is a dummy. The feature values and the dummy bits of the data in each class are encrypted under B's public key and sent to A.
The baseline algorithm consists of three tasks: computing the encrypted bit strings $E[B_i]$, sorting the $E[B_i]$s, and executing feature selection on the sorted $E[B_i]$s. In the baseline algorithm, all inputs are encrypted and are not decrypted until the computation is completed. Thus, for simplicity, we omit the notation $E$ in the following presentation.

3.1.1. Computing $B_i$

We can compute $B_i[m(p-1)+q]$ by $(x_p(F_i) \neq y_q(F_i)) \lor x_p(d) \lor y_q(d)$, where $x_p(d)$ and $y_q(d)$ represent the dummy bits of data $x_p$ and $y_q$, respectively. $(x_p(F_i) \neq y_q(F_i))$ becomes 0 if $F_i$ is inconsistent for the pair of $x_p$ and $y_q$. Since we want to ignore the influence of dummy data, the part ‘$\lor\, x_p(d) \lor y_q(d)$’ is added to make the whole value 1 (meaning that the pair is treated as consistent) when either $x_p$ or $y_q$ is a dummy. This takes $O(kmn)$ time and space in total.
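A plaintext Python sketch of this step, assuming that each feature value is encoded as a tuple of bits (a hypothetical encoding) and building OR from the two elementary gates via $a \lor b = a \oplus b \oplus (a \cdot b)$; `f_xor` and `f_and` stand in for the homomorphic gates, and the example data are made up.

```python
from typing import List, Sequence, Tuple

def f_xor(a: int, b: int) -> int:
    """Plaintext stand-in for the homomorphic XOR gate."""
    return a ^ b

def f_and(a: int, b: int) -> int:
    """Plaintext stand-in for the homomorphic AND gate."""
    return a & b

def f_or(a: int, b: int) -> int:
    """a OR b = a XOR b XOR (a AND b), i.e. OR from the two elementary gates."""
    return f_xor(f_xor(a, b), f_and(a, b))

def f_neq(a: Sequence[int], b: Sequence[int]) -> int:
    """1 iff two bit-encoded feature values differ: NOT of AND_j (a_j XNOR b_j)."""
    eq = 1
    for aj, bj in zip(a, b):
        eq = f_and(eq, f_xor(f_xor(aj, bj), 1))
    return f_xor(eq, 1)

# A datum is (bit-encoded value of each feature, dummy bit d).
Datum = Tuple[List[Tuple[int, ...]], int]

def compute_B(pos: Sequence[Datum], neg: Sequence[Datum], i: int) -> List[int]:
    """B_i[m(p-1)+q] = (x_p(F_i) != y_q(F_i)) OR x_p(d) OR y_q(d), serialized row-major."""
    return [f_or(f_or(f_neq(x[i], y[i]), xd), yd) for x, xd in pos for y, yd in neg]

pos = [([(0, 1), (1,)], 0), ([(1, 1), (0,)], 1)]  # second positive datum is a dummy
neg = [([(0, 1), (0,)], 0)]
print(compute_B(pos, neg, 0), compute_B(pos, neg, 1))  # [0, 1] [1, 1]
```

Every pair involving the dummy datum evaluates to 1, so dummies can never make a feature look inconsistent.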

3.1.2. Sorting $B_i$s

We can compute $\|B_i\|$ in encrypted form by summing up the values in $B_i$ in $O(mn \log(mn))$ time (noting that each operation on integers of $\log(mn)$ bits takes $O(\log(mn))$ time). Alternatively, we can set an upper bound $b_{\max}$ on the number of bits used to store the consistency measure, reducing the time complexity to $O(mn\, b_{\max})$.
Then, sorting the $B_i$s in increasing order of their consistency measures can be accomplished using any sorting network in which comparisons and swaps are performed in encrypted form, without leaking information about the feature ordering. Note that in this approach, the algorithm must spend $\Theta(mn + \log k)$ time to swap (or pretend to swap) two bit strings together with their original feature indices of $\log k$ bits, regardless of whether the two features are actually swapped. Because this is the most expensive part of our baseline algorithm, we will show how to improve it. Using an AKS sorting network [35] of size $O(k \log k)$, the total time for sorting the $B_i$s is $O(mn\, b_{\max} + (mn + b_{\max} + \log k)\, k \log k)$.
In our experiments, we employ the more practical sorting network of Batcher's odd-even mergesort [36], of size $O(k \log^2 k)$. A simple oblivious radix sort [37] running in $O(k \log k)$ time, under the assumption that the bit length of each integer is constant, was recently proposed.
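The comparator sequence of Batcher's odd-even mergesort and its data-independent use can be sketched in plaintext as follows. Assumptions: the number of features is padded to a power of two (for example with dummy features carrying a maximal key, which is our own assumption), the swap bit `c` is computed in the clear here but would remain encrypted in the protocol, and the comparator generator follows the standard iterative formulation of Batcher's network rather than code from the paper. The payload rows model the baseline algorithm, which moves each bit string $B_i$ together with its index, so the work per comparator grows with the row length.

```python
from typing import List, Tuple


def batcher_pairs(n: int) -> List[Tuple[int, int]]:
    """Comparators (i, j), i < j, of Batcher's odd-even mergesort for n = 2**t."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    pairs = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


def mux(c: int, a: int, b: int) -> int:
    """c*a + (1-c)*b: selects a when c == 1 and b when c == 0."""
    return c * a + (1 - c) * b


def oblivious_sort(keys: List[int], rows: List[List[int]]):
    """Sort rows by key using a fixed comparator sequence.

    Each comparator rewrites both positions whatever the outcome, so the
    sequence of operations is independent of the data; only the swap bit c
    (encrypted in the protocol) depends on it.
    """
    keys = list(keys)
    rows = [list(r) for r in rows]
    for i, j in batcher_pairs(len(keys)):
        c = int(keys[j] < keys[i])  # 1 means "swap"
        keys[i], keys[j] = mux(c, keys[j], keys[i]), mux(c, keys[i], keys[j])
        # The whole payload is multiplexed entry by entry: Theta(len(row)) per comparator.
        ri, rj = rows[i], rows[j]
        rows[i] = [mux(c, b, a) for a, b in zip(ri, rj)]
        rows[j] = [mux(c, a, b) for a, b in zip(ri, rj)]
    return keys, rows
```

On the strings of Table 4 with keys $\|B_i\| = (8, 5, 6, 5)$ and rows holding each index followed by $B_i$, this network returns the index order $(2, 4, 3, 1)$, which matches Table 5.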

3.1.3. Selecting Features

Let $(F_{\pi(1)}, \ldots, F_{\pi(k)})$ be the sorted list of features. We first compute a sequence of bit strings $(Z_1, \ldots, Z_k)$, each of length $mn$, such that $Z_i[h] = \bigvee_{j=i+1}^{k} B_{\pi(j)}[h]$ for any $1 \le i \le k$ and $1 \le h \le mn$; that is, $Z_i$ stores, at each position $h$, the cumulative OR of $B_{\pi(i+1)}, B_{\pi(i+2)}, \ldots, B_{\pi(k)}$ (so $Z_k$ is all zeros). Note that $Z_i[h] = 0$ indicates that the feature set $\{F_{\pi(i+1)}, F_{\pi(i+2)}, \ldots, F_{\pi(k)}\}$ is inconsistent with regard to the pair $(x_p, y_q)$ satisfying $h = m(p-1)+q$, and hence $\{F_{\pi(i+1)}, \ldots, F_{\pi(k)}\}$ is inconsistent if and only if $Z_i$ contains a 0. See Table 5 for the $Z_i$s in our running example. The computation requires $O(kmn)$ time and space.
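The suffix ORs can be computed right to left, reusing $Z_{i+1}$ for $Z_i$. The following plaintext sketch (hypothetical helper name `suffix_or`, 0-based lists) shows why the cost is $O(kmn)$.

```python
from typing import List


def suffix_or(sorted_bits: List[List[int]]) -> List[List[int]]:
    """Return [Z_1, ..., Z_k] with Z_i[h] = OR_{j=i+1..k} B_{pi(j)}[h].

    `sorted_bits[j]` holds B_{pi(j+1)} (0-based list). Each Z_i reuses
    Z_{i+1}, so the whole sequence costs O(k * mn) OR gates; Z_k is all zeros.
    """
    k, length = len(sorted_bits), len(sorted_bits[0])
    z = [[0] * length for _ in range(k)]
    for i in range(k - 2, -1, -1):
        z[i] = [a | b for a, b in zip(z[i + 1], sorted_bits[i + 1])]
    return z
```

Applied to the sorted strings of Table 5, it returns exactly the $Z_i$ column of that table.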
We simulate Algorithm 1 on the encrypted $B_i$s and $Z_i$s for feature selection. In addition, we use two 0-initialized bit arrays: $R$ of length $k$ and $S$ of length $mn$. $R[i]$ is set to 1 if the $i$-th feature (in sorted order) is selected, and $S$ keeps track of the cumulative OR of the bit strings of the currently selected features. Namely, $S[h] = \bigvee_{\alpha=1}^{\ell} B_{\pi(j_\alpha)}[h]$ if features $\{F_{\pi(j_1)}, \ldots, F_{\pi(j_\ell)}\}$ have been selected so far.
Assume that we are in the $i$-th iteration of the loop of Algorithm 1. At this time, the current feature set $F'$ contains the features $\{F_{\pi(i)}, F_{\pi(i+1)}, \ldots, F_{\pi(k)}\}$ and the currently selected features, and $F' \setminus \{F_{\pi(i)}\}$ is consistent if and only if $\bigwedge_{h=1}^{mn} (Z_i[h] \lor S[h]) = 1$. Because we keep $F_{\pi(i)}$ in $F'$ exactly when $F' \setminus \{F_{\pi(i)}\}$ is inconsistent, the algorithm sets $R[i] = \lnot \bigwedge_{h=1}^{mn} (Z_i[h] \lor S[h])$. After computing $R[i]$, we correctly update $S$ by $S[h] \leftarrow S[h] \lor (R[i] \land B_{\pi(i)}[h])$ for every $1 \le h \le mn$ in $O(mn)$ time. Therefore, the total computational time is $O(kmn)$.
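The selection loop, together with the output array $P$ described in the proof of Theorem 1, can be simulated on plaintext bits as follows. This is a sketch with hypothetical function names; under FHE every bit would be a ciphertext and every `&`, `|`, and `1 - x` a homomorphic gate, while the loop structure itself does not depend on the data.

```python
from typing import List


def oblivious_cwc_select(sorted_bits: List[List[int]]) -> List[int]:
    """Return R with R[i] = 1 iff the (i+1)-th feature in sorted order is kept.

    sorted_bits[i] is B_{pi(i+1)}. Only AND/OR/NOT on bits are used and the
    loop structure does not depend on the data, mirroring the encrypted run.
    """
    k, length = len(sorted_bits), len(sorted_bits[0])
    # z[i]: OR of all strings after position i (suffix ORs, O(k*mn)).
    z = [[0] * length for _ in range(k)]
    for i in range(k - 2, -1, -1):
        z[i] = [a | b for a, b in zip(z[i + 1], sorted_bits[i + 1])]
    r = [0] * k
    s = [0] * length                      # cumulative OR of selected strings
    for i in range(k):
        consistent = 1
        for h in range(length):
            consistent &= z[i][h] | s[h]  # AND over h of (Z_i[h] OR S[h])
        r[i] = 1 - consistent             # keep F_pi(i) iff dropping it breaks consistency
        s = [sh | (r[i] & bh) for sh, bh in zip(s, sorted_bits[i])]
    return r


def selected_indices(r: List[int], pi: List[int]) -> List[int]:
    """P[h] = R[h] * pi(h): original 1-based index of a kept feature, else 0."""
    return [rh * ph for rh, ph in zip(r, pi)]
```

For the sorted strings of Table 5, the sketch yields $R = (0, 0, 1, 1)$ and $P = (0, 0, 3, 1)$, so $F_3$ and $F_1$ are selected; indeed $B_1 \lor B_3$ is all ones, while neither string alone is.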

3.1.4. Summing Up Analysis

The sorting step takes $O(mn\, b_{\max} + (mn + b_{\max} + \log k)\, k \log k)$ time. Because CWC works with any consistency measure, we do not need to compute $\|B_i\|$ at full accuracy, so we assume that $b_{\max}$ is a constant. Under this assumption, we obtain the following theorem.
Theorem 1.
For the two-party feature selection problem, we can securely simulate CWC in $O(kmn \log k + k \log^2 k)$ time and $O(kmn)$ space without revealing the private data of the parties, under the assumption that TFHE is secure.
Proof. 
According to the discussion above, computing $B_i$ for all features takes $O(kmn)$ time and space, sorting the features takes $O(mn\, b_{\max} + (mn + b_{\max} + \log k)\, k \log k) = O(kmn \log k + k \log^2 k)$ time, and selecting features takes $O(kmn)$ time.
Finally, party A computes, in $O(k \log k)$ time, an integer array $P$ with $P[h] = R[h] \cdot \pi(h)$, which stores the original indices of the selected features (entries of unselected features are 0). In the outsourcing scenario, party A simply sends $P$ to party B as the result of CWC. In the joint computing scenario, party A randomly shuffles $P$ to conceal $\pi$ from B. As a result, we can securely simulate CWC in $O(kmn \log k + k \log^2 k)$ time and $O(kmn)$ space.    □

3.2. Improvement of Secure CWC

Sorting is the major bottleneck of private CWC, because pointers cannot be moved according to comparison results that remain encrypted. For example, consider secure integers. Let variables $x$ and $y$ contain integers $a$ and $b$, respectively. The secure operation $a \stackrel{?}{<} b$ yields an encrypted logical bit $c \in \{0, 1\}$. Using $c$, we can order the values of $x$ and $y$ in $O(1)$ time so that $x \le y$, by the secure operations $x \leftarrow c \cdot a + \bar{c} \cdot b$ and $y \leftarrow \bar{c} \cdot a + c \cdot b$.
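A minimal plaintext sketch of this secure comparison and swap at the level of bits: the comparison costs $O(w)$ gates for $w$-bit integers, and the swap is a bitwise multiplexer. Here the gates act on plaintext bits; under TFHE each `&`, `|`, `^`, and negation would be a homomorphic gate and `c` would stay encrypted. All function names are illustrative. If `x` and `y` were bit strings $B_i$ of length $mn$ instead of short integers, `mux_bits` would have to touch all $mn$ bits.

```python
from typing import List, Tuple


def to_bits(v: int, width: int) -> List[int]:
    return [(v >> j) & 1 for j in range(width)]   # little-endian


def from_bits(bits: List[int]) -> int:
    return sum(b << j for j, b in enumerate(bits))


def lt_bits(a: List[int], b: List[int]) -> int:
    """[a < b] for equal-width little-endian bit lists, using O(width) gates."""
    lt = 0
    for aj, bj in zip(a, b):
        # a_j < b_j decides; a_j == b_j keeps the verdict of the lower bits.
        lt = ((1 - aj) & bj) | ((1 - (aj ^ bj)) & lt)
    return lt


def mux_bits(c: int, a: List[int], b: List[int]) -> List[int]:
    """Bitwise c ? a : b, i.e., c*a + (1-c)*b applied to every bit."""
    return [(c & x) | ((1 - c) & y) for x, y in zip(a, b)]


def compare_and_order(x: List[int], y: List[int]) -> Tuple[List[int], List[int]]:
    """Return (min, max) without branching on the comparison result."""
    c = lt_bits(x, y)   # c = [x < y]; would remain encrypted under FHE
    return mux_bits(c, x, y), mux_bits(c, y, x)
```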
In the case of CWC, however, each feature index $i$ is associated with the bit string $B_i$. Since the comparison bit cannot be decrypted, we cannot swap pointers to the $B_i$s accordingly. Therefore, the baseline algorithm swaps the $B_i$s explicitly, and the computation time for sorting increases to $O(mnk \log^2 k)$. Our main contribution is to improve this complexity to $O(mnk + k \log^2 k)$ by reducing the cost of such explicit sorting.
Based on FHE, we propose the improved secure CWC (Algorithm 2), which reduces the time complexity to $O(mnk + k \log^2 k)$. An example run of Algorithm 2 is illustrated in Figure 1. As shown in this example, party A can securely sort the $k$ randomized features in $O(k \log^2 k)$ time using a suitable sorting network and then, according to the result of sorting, move each associated bit string of length $mn$ in $O(kmn)$ time. After this preprocessing, the parties securely obtain a minimal consistent feature set by decrypting the output of CWC. Finally, we have the following result.
Algorithm 2 Improved secure CWC between parties A and B
1: Preprocessing:
Party A has $E_B[\mathbf{F}] = E_B[\mathbf{F}_1, \ldots, \mathbf{F}_k]$ for $\mathbf{F}_i = (F_i, \|B_i\|, B_i)$, encrypted with party B's public key, where each datum $x$ is encrypted at time 0 as $E_B[x \mid 0]$.
2: Party A:
Generates $r_i$ for $i = 1, \ldots, k$ uniformly at random.
Sends $(E_B[B_i + r_i \mid 1], E_A[r_i \mid 1])$ for $i = 1, \ldots, k$.
3: Party A:
Calculates $E_B[i \mid 2]$ for $i = 1, \ldots, k$.
Securely sorts $(E_B[F_i \mid 0], E_B[\|B_i\| \mid 0], E_B[i \mid 2])$ for $i = 1, \ldots, k$ in increasing order of $\|B_i\|$.
As a result, obtains $(E_B[F_{i_j} \mid 3], E_B[\|B_{i_j}\| \mid 3], E_B[i_j \mid 3])$ for $j = 1, \ldots, k$.
Generates a permutation $\pi \in S_k$ uniformly at random and memorizes it.
Sends $(E_B[i_{\pi(1)} \mid 3], \ldots, E_B[i_{\pi(k)} \mid 3])$.
4: Party B:
Decrypts $(i_{\pi(1)}, \ldots, i_{\pi(k)})$.
Generates $r'_i$ for $i = 1, \ldots, k$ uniformly at random.
Sends $(E_B[B_{i_{\pi(j)}} + r_{i_{\pi(j)}} + r'_{i_{\pi(j)}} \mid 4], E_A[r_{i_{\pi(j)}} + r'_{i_{\pi(j)}} \mid 4])$ for $j = 1, \ldots, k$.
5: Party A:
Decrypts $r_{i_{\pi(j)}} + r'_{i_{\pi(j)}}$ for $j = 1, \ldots, k$.
Obtains $E_B[B_{i_{\pi(j)}} \mid 5]$ for $j = 1, \ldots, k$.
Obtains $E_B[B_{i_j} \mid 5]$ for $j = 1, \ldots, k$ by applying the permutation $\pi^{-1}$.
6: Party A:
Simulates CWC on the resulting $E_B[\mathbf{F}]$.
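The data flow of steps 2 to 5 can be traced in plaintext as follows. This sketch assumes that "+" on bit strings is bitwise addition modulo 2 (XOR), replaces the oblivious sorting network with Python's stable `sorted`, and omits all encryption; the function names are ours. It illustrates why the long strings never pass through the sorting network: only the short records $(\|B_i\|, i)$ are sorted, and each string is moved once through the masked round trip via B.

```python
import random
from typing import List


def xor(a: List[int], b: List[int]) -> List[int]:
    return [x ^ y for x, y in zip(a, b)]


def algorithm2_reorder(bits: List[List[int]], rng: random.Random) -> List[List[int]]:
    """Plaintext trace of the reordering in Algorithm 2 (encryption omitted).

    bits[i] is B_{i+1}. Returns the bit strings in increasing order of ||B_i||.
    Comments note which party would be able to read each value.
    """
    k, length = len(bits), len(bits[0])
    # Step 2 (A): mask each B_i with a fresh random string r_i.
    r = [[rng.randint(0, 1) for _ in range(length)] for _ in range(k)]
    masked = [xor(bits[i], r[i]) for i in range(k)]      # B decrypts only B_i + r_i
    # Step 3 (A): sort the short records (||B_i||, i), not the long strings.
    norms = [sum(b) for b in bits]                        # encrypted under B in the protocol
    order = sorted(range(k), key=lambda i: norms[i])      # i_1..i_k; a sorting network in the protocol
    pi = list(range(k))
    rng.shuffle(pi)                                       # A memorizes pi
    sent = [order[pi[j]] for j in range(k)]               # B decrypts i_{pi(1)}, ..., i_{pi(k)}
    # Step 4 (B): re-mask with r'_i and return strings in the received order.
    r2 = [[rng.randint(0, 1) for _ in range(length)] for _ in range(k)]
    returned = [xor(masked[i], r2[i]) for i in sent]      # B_{i_pi(j)} + r + r'
    noise = [xor(r[i], r2[i]) for i in sent]              # B computes it under A's key; A decrypts it
    # Step 5 (A): remove the combined noise, then undo pi.
    result: List[List[int]] = [[] for _ in range(k)]
    for j in range(k):
        result[pi[j]] = xor(returned[j], noise[j])        # slot pi(j) receives B_{i_pi(j)}
    return result
```

For the strings of Table 4, the result is $(B_2, B_4, B_3, B_1)$ for every choice of masks and $\pi$, the same order as in Table 5.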
Theorem 2.
Algorithm 2 can simulate CWC in $O(kmn + k \log^2 k + k \log k \log mn)$ time and $O(kmn)$ space, under the assumption that FHE executes each bit operation in $O(1)$ time.
Proof. 
Compared to the baseline, additional space is required for $\pi$, $r_i$, and $r'_i$. Thus, the space complexity remains $O(kmn)$. For the time complexity, the main task is to sort the $k$ triples $(F_i, \|B_i\|, B_i)$ in increasing order of $\|B_i\|$. The improved algorithm sorts only the pairs $x_i = (F_i, \|B_i\|)$ of integers, each of which has $O(\log k + \log mn)$ bits. For each $x_i, x_j$, we can check whether $\|B_i\| \le \|B_j\|$ in $O(\log mn)$ time and swap them in $O(\log k + \log mn)$ time using homomorphic operations in FHE. It follows that sorting all $x_i$ ($i = 1, \ldots, k$) takes $O(k \log k (\log k + \log mn))$ time. After sorting the pairs, the algorithm moves all $B_i$ to the correct positions according to the rank of $x_i$ ($i = 1, \ldots, k$), which costs $O(kmn)$. Therefore, the time complexity is $O(kmn + k \log^2 k + k \log k \log mn)$.    □
Theorem 3.
Algorithm 2 is secure under the assumption that the employed FHE is IND-CPA secure.
Proof. 
We show the security by constructing simulators for parties A and B , respectively.
B's view (what B can obtain from A) is the following:
  • $(E_B[B_i + r_i \mid 1], E_A[r_i \mid 1])$ for $i = 1, \ldots, k$;
  • $E_B[i_{\pi(1)} \mid 3], \ldots, E_B[i_{\pi(k)} \mid 3]$.
Their probability distributions are uniform and independent of each other. Hence, the simulator for B can replace them with
  • $(E_B[B_i + r_i \mid 6], E_A[r_i \mid 6])$ for $i = 1, \ldots, k$, where the $r_i$ are selected uniformly at random;
  • $E_B[\pi(1) \mid 6], \ldots, E_B[\pi(k) \mid 6]$ for $\pi \in S_k$ selected uniformly at random.
Note that, even if an adversary knows $B_i$, it is computationally infeasible to distinguish between $E_B[B_i + r_i \mid 6]$ and $E_B[r_i \mid 6]$, by the IND-CPA security of the cryptosystem $E_B$.
Next, we construct a simulator Sim for party A. What A obtains from B is
$$\{(E_B[B_{i_{\pi(j)}} + r_{i_{\pi(j)}} + r'_{i_{\pi(j)}} \mid 4],\; E_A[r_{i_{\pi(j)}} + r'_{i_{\pi(j)}} \mid 4]) \mid j = 1, \ldots, k\},$$
which is equivalent to $\{E_B[B_{i_j} \mid 5] \mid j = 1, \ldots, k\}$ after decryption and permutation.
Moreover, the sequence $(i_1, \ldots, i_k)$ is not explicitly given to A; A recognizes it only through the alignment between
  • $(E_B[B_{i_1} \mid 3], \ldots, E_B[B_{i_k} \mid 3])$ and
  • $(E_B[B_{i_1} \mid 5], \ldots, E_B[B_{i_k} \mid 5])$.
Therefore, we define A's view to be
$$\mathrm{View}_A = \{(E_B[B_{i_j} \mid 3], E_B[B_{i_j} \mid 5]) \mid j = 1, \ldots, k\}$$
with $\|B_{i_1}\| \le \cdots \le \|B_{i_k}\|$.
We next define the view that Sim should generate. While A can generate $\{E_B[B_{i_j} \mid 3] \mid j = 1, \ldots, k\}$ with $\|B_{i_1}\| \le \|B_{i_2}\| \le \cdots \le \|B_{i_k}\|$ by itself, it needs B's cooperation to generate $\{E_B[B_{i_j} \mid 5] \mid j = 1, \ldots, k\}$. Without B's cooperation, Sim selects $\pi \in S_k$ uniformly at random and generates its own view as
$$\mathrm{View}_{\mathrm{Sim}} = \{(E_B[B_{i_j} \mid 3], E_B[B_{\pi(j)} \mid 5]) \mid j = 1, \ldots, k\}.$$
Sim can compute $E_B[B_{\pi(j)} \mid 5]$ from $E_B[0 \mid 5]$ and $E_B[B_{\pi(j)} \mid 0]$ by exploiting the homomorphic property of the encryption system $E_B$.
Furthermore, we define a distinguisher $D$ as a PPT Turing machine that tries to distinguish between $\mathrm{View}_A$ and $\mathrm{View}_{\mathrm{Sim}}$ given the input $\{(E_B[\|B_i\| \mid 0], E_B[B_i \mid 0]) \mid i = 1, \ldots, k\}$.
Let $X \in \{A, \mathrm{Sim}\}$ denote the source of the view given to $D$ and $Y$ its reply. When $\Pr[Y = A \mid X = A] = 1/2 + \alpha_1$ and $\Pr[Y = \mathrm{Sim} \mid X = \mathrm{Sim}] = 1/2 + \alpha_2$, the advantage of $D$ is defined as $\alpha = \alpha_1 + \alpha_2$.
We show that if $D$'s advantage $\alpha$ is not negligible, we can construct a PPT attacker Attck that breaks the IND-CPA security of the encryption system $E_B$ with non-negligible advantage. Attck plays the IND-CPA game with an oracle $O_{\mathrm{IND}}$ as follows:
1.
Attck generates $B_1, B_2$ with $\|B_1\| \le \|B_2\|$;
2.
Attck lets $x_1 = B_1$ and $x_2 = B_2$ and sends the query $(x_1, x_2)$ to $O_{\mathrm{IND}}$;
3.
$O_{\mathrm{IND}}$ selects $i \in \{1, 2\}$ uniformly at random and sends $c = E_B[x_i \mid 1]$ to Attck;
4.
Attck initializes $D$ with the input $(E_B[\|B_1\| \mid 0], E_B[B_1 \mid 0]), (E_B[\|B_2\| \mid 0], E_B[B_2 \mid 0])$;
5.
First query. Attck sends $D$ the query $((E_B[B_2 \mid 2], c), (E_B[B_2 \mid 2], E_B[B_2 \mid 2]))$;
6.
If D replies with A , Attck outputs 1 and terminates.
7.
Second query. Attck generates $c'$ by adding $E_B[0 \mid 3]$ to $c$; note that $D_B(c') = D_B(c)$. Attck sends $D$ the query $((E_B[B_1 \mid 3], E_B[B_2 \mid 3]), (E_B[B_2 \mid 3], c'))$.
8.
If D replies with Sim , Attck outputs 1 and terminates.
9.
Attck outputs 2.
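We evaluate Attck's advantage as follows. First, assume $D_B(c) = x_1$, which happens with probability $1/2$. Since the first and second queries are mutually independent, the probability that $D$ replies with A to the first query or with Sim to the second query is
$$\left(\tfrac12 + \alpha_1\right) + \left(\tfrac12 + \alpha_2\right) - \left(\tfrac12 + \alpha_1\right)\left(\tfrac12 + \alpha_2\right) = \tfrac34 + \tfrac{\alpha}{2} - \alpha_1\alpha_2 \ge \tfrac34 + \tfrac{\alpha}{4},$$
where the inequality uses $\alpha_1\alpha_2 \le \alpha^2/4 \le \alpha/4$ for $0 \le \alpha \le 1$.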
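Next, assume $D_B(c) = x_2$. Then $\Pr[D \text{ outputs Sim at the first query}] - 1/2$ is negligible; otherwise, $D$ could be used as an attacker that breaks the IND-CPA security of $E_B$. Therefore, $\Pr[\text{Attck outputs } 2] - 1/4$ is negligible. Consequently, up to a negligible term,
$$\Pr[\text{Attck's guess is right}] \ge \tfrac12\left(\tfrac34 + \tfrac{\alpha}{4}\right) + \tfrac12 \cdot \tfrac14 = \tfrac12 + \tfrac{\alpha}{8}.$$
Hence Attck's IND-CPA advantage is at least $2 \cdot \tfrac{\alpha}{8} = \tfrac{\alpha}{4}$ up to a negligible term. Since we assume that $\alpha$ is not negligible, neither is $\alpha/4$, which contradicts the IND-CPA security of $E_B$.    □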
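The arithmetic in the two displayed bounds can be checked exactly with rational numbers, as sketched below. The check assumes $0 \le \alpha_1, \alpha_2 \le 1/2$ (each success probability is at most 1, and the advantages are taken to be non-negative) and drops the negligible terms, as the proof does; the function names are ours.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def success_if_x1(a1: Fraction, a2: Fraction) -> Fraction:
    """Pr[D answers A to query 1 or Sim to query 2] for independent queries."""
    p1, p2 = HALF + a1, HALF + a2
    return p1 + p2 - p1 * p2


def check_advantage_bounds(steps: int = 10) -> bool:
    """Exactly check the proof's arithmetic on a grid of alpha_1, alpha_2 in [0, 1/2]."""
    grid = [Fraction(i, 2 * steps) for i in range(steps + 1)]
    for a1, a2 in product(grid, grid):
        alpha = a1 + a2
        p = success_if_x1(a1, a2)
        if p != Fraction(3, 4) + alpha / 2 - a1 * a2:     # expanded form
            return False
        if p < Fraction(3, 4) + alpha / 4:                # bound used in the proof
            return False
        right = HALF * (Fraction(3, 4) + alpha / 4) + HALF * Fraction(1, 4)
        if right != HALF + alpha / 8 or 2 * right - 1 != alpha / 4:
            return False
    return True
```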

4. Experiments

We implemented the baseline and improved algorithms for secure CWC in C++ using the TFHE library (https://tfhe.github.io/tfhe (accessed on 28 January 2021)).
The experiments were carried out on a machine equipped with an Intel Core i7-6567U (3.30 GHz) processor and 16 GB of RAM. In the following, n (resp. m) is the number of positive (resp. negative) data and k is the number of features.
Table 6 summarizes the running time of the baseline algorithm (a naive implementation of Algorithm 1 using TFHE) on random data generated for $k \in \{10, 50, 100\}$ and $mn \in \{100, 500, 1000\}$. The complexity analysis predicts that the running time increases in proportion to $mn$, and the measurements confirm this. The table also clearly shows that sorting is the bottleneck.
Table 7 compares the running time of preprocessing in the baseline and improved algorithms. The results show that the proposed algorithm significantly relieves the bottleneck of naive secure CWC. Note that the baseline and improved algorithms both compute exactly the same solution as CWC on plaintexts. We also show details of the improved algorithm: 'sorting' is the time for sorting the triples $(F_i, \|B_i\|, i)$ of integers, and 'other tasks' is the time for the remaining tasks, including generating, adding, and subtracting the random noise $r_i$, moving the $B_i$, and decrypting integers.
Table 8 displays the running time of the improved algorithm on real data available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php (accessed on 26 January 2022)). Since these datasets contain multi-valued features and classes, we binarized them, treating each problem as a classification between one feature/class value and all the others.
These results demonstrate that the proposed algorithm works well for real-world multi-valued feature selection problems. We evaluated only the running time in this experiment; the relevance of the extracted features is guaranteed because the secure CWC algorithm produces the same solution as the original CWC [8].

5. Conclusions

On the basis of fully homomorphic encryption, we proposed a faster private feature selection algorithm that securely computes relevant features from distributed private datasets. Our algorithm can simulate the original CWC algorithm, which chooses favorable features by sorting. In addition to improving the computational complexity, the proposed algorithm solves the private feature selection problem in practical time for a variety of real data. One remaining challenge is to reduce the cost of sorting, because CWC does not always require exact sorting; approximate sorting could reduce the computation time while maintaining solution quality. At present, the proposed algorithm cannot handle real-valued features, because TFHE [21] is not suitable for floating-point operations. Extending the TFHE library to enable secure feature selection for real-valued data is a future challenge.
A well-known feature selection method filters features by computing Gini impurity scores [14]. In this method, the optimal threshold for filtering is determined by sorting the values of each feature. However, since sorting is time-consuming even for secret sharing-based MPC [38], a simpler method that determines the threshold by calculation has been proposed [30], and its effectiveness has been confirmed experimentally. In contrast, this study focuses on consistency measure-based feature selection. As mentioned previously, consistency measure-based methods have been shown to have advantages over other methods. This study proposes the first secure protocol that enables consistency measure-based feature selection in practical time.
We next compare our proposal with other methods for private feature selection. A secret sharing-based MPC using a distributed secure sum has been proposed [29]; this method is known to leak statistical information about the data during the computation. The authors of [30] propose an honest-majority three-party protocol that addresses this drawback of [29]. This method is fast but requires at least two trusted parties. In addition, [30] considered both semi-honest and malicious adversaries, whereas in this study the parties are assumed to be semi-honest. Feature selection using homomorphic encryption was proposed in [26]. This method uses only the additive homomorphic property in a two-party model, which limits its computational power; as a result, statistical information about the data is leaked because partial decryption is required during communication between the parties. Moreover, this protocol has not been implemented. The recent approach of [39] is not based on cryptography, does not provide a formal privacy guarantee, and leaks information through the disclosure of intermediate representations. Although our method is slower than secret sharing-based MPC in practice, it can handle feature selection for many data owners even when each owner trusts only itself, and it does not leak information during computation.
In addition, we discuss future issues and prospects for secure computation with FHE that were not discussed in detail in this study. First, this study assumes that the data are consistent, which does not always hold for real-world data. If the data are inconsistent, e.g., if after merging there are two entries with the same feature values but different classes, the party can ignore these entries without decryption. Since the outsourced party can compare two integers without decryption, it can use the resulting encrypted logical bit to change the class label of these conflicting entries to a special value, effectively ignoring them.
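A plaintext sketch of this relabeling, under our own assumptions: binary class labels, a hypothetical special label value `SPECIAL = 2`, and an all-pairs comparison costing $O(N^2 k)$ gates for $N$ entries and $k$ binary features. Each equality test and the final label update use only gates and a multiplexer, so they could in principle be evaluated on ciphertexts without decryption. Function names are illustrative.

```python
from typing import List

SPECIAL = 2  # hypothetical label meaning "ignore this entry" (classes are 0 and 1)


def features_equal(a: List[int], b: List[int]) -> int:
    """1 iff all binary feature values agree: AND of NOT(a_j XOR b_j)."""
    eq = 1
    for x, y in zip(a, b):
        eq &= 1 - (x ^ y)
    return eq


def relabel_conflicts(feats: List[List[int]], labels: List[int]) -> List[int]:
    """Replace the label of every entry involved in a conflict by SPECIAL.

    A conflict is a pair with identical feature values but different classes.
    All N*(N-1)/2 pairs are always examined and the update is a multiplexer,
    so nothing branches on the (encrypted) comparison results.
    """
    n = len(feats)
    flag = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            c = features_equal(feats[u], feats[v]) & (labels[u] ^ labels[v])
            flag[u] |= c
            flag[v] |= c
    return [f * SPECIAL + (1 - f) * lab for f, lab in zip(flag, labels)]
```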
Next, we consider how to speed up computation on real numbers using FHE. CKKS [40] and TFHE are the current state-of-the-art approaches for computing on real numbers with FHE. CKKS speeds up arithmetic on real numbers by converting reals to integers through scaling, performing arithmetic on the integers, and then converting the results back to reals. However, CKKS cannot exactly evaluate nonlinear functions (e.g., ReLU) or comparison operations, which makes it difficult to apply to ML. On the other hand, TFHE can evaluate comparison operations and NAND circuits, which makes any computation possible in theory. Unfortunately, TFHE incurs an overhead of approximately 10,000 times the plaintext running time, and GPGPU-based architectures are therefore being studied for speedup. Currently, the fastest implementation is around 20 times faster than CPU-based algorithms [41]. Thus, applying and accelerating TFHE for ML is a promising research area.
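The scaling idea can be illustrated with plain fixed-point arithmetic, sketched below without any encryption. This is not CKKS itself: actual CKKS encrypts and packs many values and performs rescaling on ciphertexts, and the scale `DELTA = 2**10` is a hypothetical choice. The sketch only shows why a product must be divided by the scale once and why results are approximate.

```python
DELTA = 2 ** 10  # hypothetical scaling factor


def encode(x: float) -> int:
    """Scale a real number to an integer (rounding makes this approximate)."""
    return round(x * DELTA)


def decode(v: int) -> float:
    return v / DELTA


def add(u: int, v: int) -> int:
    return u + v  # both operands carry scale DELTA, so the sum does too


def mul(u: int, v: int) -> int:
    """The product carries scale DELTA**2; dividing by DELTA ("rescaling") restores DELTA."""
    return round(u * v / DELTA)
```

For example, $1.5 \times 2.25$ is recovered exactly as 3.375, whereas 0.1 is stored as 102/1024, approximately 0.0996.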
Finally, with the development of FHE, practical algorithms for more challenging problems, such as large-scale genome analysis including genome-wide association studies (GWAS) [42,43] and deep learning [44,45,46], are emerging.

Author Contributions

Conceptualization, H.S.; methodology, T.I., K.S. and H.S.; software, S.O., J.T. and M.K.; validation, S.O., J.T. and M.K.; formal analysis, T.I., K.S. and H.S.; investigation, T.I., K.S. and H.S.; resources, S.O., J.T. and M.K.; data curation, S.O., J.T. and M.K.; writing—original draft preparation, T.I., K.S. and H.S.; writing—review and editing, T.I., K.S. and H.S.; visualization, H.S.; supervision, H.S.; project administration, H.S.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by JSPS KAKENHI (Grant Number 21H05052, 18H04098).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, S.; Zhang, Y.; Dai, W.; Lauter, K.; Kim, M.; Tang, Y.; Xiong, H.; Jiang, X. HEALER: Homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics 2015, 32, 211–218.
  2. Liu, F.; Ng, W.K.; Zhang, W. Encrypted SVM for Outsourced Data Mining. In Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing, New York, NY, USA, 20 August 2015; pp. 1085–1092.
  3. Qiu, G.; Huo, H.; Gui, X.; Dai, H. Privacy-Preserving Outsourcing Scheme for SVM on Vertically Partitioned Data. Secur. Commun. Netw. 2022, 2022, 9983463.
  4. Bost, R.; Ada Popa, R.; Tu, S.; Goldwasser, S. Machine learning classification over encrypted data. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 8–11 February 2015.
  5. Khedr, A.; Gulak, G.; Vaikuntanathan, V. SHIELD: Scalable Homomorphic Implementation of Encrypted Data-Classifiers. IEEE Trans. Comput. 2016, 65, 2848–2858.
  6. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
  7. Shin, K.; Kuboyama, T.; Hashimoto, T.; Shepard, D. SCWC/SLCC: Highly scalable feature selection algorithms. Information 2017, 8, 159.
  8. Shin, K.; Xu, X.M. Consistency-based feature selection. In Proceedings of the 13th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Santiago, Chile, 28–30 September 2009; pp. 28–30.
  9. Almuallim, H.; Dietterich, T.G. Learning boolean concepts in the presence of many irrelevant features. Artif. Intell. 1994, 69, 279–305.
  10. Liu, H.; Motoda, H.; Dash, M. A monotonic measure for optimal feature selection. In Proceedings of the 10th European Conference on Machine Learning, Chemnitz, Germany, 21–23 April 1998; pp. 101–106.
  11. Shin, K.; Fernandes, D.; Miyazaki, D. Consistency measures for feature selection: A formal definition, relative sensitivity comparison, and a fast algorithm. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011; pp. 1491–1497.
  12. Zhao, Z.; Liu, H. Searching for interacting features. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, 6–12 January 2007; pp. 1156–1161.
  13. De Cock, M.; Dowsley, R.; Nascimento, A.C.A.; Railsback, D.; Shen, J.; Todoki, A. High performance logistic regression for privacy-preserving genome analysis. BMC Med. Genom. 2021, 14, 23.
  14. Breiman, L.; Friedman, J.; Stone, C.; Olshen, R. Classification and Regression Trees, 1st ed.; Taylor and Francis: Oxfordshire, UK, 1984.
  15. Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the International Conference on the Theory and Application of Cryptographic Techniques, Prague, Czech Republic, 2–6 May 1999; pp. 223–238.
  16. Attrapadung, N.; Hanaoka, G.; Mitsunari, S.; Sakai, Y.; Shimizu, K.; Teruya, T. Efficient two-level homomorphic encryption in prime-order bilinear groups and a fast implementation in webassembly. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, Incheon, Korea, 4 June 2018; pp. 685–697.
  17. Boneh, D.; Goh, E.J.; Nissim, K. Evaluating 2-DNF formulas on ciphertexts. In Proceedings of the Theory of Cryptography Conference, Cambridge, MA, USA, 10–12 February 2005; pp. 325–341.
  18. Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. (leveled) fully homomorphic encryption without bootstrapping. In Proceedings of the 3rd Innovations in Theoretical Computer Science, Cambridge, MA, USA, 8–10 January 2012; pp. 309–325.
  19. Gentry, C. Fully homomorphic encryption using ideal lattices. In Proceedings of the 41st ACM Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 169–178.
  20. Chillotti, I.; Gama, N.; Georgieva, M.; Izabachène, M. TFHE: Fast fully homomorphic encryption over the torus. J. Cryptol. 2020, 33, 34–91.
  21. Chillotti, I.; Gama, N.; Georgieva, M.; Izabachène, M. TFHE: Fast Fully Homomorphic Encryption Library, August 2016. Available online: https://tfhe.github.io/tfhe (accessed on 28 January 2021).
  22. Chen, H.; Dai, W.; Kim, M.; Song, Y. Efficient Multi-Key Homomorphic Encryption with Packed Ciphertexts with Application to Oblivious Neural Network Inference. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 395–412.
  23. Goldwasser, S.; Micali, S. Probabilistic Encryption. J. Comput. Syst. Sci. 1984, 28, 270–299.
  24. Fujisaki, E.; Okamoto, T.; Pointcheval, D.; Stern, J. RSA-OAEP is secure under the RSA assumption. In Proceedings of the 21st Annual International Cryptology Conference, Santa Barbara, CA, USA, 19–23 August 2001; pp. 260–274.
  25. Bellare, M.; Rogaway, P. Optimal Asymmetric Encryption. In Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques, Perugia, Italy, 9–12 May 1994; pp. 92–111.
  26. Rao, V.; Long, Y.; Eldardiry, H.; Rane, S.; Rossi, R.A.; Torres, F. Secure two-party feature selection. arXiv 2019, arXiv:1901.00832.
  27. Anarakia, J.R.; Samet, S. Privacy-preserving feature selection: A survey and proposing a new set of protocols. arXiv 2020, arXiv:2008.07664.
  28. Banerjee, M.; Chakravarty, S. Privacy preserving feature selection for distributed data using virtual dimension. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, Scotland, UK, 24–28 October 2011; pp. 2281–2284.
  29. Sheikhalishahi, M.; Martinelli, F. Privacy-utility feature selection as a privacy mechanism in collaborative data classification. In Proceedings of the 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises, Poznan, Poland, 21–23 June 2017; pp. 244–249.
  30. Li, X.; Dowsley, R.; Cock, M.D. Privacy-preserving feature selection with secure multiparty computation. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; pp. 6326–6336.
  31. Abspoel, M.; Escudero, D.; Volgushev, N. Secure training of decision trees with continuous attribute. Proc. Priv. Enhancing Technol. 2021, 2021, 167–187.
  32. Dash, M.; Liu, H. Consistency-based search in feature selection. Artif. Intell. 2003, 151, 155–176.
  33. Pawlak, Z. Rough Sets, Theoretical Aspects of Reasoning about Data; Kluwer Academic Publishers: Alphen aan den Rijn, The Netherlands, 1991.
  34. Arauzo-Azofra, A.; Benitez, J.M.; Castro, J.L. Consistency measures for feature selection. J. Intell. Inf. Syst. 2008, 30, 273–292.
  35. Ajtai, M.; Szemerédi, E.; Komlós, J. An O(n log n) sorting network. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, Boston, MA, USA, 25–27 April 1983; pp. 1–9.
  36. Batcher, K.E. Sorting networks and their applications. In Proceedings of the American Federation of Information Processing Societies Spring Joint Computing Conference, Atlantic City, NJ, USA, 30 April–2 May 1968; pp. 307–314.
  37. Hamada, K.; Chida, K.; Ikarashi, D.; Takahashi, K. Oblivious Radix Sort: An Efficient Sorting Algorithm for Practical Secure Multi-party Computation. 2014. Available online: https://eprint.iacr.org/2014/121 (accessed on 26 January 2022).
  38. Goodrich, M. Zig-zag sort: A simple deterministic data-oblivious sorting algorithm running in O(n log n) time. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, New York, NY, USA, 31 May–3 June 2014; pp. 684–693.
  39. Ye, X.; Li, H.; Imakura, A.; Sakurai, T. Distributed collaborative feature selection based on intermediate representation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 4142–4149.
  40. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Online, 30 November 2017; pp. 409–437.
  41. Matsuoka, K.; Hoshizuki, Y.; Sato, T.; Bian, S. Towards Better Standard Cell Library: Optimizing Compound Logic Gates for TFHE. In Proceedings of the 9th on Workshop on Encrypted Computing & Applied Homomorphic Cryptography, New York, NY, USA, 15 November 2021; pp. 63–68.
  42. Bos, J.W.; Lauter, K.; Naehrig, M. Private predictive analysis on encrypted medical data. J. Biomed. Inform. 2014, 50, 234–243.
  43. Lauter, K.; López-Alt, A.; Naehrig, M. Private Computation on Encrypted Genomic Data. In Proceedings of the 3rd International Conference on Cryptology and Information Security in Latin America, Florianópolis, Brazil, 17–19 September 2014; pp. 3–27.
  44. Dowlin, N.; Gilad-Bachrach, R.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. CryptoNets: Applying neural networks to Encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 201–210.
  45. Bourse, F.; Minelli, M.; Minihold, M.; Paillier, P. Fast Homomorphic Evaluation of Deep Discretized Neural Networks. In Proceedings of the 38th Annual International Cryptology Conference, Santa Barbara, CA, USA, 19–23 August 2018; pp. 483–512.
  46. Badawi, A.A.; Jin, C.; Lin, J.; Fook Mun, C.; Jun Jie, S.; Hong Meng Tan, B.; Nan, X.; Mi Mi Aung, K.; Chandrasekhar, V.R. Towards the AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data With GPUs. IEEE Trans. Emerg. Top. Comput. 2021, 9, 1330–1343.
Figure 1. An example run of Algorithm 2. For simplicity, we omit the clock time in each ciphertext. (1) Parties A and B jointly compute $\|B_i\|$ and $B_i$ for each feature $F_i$ (as in the baseline algorithm). (2) A securely sends $B_i$; B learns nothing. (3) A appends the encrypted index $i$ to each $F_i$. (4) A sorts only $(F_i, \|B_i\|)$ by $\|B_i\|$. (5) A sends the sorted indices under a random permutation; B learns nothing. (6) B sends $B_i$; A learns nothing from it. (7) A decrypts the noise and obtains the $B_{i_j}$ in the correct order; A learns nothing. (8) A simulates CWC as in the baseline. (9) Parties A and B share the resulting features.
Table 1. An example dataset shown in [7].
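| $D$ | $F_1$ | $F_2$ | $F_3$ | $F_4$ | $F_5$ | $C$ |
|---|---|---|---|---|---|---|
| $x_1$ | 1 | 0 | 1 | 1 | 1 | 0 |
| $x_2$ | 1 | 1 | 0 | 0 | 0 | 0 |
| $x_3$ | 0 | 0 | 0 | 1 | 1 | 0 |
| $x_4$ | 1 | 0 | 1 | 0 | 0 | 0 |
| $x_5$ | 1 | 1 | 1 | 1 | 0 | 1 |
| $x_6$ | 0 | 1 | 0 | 1 | 0 | 1 |
| $x_7$ | 0 | 1 | 0 | 0 | 1 | 1 |
| $x_8$ | 0 | 0 | 0 | 0 | 1 | 1 |
| $I(F_i; C)$ | 0.189 | 0.189 | 0.049 | 0.000 | 0.000 | |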
Table 2. Time and space complexities of the baseline and improved algorithms for secure CWC, where $k$ is the number of features and $n$ and $m$ are the numbers of positive and negative data, respectively. We assume that each operation in FHE (e.g., encryption, addition, multiplication, or comparison) takes $O(1)$ time.
| Algorithm | Time | Space |
|---|---|---|
| CWC on plaintext [8] | $O(kmn + k \log k)$ | $O(kmn)$ |
| Secure CWC (baseline) | $O(kmn \log k + k \log^2 k)$ | $O(kmn)$ |
| Improved | $O(kmn + k \log^2 k + k \log k \log mn)$ | $O(kmn)$ |
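Because an FHE evaluator cannot branch on encrypted comparison results, sorting under encryption is usually done with a data-oblivious sorting network, whose sequence of compare-exchange operations is fixed by the input size alone. A bitonic network uses $O(k \log^2 k)$ comparators, which fits the $k \log^2 k$ terms in Table 2; the source does not say which network is used, so this choice is an assumption. The plaintext sketch below (the function `bitonic_sort` is hypothetical) sorts (weight, index) pairs in the spirit of step (4) of Figure 1, taking the number of 1s in each $B_i$ of Table 4 as the weight. That key is also an assumption, chosen because it reproduces the order $\pi$ of Table 5.

```python
def bitonic_sort(keys):
    """Sort keys ascending with a bitonic network; return (sorted, comparator count).

    The (i, j) pairs that get compared depend only on len(keys), never on
    the key values, which is what makes the network usable on ciphertexts.
    """
    n = 1
    while n < len(keys):
        n *= 2
    # Pad to a power of two with sentinel tuples that sort last.
    pad = (float("inf"), float("inf"))
    a = list(keys) + [pad] * (n - len(keys))
    comparators = 0
    size = 2
    while size <= n:
        stride = size // 2
        while stride > 0:
            for i in range(n):
                j = i ^ stride
                if j > i:
                    comparators += 1
                    ascending = (i & size) == 0
                    # Compare-exchange: under FHE this would be an encrypted
                    # comparison followed by an oblivious conditional swap.
                    if (a[i] > a[j]) == ascending:
                        a[i], a[j] = a[j], a[i]
            stride //= 2
        size *= 2
    return a[: len(keys)], comparators


# Weights (number of 1s) of B_1..B_4 from Table 4, paired with the index i.
weights = {1: 8, 2: 5, 3: 6, 4: 5}
sorted_pairs, count = bitonic_sort([(w, i) for i, w in weights.items()])
pi = [i for _, i in sorted_pairs]
print(pi, count)  # [2, 4, 3, 1] 6
```

Ties in the weight are broken by the index through tuple comparison, so $B_2$ precedes $B_4$ as in Table 5.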
Table 3. An example dataset $D$ with $F = \{F_1, F_2, F_3, F_4\}$ and $C = \{0, 1\}$. The data consist of two positive instances $\{x_1, x_2\}$ (class 1) and five negative instances $\{y_1, y_2, y_3, y_4, y_5\}$ (class 0).
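| $D$ | $F_1$ | $F_2$ | $F_3$ | $F_4$ | $C$ |
|---|---|---|---|---|---|
| $x_1$ | 0 | 1 | 1 | 0 | 1 |
| $x_2$ | 0 | 0 | 1 | 1 | 1 |
| $y_1$ | 1 | 0 | 1 | 0 | 0 |
| $y_2$ | 1 | 1 | 0 | 0 | 0 |
| $y_3$ | 0 | 1 | 0 | 1 | 0 |
| $y_4$ | 1 | 0 | 1 | 0 | 0 |
| $y_5$ | 1 | 1 | 0 | 0 | 0 |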
Table 4. The bit strings $B_i$ for the example dataset $D$ of Table 3. The entry in column $(x_p, y_q)$ is 0 if $x_p(F_i) = y_q(F_i)$ and 1 otherwise. For example, $B_1 = (1, 1, 0, 1, 1, 1, 1, 0, 1, 1)$ because $x_p(F_1) = y_q(F_1)$ holds only for the two pairs $(x_1, y_3)$ and $(x_2, y_3)$.
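| $B_i$ | $(x_1, y_1)$ | $(x_1, y_2)$ | $(x_1, y_3)$ | $(x_1, y_4)$ | $(x_1, y_5)$ | $(x_2, y_1)$ | $(x_2, y_2)$ | $(x_2, y_3)$ | $(x_2, y_4)$ | $(x_2, y_5)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $B_1$ | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| $B_2$ | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| $B_3$ | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| $B_4$ | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |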
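The bit strings of Table 4 follow mechanically from Table 3. A minimal plaintext sketch (no encryption; the names `POSITIVE`, `NEGATIVE`, `bit_strings`, and `is_consistent` are ours) emits one bit per pair $(x_p, y_q)$ in the column order of Table 4. Since the classes here are binary, a subset $F' \subseteq F$ is consistent exactly when every positive/negative pair differs on some feature in $F'$, that is, when the bitwise OR of its $B_i$ is all 1s; for Table 3 the full set $F$ passes this test.

```python
# Table 3: feature vectors (F1, F2, F3, F4); positives have C = 1, negatives C = 0.
POSITIVE = [(0, 1, 1, 0), (0, 0, 1, 1)]              # x1, x2
NEGATIVE = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 0, 1),
            (1, 0, 1, 0), (1, 1, 0, 0)]              # y1..y5


def bit_strings(pos, neg):
    """B_i has bit 0 for (x_p, y_q) if x_p(F_i) == y_q(F_i), else 1.

    Columns are ordered (x1,y1), ..., (x1,y5), (x2,y1), ..., (x2,y5).
    """
    k = len(pos[0])
    return ["".join("0" if x[i] == y[i] else "1" for x in pos for y in neg)
            for i in range(k)]


def is_consistent(strings):
    """A feature subset is consistent iff the OR of its B_i is all 1s."""
    width = len(strings[0])
    acc = 0
    for s in strings:
        acc |= int(s, 2)
    return acc == (1 << width) - 1


B = bit_strings(POSITIVE, NEGATIVE)
print(B)  # ['1101111011', '1001001101', '0110101101', '0010011011']
print(is_consistent(B), is_consistent([B[0]]))  # True False
```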
Table 5. The bit strings $B_i$ for the example dataset $D$ of Table 3, sorted into the order $B_{\pi(1)}, \dots, B_{\pi(4)}$, and the corresponding $Z_i$.
| $i$ | $\pi(i)$ | $B_{\pi(i)}$ | $Z_i$ |
|---|---|---|---|
| 1 | 2 | $B_2$ = 1001001101 | $Z_1$ = 1111111111 |
| 2 | 4 | $B_4$ = 0010011011 | $Z_2$ = 1111111111 |
| 3 | 3 | $B_3$ = 0110101101 | $Z_3$ = 1101111011 |
| 4 | 1 | $B_1$ = 1101111011 | $Z_4$ = 0000000000 |
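The $Z_i$ column of Table 5 coincides with the bitwise OR of the strings that come after position $i$ in the sorted order, $Z_i = B_{\pi(i+1)} \lor \dots \lor B_{\pi(4)}$ with $Z_4 = 0$. The plaintext sketch below assumes this reading and pairs it with a CWC-style elimination: scanning the sorted features, $F_{\pi(i)}$ is dropped when the features kept so far together with $Z_i$ still separate every pair, and kept otherwise. This is a reconstruction for illustration under our assumptions, not the encrypted procedure of the paper; the ordering key (number of 1s, ties by index) and the function name `cwc_on_bit_strings` are likewise ours. On the running example it selects $\{F_1, F_3\}$, which is consistent ($B_1 \lor B_3$ is all 1s) and minimal.

```python
# Bit strings B_1..B_4 from Table 4 (keys are feature indices).
B = {1: "1101111011", 2: "1001001101", 3: "0110101101", 4: "0010011011"}


def cwc_on_bit_strings(bits):
    """Plaintext sketch: returns (pi, Z as strings, selected feature indices)."""
    width = len(next(iter(bits.values())))
    full = (1 << width) - 1
    # Assumed ordering: ascending number of 1s, ties broken by index.
    pi = sorted(bits, key=lambda i: (bits[i].count("1"), i))
    masks = [int(bits[i], 2) for i in pi]
    k = len(pi)
    # z[t] = OR of masks[t+1:], i.e. of the features not yet examined.
    z = [0] * k
    for t in range(k - 2, -1, -1):
        z[t] = z[t + 1] | masks[t + 1]
    kept, selected = 0, []
    for t in range(k):
        # Keep F_pi(t) only if dropping it would leave some pair unseparated.
        if (kept | z[t]) != full:
            kept |= masks[t]
            selected.append(pi[t])
    z_strings = [format(v, f"0{width}b") for v in z]
    return pi, z_strings, selected


pi, Z, selected = cwc_on_bit_strings(B)
print(pi)        # [2, 4, 3, 1]
print(Z)         # ['1111111111', '1111111111', '1101111011', '0000000000']
print(selected)  # [3, 1], i.e. {F1, F3}
```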
Table 6. Running time (s) of the baseline algorithm (naive secure CWC). Task 1: computing the $B_i$; Task 2: sorting the $B_i$; Task 3: feature selection.
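| $k$ | $mn$ | Task 1 | Task 2 | Task 3 |
|---|---|---|---|---|
| 10 | 100 | 60.3 | 835.8 | 111.9 |
| 10 | 500 | 300.5 | 4252.4 | 558.1 |
| 10 | 1000 | 601.4 | 8867.0 | 1114.2 |
| 50 | 100 | 301.8 | 6292.6 | 589.3 |
| 50 | 500 | 1502.9 | 30,364.6 | 2941.0 |
| 50 | 1000 | 3007.0 | 62,124.6 | 5919.7 |
| 100 | 100 | 603.7 | 16,148.5 | 1179.0 |
| 100 | 500 | 3005.9 | 76,315.2 | 5952.5 |
| 100 | 1000 | 6014.1 | 154,143.5 | 11,867.0 |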
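A quick check on the numbers in Table 6 shows why sorting is singled out as the bottleneck: Task 2 accounts for roughly 83% to 90% of the total baseline time in every configuration. The values below are copied from the table; the dictionary layout is ours.

```python
# Table 6: (k, mn) -> (Task 1, Task 2, Task 3) running times in seconds.
TABLE6 = {
    (10, 100): (60.3, 835.8, 111.9),
    (10, 500): (300.5, 4252.4, 558.1),
    (10, 1000): (601.4, 8867.0, 1114.2),
    (50, 100): (301.8, 6292.6, 589.3),
    (50, 500): (1502.9, 30364.6, 2941.0),
    (50, 1000): (3007.0, 62124.6, 5919.7),
    (100, 100): (603.7, 16148.5, 1179.0),
    (100, 500): (3005.9, 76315.2, 5952.5),
    (100, 1000): (6014.1, 154143.5, 11867.0),
}

# Fraction of the total baseline time spent sorting (Task 2).
sort_share = {cfg: t2 / (t1 + t2 + t3) for cfg, (t1, t2, t3) in TABLE6.items()}
for (k, mn), share in sort_share.items():
    print(f"k={k:>3} mn={mn:>4}  sorting share = {share:.1%}")
```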
Table 7. Running time (s) of the baseline and improved algorithms. 'Baseline' is Task 2 of Table 6 (i.e., the bottleneck); 'Improved' is the running time of the corresponding task in the improved algorithm, broken down into 'Sorting' and 'Other tasks'.
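| $k$ | $mn$ | Baseline | Improved | Sorting | Other tasks |
|---|---|---|---|---|---|
| 10 | 100 | 835.8 | 203.7 | 69.4 | 134.2 |
| 10 | 500 | 4252.4 | 286.2 | 89.5 | 196.6 |
| 10 | 1000 | 8867.0 | 302.5 | 98.9 | 203.6 |
| 100 | 100 | 16,148.5 | 3311.2 | 1865.5 | 1445.7 |
| 100 | 500 | 76,315.2 | 4601.9 | 2647.7 | 1954.1 |
| 100 | 1000 | 154,143.5 | 4671.4 | 2660.8 | 2010.5 |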
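The speedup of the improved algorithm over the baseline can be read off Table 7 directly. This short computation (data copied from the table) also checks that 'Sorting' and 'Other tasks' add up to the improved total up to rounding. For fixed $k$ the speedup grows with $mn$, from about 4.1× to 29.3× for $k = 10$ and from about 4.9× to 33.0× for $k = 100$, which is consistent with the $kmn \log k$ baseline term of Table 2 being replaced by $kmn$ in the improved algorithm.

```python
# Table 7: (k, mn) -> (baseline, improved total, sorting, other tasks), seconds.
TABLE7 = {
    (10, 100): (835.8, 203.7, 69.4, 134.2),
    (10, 500): (4252.4, 286.2, 89.5, 196.6),
    (10, 1000): (8867.0, 302.5, 98.9, 203.6),
    (100, 100): (16148.5, 3311.2, 1865.5, 1445.7),
    (100, 500): (76315.2, 4601.9, 2647.7, 1954.1),
    (100, 1000): (154143.5, 4671.4, 2660.8, 2010.5),
}

speedup = {}
for cfg, (base, improved, sort_t, other_t) in TABLE7.items():
    # The two parts should sum to the total up to one-decimal rounding.
    assert abs(sort_t + other_t - improved) <= 0.15, cfg
    speedup[cfg] = base / improved

for (k, mn), s in speedup.items():
    print(f"k={k:>3} mn={mn:>4}  speedup = {s:.1f}x")
```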
Table 8. Running time (s) of the improved algorithm on real datasets from the UCI Machine Learning Repository.
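| Dataset | $k$ | $mn$ | Time | Sorting | Other tasks |
|---|---|---|---|---|---|
| Letter | 16 | 196 | 252.2 | 80.6 | 171.4 |
| Breast Cancer | 10 | 2464 | 312.6 | 103.1 | 209.5 |
| Covertype | 54 | 979 | 1653.5 | 836.9 | 816.6 |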
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ono, S.; Takata, J.; Kataoka, M.; I, T.; Shin, K.; Sakamoto, H. Privacy-Preserving Feature Selection with Fully Homomorphic Encryption. Algorithms 2022, 15, 229. https://doi.org/10.3390/a15070229

Chicago/Turabian Style

Ono, Shinji, Jun Takata, Masaharu Kataoka, Tomohiro I, Kilho Shin, and Hiroshi Sakamoto. 2022. "Privacy-Preserving Feature Selection with Fully Homomorphic Encryption" Algorithms 15, no. 7: 229. https://doi.org/10.3390/a15070229

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
