Towards an Efficient Privacy-Preserving Decision Tree Evaluation Service in the Internet of Things

With the fast development of Internet of Things (IoT) technology, ordinary people and organizations can produce massive amounts of data every day. Due to a lack of data mining expertise and computation resources, most of them choose to use data mining services. Unfortunately, directly sending query data to the cloud may violate their privacy. In this work, we mainly consider designing a scheme that enables the cloud to provide an efficient privacy-preserving decision tree evaluation service for resource-constrained clients in the IoT. To design such a scheme, a new secure comparison protocol based on additive secret sharing is proposed in a two-cloud model. Then we introduce our privacy-preserving decision tree evaluation scheme, which is built on secret sharing and an additively homomorphic cryptosystem. In this scheme, the cloud learns nothing about the query data or the classification results, and the client learns nothing about the tree. Moreover, this scheme also supports offline users. Theoretical analyses and experimental results show that our scheme is very efficient. Compared with the state-of-the-art work, both the communication and computational overheads of the newly designed scheme are smaller when dealing with deep but sparse trees.


Introduction
Recently, ubiquitous mobile devices equipped with various powerful embedded sensors (e.g., GPS, camera, digital compass, and gyroscope) have become an important part of our daily life. Moreover, increasingly powerful wireless network technology has made communication between different mobile devices easier than before. The progress of these technologies gives rise to the concept of the Internet of Things (IoT). It is forecasted that there will be around 50 billion devices connected to the Internet by 2020 [1]. These devices in the IoT can generate large volumes of data. The generated data can be used to find potential trends or to make decisions on current events. Building data mining models from the data generated by the IoT has revolutionized our society in many areas, such as healthcare, social networks, and consumer electronics. For instance, the data generated by wearable devices, such as heart rate, temperature, and oxygen saturation, along with the data contributed by hospitals, can be collected to build a data mining model for providing online diagnosis.
Nowadays, several Internet giants, such as Amazon [2], Google [3], and Microsoft [4], offer this kind of data mining service to users. These companies can collect the data from IoT and then use

Motivation
As stated before, most existing privacy-preserving decision tree evaluation schemes either incur large computation or communication costs for users or cannot provide efficient service for the client, especially when dealing with large but sparse trees. Therefore, we propose a decision tree evaluation scheme that enables the cloud to provide an efficient and highly secure classification service for resource-constrained clients in the IoT. In such a privacy-preserving decision tree evaluation scenario, we mainly consider two kinds of privacy and security issues. First, the query data and its prediction label must be protected from the cloud server as well as from unauthorized outside entities. Second, clients using the prediction service must not be able to infer anything about the decision tree during the evaluation. Besides privacy and security, efficiency must also be considered. IoT devices usually have limited computation and communication ability, so the less computation the client needs to perform, the better. In addition, for the sake of scalability, our scheme also needs to support offline clients.

Our Contributions
To address the mentioned issues, we design a decision tree evaluation scheme that protects both the model and the data. The main contributions are as follows:
• We design a new secure comparison protocol that returns additive shares of the comparison result on additively secret-shared inputs. Compared with Huang et al.'s work [21] and Zheng et al.'s work [20], the number of multiplications on additive shares is reduced from 2l and 3l, respectively, to roughly l, where l is the bit length of an element of the feature vector. Compared with Liu et al.'s work [22], which is based on additive secret sharing and an additively homomorphic cryptosystem, the proposed protocol is more secure and more efficient.
• With additive secret sharing and an asymmetric, additively homomorphic cryptosystem, i.e., the Paillier cryptosystem [23], we propose a privacy-preserving decision tree scheme based on the two-cloud model. The scheme is tested on several widely used real-world datasets. The experimental results show that, compared with the most recent work, i.e., Zheng et al.'s work [20], our scheme is more efficient when dealing with deeper trees. In particular, the communication cost of our scheme is just 1/709 of that of Zheng et al.'s work [20].
• We show that our scheme fully protects the privacy of the client. At the same time, the client learns nothing about the decision tree during the evaluation process. Additionally, since two clouds are involved, we also prove that the model is not leaked to the other cloud, apart from the number of decision nodes.

Organization
The organization of this work is presented as follows. The preliminary background on the decision tree, additive secret sharing technology, and Paillier encryption system are given in Section 2. In Section 3, we give a system model and design goals of our scheme. Our privacy-preserving method for decision tree evaluation service is presented in Section 4. Next, we analyze the security and performance of our scheme in Sections 5 and 6 respectively. Then, the related works are introduced in Section 7. Finally, we conclude our work and present our future works in Section 8.

Preliminaries
Some basic concepts of the decision tree and cryptographic knowledge are introduced in this section. The key notations used are presented in Table 1.

Table 1. Key notations.
Notation | Definition
[x] | The ciphertext of x encrypted by Paillier
Add(·) | Addition on additive shares
Mul(·) | Multiplication on additive shares
Rec(·, ·) | Reconstruction of the value of x
SC | Secure Comparison
SDTE | Secure Decision Tree Evaluation

Decision Tree
The decision tree is a well-known data mining algorithm with many applications. An example of a decision tree is presented in Figure 1. In such a tree $T$, we assume that the number of internal nodes is $m$. These internal nodes are also called decision nodes, and the nodes with classification labels are called leaf nodes. The length of the longest path of $T$ is called the depth of $T$. Note that a decision tree is not necessarily binary; however, a tree can easily be transformed into a binary one [16]. Each internal node of $T$ has a threshold $y_i$, where $i \in [1, m]$, and the feature vector is represented as $X = \{x_1, x_2, \cdots, x_n\}$. A boolean function $f(x, y) = (x \le y)$ is associated with each decision node, and its value decides the classification path through the tree. For example, we can let "1" denote the left child of the node and "0" denote the right child [15]. When a leaf node is reached, its label is the prediction result for $X$.

Additive Secret Sharing
Shamir proposed the additive secret sharing technology [24], which is a typical secure multi-party computation scheme [25]. In this scheme, we split an integer $x \in \mathbb{Z}_P$ into two additive shares, which are represented as $\langle x \rangle$. We use $\langle x \rangle_A$ and $\langle x \rangle_B$ to represent party A's and party B's shares of $x$. Rec denotes the reconstruction function, i.e., $x \leftarrow \mathrm{Rec}(\langle x \rangle_A, \langle x \rangle_B) = \langle x \rangle_A + \langle x \rangle_B \pmod{P}$. Since all calculations are made in the ring $\mathbb{Z}_P$, "mod P" is omitted for simplicity.

Addition of Additive Shares
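To add two additively shared values $\langle u \rangle$ and $\langle v \rangle$, parties A and B only need to perform local computations, i.e., $\langle u + v \rangle_A = \langle u \rangle_A + \langle v \rangle_A$ and $\langle u + v \rangle_B = \langle u \rangle_B + \langle v \rangle_B$. Note that no interaction is required between the two parties.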
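The sharing, reconstruction, and local addition described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: the modulus `P` and the function names `share`, `rec`, and `add` are assumptions chosen for the example.

```python
import secrets

# Illustrative modulus (an assumption); the scheme only requires a ring Z_P.
P = 2**61 - 1

def share(x, modulus=P):
    """Split x into two additive shares with x_a + x_b = x (mod modulus)."""
    x_a = secrets.randbelow(modulus)
    x_b = (x - x_a) % modulus
    return x_a, x_b

def rec(x_a, x_b, modulus=P):
    """Reconstruct the secret from its two shares."""
    return (x_a + x_b) % modulus

def add(u_share, v_share, modulus=P):
    """Local addition: each party calls this on the shares it holds."""
    return (u_share + v_share) % modulus

u_a, u_b = share(20)
v_a, v_b = share(22)
w_a = add(u_a, v_a)  # computed by party A alone
w_b = add(u_b, v_b)  # computed by party B alone
assert rec(w_a, w_b) == 42
```

Each share on its own is a uniformly random element of the ring, which is why neither party learns anything from the share it holds.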

Multiplication of Additive Shares
To multiply two additively shared values, Beaver triples [26] are needed. First, the two parties share pre-computed triples $\langle a \rangle$, $\langle b \rangle$, $\langle c \rangle$ with $c = a \cdot b$. The Beaver triples can be generated by the two parties through Oblivious Transfer [27] or distributed by a trusted third party [21]. Suppose that the integers to be multiplied are $x$ and $y$. After running the secure multiplication protocol, the two parties obtain $\langle x \cdot y \rangle$. More details about the multiplication of additive shares can be found in references [21,22].
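For readers unfamiliar with Beaver triples, the following Python sketch shows the textbook form of the multiplication. It is an illustration under stated assumptions: the triple is produced by a stand-in dealer function (`beaver_triple`, hypothetical), both parties are simulated in one process, and the modulus `P` is arbitrary. The exact message flow in [21,22] may differ.

```python
import secrets

P = 2**61 - 1  # illustrative modulus (an assumption)

def share(x):
    """Split x into two additive shares modulo P."""
    x_a = secrets.randbelow(P)
    return x_a, (x - x_a) % P

def rec(x_a, x_b):
    """Reconstruct a value from its two shares."""
    return (x_a + x_b) % P

def beaver_triple():
    """Stand-in for a trusted dealer: shares of a, b, and c = a * b."""
    a = secrets.randbelow(P)
    b = secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def mul(x_sh, y_sh):
    """Return shares of x * y, given shares of x and y."""
    (a_A, a_B), (b_A, b_B), (c_A, c_B) = beaver_triple()
    x_A, x_B = x_sh
    y_A, y_B = y_sh
    # Each party masks its shares locally; the masked values e = x - a and
    # f = y - b are then exchanged and opened. They reveal nothing about
    # x or y because a and b are uniformly random.
    e = rec((x_A - a_A) % P, (x_B - a_B) % P)
    f = rec((y_A - b_A) % P, (y_B - b_B) % P)
    # x*y = c + e*b + f*a + e*f; only one party adds the public term e*f.
    z_A = (f * a_A + e * b_A + c_A + e * f) % P
    z_B = (f * a_B + e * b_B + c_B) % P
    return z_A, z_B

assert rec(*mul(share(6), share(7))) == 42
```

One multiplication costs one round of communication (opening `e` and `f`), which is why the number of Mul invocations dominates the cost of protocols built on additive shares.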

Paillier Cryptosystem
The Paillier cryptosystem is an asymmetric cryptosystem that is also additively homomorphic [23]. The plaintext space of the Paillier cryptosystem is $\mathbb{Z}_N$, while the ciphertext space is $\mathbb{Z}_{N^2}$. In this work, $[x]$ is used to represent the ciphertext of $x$ encrypted by the Paillier cryptosystem. As stated before, the Paillier cryptosystem is additively homomorphic, with the following properties: $[x] \cdot [y] = [x + y]$ and $[x]^k = [k \cdot x]$. More details of the security proof of the Paillier cryptosystem are given in reference [23].
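The two homomorphic properties can be checked with a toy Paillier implementation. The sketch below is for illustration only: the primes are far too small to be secure, the generator is fixed to g = N + 1 (a common simplification, assumed here), and the function names are hypothetical. It needs Python 3.8 or later for the modular inverse `pow(x, -1, n)`.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt m in Z_n as (n+1)^m * r^n mod n^2 for a random unit r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    """Decrypt with L(u) = (u - 1) / n applied to c^lam mod n^2."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

n, sk = keygen()
n2 = n * n
# Additive property: [x] * [y] decrypts to x + y.
assert decrypt(n, sk, encrypt(n, 5) * encrypt(n, 7) % n2) == 12
# Scalar property: [x]^k decrypts to k * x.
assert decrypt(n, sk, pow(encrypt(n, 5), 3, n2)) == 15
```

Because encryption is randomized, two encryptions of the same plaintext differ, yet both decrypt to the same value.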

System Model and Design Goals
The system model and design goals are introduced as follows.

System Model
In this work, our scheme is designed on a widely-used two-cloud model, including a Cloud Service Provider (CSP) and an Evaluation Service Provider (ESP). Details of the system model are given in Figure 2 [28].
• Service Users: The Service User (SU) in our system wants to use a decision tree evaluation service in a privacy-preserving way. The SU splits the query vector into two additive shares before sending them to the two clouds respectively.
• Cloud Service Provider: Assume that a trained decision tree model belongs to the Cloud Service Provider. The CSP provides a decision tree classification service to the SU. Since only one of the shares is sent to it, the CSP needs to cooperate with the Evaluation Service Provider to complete the evaluation.
• Evaluation Service Provider: In our system, the ESP's mission is to cooperate with the CSP to give the SU the evaluation result of the decision tree model in a privacy-preserving way. Besides, the ESP generates the public/private key pair of the Paillier cryptosystem and reveals the public key to the CSP.
In this system, the decision tree model belongs only to the CSP. This is a practical assumption. For instance, the CSP could be a well-known cloud service provider, such as Google, which is able to collect data from a large number of data owners and then train a decision tree model for remote diagnosis. To provide an efficient and highly secure decision tree prediction service, an ESP is also essential in this scheme, for two main reasons. First, since the CSP owns only part of the query data, it cannot perform the decision tree evaluation by itself. In addition, in the two-cloud model, the SUs can usually stay offline during the evaluation process, while in the single-server model they usually cannot [5,6,28-31].

Threat Model
Here, we assume all parties in our scheme are semi-honest (also known as honest-but-curious): they strictly follow the designed protocols, but may attempt to learn additional information. We also stress that the two clouds do not collude with each other, which is a basic assumption in secure multiparty computation. In addition, there is an active adversary A in the scheme. The adversary A attempts to infer the query data of the SU, details of the decision tree model owned by the CSP, and the classification results. We assume A has the following capabilities:
• A may eavesdrop on the communication channel between CSP and ESP.
The adversary A is restricted from compromising (1) CSP and ESP simultaneously, and (2) the communication channels between the SU and the clouds.

Design Goals
Our scheme's design goals are listed as follows:
• Data Protection. For this decision tree evaluation scheme, data security and privacy are the most important issues to be solved. The outsourced data and the calculated classification result contain sensitive information that should be kept secret from the clouds, including CSP and ESP. Besides, the decision tree model is an asset of the CSP, which should not be leaked to the ESP or the SUs. Moreover, all such information should remain confidential to the active adversary A.
• Classification Result's Accuracy. The classification result should be the same as that of the non-privacy-preserving evaluation.
• Efficiency. The two clouds should finish the evaluation process as fast as they can and return the classification labels to the SUs quickly. Thus, the computation and communication costs of the clouds should be small.
• Offline SUs. SUs in the IoT usually do not have strong computation power or large storage space. Therefore, we should minimize the computation and communication burdens on the SUs. Once the query has been sent to CSP and ESP, SUs should be able to stay offline until they obtain the results. We also note that many clients use this decision tree evaluation service. Thus, for the scalability of the system, the scheme is supposed to support offline SUs.

Privacy-Preserving Decision Tree Evaluation
Before introducing this privacy-preserving decision tree evaluation scheme, we first present our secure comparison, which serves as the basis of this scheme. In this work, the party A is CSP while the party B is ESP.

Secure Comparison Algorithm
As stated above, comparison is one of the most important parts of our decision tree evaluation scheme. Assume that CSP has a private input $x$, and that CSP and ESP additively share a set of bits $y_1, y_2, \cdots, y_l$, where $y_i = \langle y_i \rangle_A + \langle y_i \rangle_B$ and the $y_i$ form the bit-decomposition of $y$. Assume that the bit lengths of $x$ and $y$ are both $l$. After running the Secure Comparison (SC) algorithm, CSP and ESP obtain the additively shared comparison result $\langle r \rangle$. First, CSP computes the bit-decomposition of $x$ as $x_1, x_2, \cdots, x_l$. Then, CSP and ESP locally calculate shares of $d_i$, which takes the place of $x_i \oplus y_i$. After that, CSP randomly picks an $\alpha$ from $\{0, 1\}$ and sets $\gamma = 1 - 2\alpha$, and CSP and ESP compute shares of $h_i$. CSP randomly chooses an $r_i$ from $\mathbb{Z}_N$ and runs a Mul with ESP on $r_i$ and $h_i$ to get $\langle \delta_i \rangle$. ESP randomly picks an $\omega$ from $\mathbb{Z}_N$, blinds its shares $\langle \delta_i \rangle_B$ with it, and sends them to CSP. CSP runs Rec to reconstruct the blinded $\delta_i$, applies a permutation to all of them, and sends them to ESP. On receiving them, ESP removes the blinding $\omega$ from each $\delta_i$. If at least one $\delta_i$ is 0, ESP sets $\beta = 1$; otherwise, it sets $\beta = 0$. Note that the comparison result $r$ is equal to $\alpha \oplus \beta = (\alpha - \beta)^2$.
Therefore, CSP and ESP run Mul on $\alpha - \beta$ and $\alpha - \beta$ to obtain $\langle r \rangle$. The specific details of our SC are shown in Algorithm 1.

Algorithm 1 Secure Comparison (SC)
Input: A private integer $x$ that belongs to CSP; $y_1, y_2, \cdots, y_l$, which are shared by CSP and ESP. $x_i$ and $y_i$ represent the $i$-th bit of $x$ and $y$ respectively, where $i \in [1, l]$.
Output: CSP and ESP output $\langle r \rangle$, where $r = (x \le y)$.
1: CSP: Calculate the bit-decomposition of $x$ as $x_1, x_2, \cdots, x_l$.
...
5: ESP: ... Then, send all the $\langle \delta_i \rangle_B$ to CSP.
6: CSP: Run Rec on all the $\delta_i$. Then, apply a permutation function $\pi$ to these $\delta_i$ and send the permuted values to ESP.
7: ESP: Remove $\omega$ from each $\delta_i$. If there is at least one 0, set $\beta = 1$. Otherwise, set $\beta = 0$.
8: CSP & ESP: Run Mul on $\alpha - \beta$ and $\alpha - \beta$ to obtain $\langle r \rangle$.

CORRECTNESS. We stress that our SC is a variant of the DGK comparison [32], but it is not a naive secret-shared version of it. In DGK, the sender calculates a ciphertext for each bit position and then blinds it with a random integer $r_i$. The receiver decrypts the received data; if one of the decrypted values is 0, it sets $\beta = 1$. In our algorithm, no encryption is needed. More importantly, in this SC, we use $d_i$ to simulate $x_i \oplus y_i$. The naive way to obtain $x_i \oplus y_i$ is to calculate $x_i + y_i - 2 \cdot x_i \cdot y_i$, which requires the two servers to run a multiplication protocol for each $x_i \oplus y_i$. With such a naive method, the two clouds need to run Mul $l$ additional times, which incurs considerable computation and communication costs. To avoid these costs, in this SC, we let the two clouds calculate $d_i$ instead. We should also note that the value of $\beta$ is affected only when some $h_i = 0$. Although $\alpha \oplus \beta = (\alpha - \beta)^2$ does not hold in general, it does hold when $\alpha, \beta \in \{0, 1\}$, so CSP and ESP cooperate to run Mul to obtain shares of the result. The remaining correctness argument for our SC can be easily verified from the DGK comparison [32].
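The two arithmetic encodings of XOR used in the correctness argument can be verified exhaustively over bits. The short Python check below is only a worked illustration (the function names are chosen for the example); it also shows that the squared-difference form stops matching XOR outside {0, 1}.

```python
from itertools import product

def xor_arith(a, b):
    """XOR of two bits written with one multiplication: a + b - 2ab."""
    return a + b - 2 * a * b

def xor_square(a, b):
    """XOR of two bits written as a squared difference: (a - b)^2."""
    return (a - b) ** 2

# Both forms agree with bitwise XOR on every pair of bits.
for a, b in product((0, 1), repeat=2):
    assert xor_arith(a, b) == (a ^ b)
    assert xor_square(a, b) == (a ^ b)

# Outside {0, 1} the identity no longer holds, e.g. for a = 2, b = 1.
assert xor_square(2, 1) != (2 ^ 1)
```

Since $(\alpha - \beta)^2$ is a single product of two values that the clouds can share locally (one holds $\alpha$, the other $\beta$), the final result costs one Mul.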
Discussion. Several protocols for privacy-preserving comparison of two additively shared integers already exist [20-22,32]. First, we should note that the setting of these comparisons is different from ours. In their setting, both integers $x$ and $y$ are additively shared by the two parties. In ours, one of the integers belongs to one party, and the other is shared by the two parties. Moreover, Huang et al.'s work [21] needs $2l$ Mul, while ours needs just $l + 1$ Mul. Zheng et al.'s secure decision node evaluation [20] is designed to obtain the additively shared comparison result. Its basic idea is similar to Huang et al.'s SC [21], but it is less efficient, needing almost $3l$ Mul. Note that Mul is the most time-consuming part of both our SC and Huang et al.'s [21], meaning that our SC is more efficient. Liu et al.'s work [22] is based on a variant of the Paillier cryptosystem [33] and additive secret sharing. Since encryption and decryption are much more time-consuming than multiplication and addition of fixed-length numbers, our SC is more efficient than [22] for integers with short bit lengths. In addition, in reference [22], the value $r_1(x - y) + r_2$ is sent to ESP. Since $r_1$ and $r_2$ are not randomly chosen from $\mathbb{Z}_N$, they cannot perfectly disguise $x - y$. Thus, the secure comparison algorithm in [22] is not perfectly secure and achieves only statistical security [34].

Privacy-Preserving Decision Tree Evaluation
There are two stages in our privacy-preserving decision tree evaluation scheme, i.e., query vector issuing and secure decision tree evaluation. During the execution, the SU can be offline and cannot infer anything about the decision tree. Simultaneously, the clouds cannot get anything about the query vector and the final classification results. We also note that the decision tree just belongs to CSP. In this section, we propose two privacy-preserving decision tree evaluation schemes. One of them leaks the number of the nodes of the tree to the ESP, which achieves the same security level as previous works [15][16][17]. The other one can provide a higher security level, which can protect the number of tree nodes with statistical security.

Query Request Issuing
One of the clouds sends the modulus $P$, which is used to split and reconstruct integers, to the SU. Note that $P < N$. After receiving $P$ from the CSP, SUs split their query vectors into additive shares. For a query vector $S = \{s_1, s_2, \cdots, s_n\}$, the SU first computes the bit-decomposition of every $s_i$, and then splits every bit $s_{i,j}$ into two additive shares. Specifically, the SU picks random integers $r_{i,j}$ from $\mathbb{Z}_P$, where $i \in \{1, 2, \cdots, n\}$ and $j \in \{1, 2, \cdots, l\}$. After that, the SU sets $\langle s_{i,j} \rangle_A = r_{i,j}$ and $\langle s_{i,j} \rangle_B = s_{i,j} - r_{i,j}$. Then, the SU sends $\langle S \rangle_A$ to the CSP and $\langle S \rangle_B$ to the ESP. In this scheme, the adversary A cannot eavesdrop on the communication channels between the clouds and the SUs.
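The query-issuing step can be sketched as follows. This is a hedged illustration, not the authors' code: the bit order (most significant bit first), the small modulus in the usage lines, and the helper names `bit_decompose`, `split_query`, and `reconstruct_bits` are all assumptions made for the example.

```python
import secrets

def bit_decompose(value, l):
    """Return the l bits of value, most significant bit first (assumed order)."""
    return [(value >> (l - 1 - j)) & 1 for j in range(l)]

def split_query(query, l, P):
    """Share every bit s_{i,j} of the query as (r_{i,j}, s_{i,j} - r_{i,j}) mod P."""
    s_A, s_B = [], []
    for s_i in query:
        row_A, row_B = [], []
        for bit in bit_decompose(s_i, l):
            r = secrets.randbelow(P)      # share sent to the CSP
            row_A.append(r)
            row_B.append((bit - r) % P)   # share sent to the ESP
        s_A.append(row_A)
        s_B.append(row_B)
    return s_A, s_B

def reconstruct_bits(s_A, s_B, P):
    """Recombine the two share matrices (used here only to check the sketch)."""
    return [[(a + b) % P for a, b in zip(row_A, row_B)]
            for row_A, row_B in zip(s_A, s_B)]

s_A, s_B = split_query([5, 12], l=4, P=97)
assert reconstruct_bits(s_A, s_B, 97) == [[0, 1, 0, 1], [1, 1, 0, 0]]
```

The SU's work is limited to drawing n·l random numbers and performing n·l subtractions, after which it can go offline.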

Secure Decision Tree Evaluation
Once receiving the query from the SU, the clouds run a Secure Decision Tree Evaluation (SDTE) protocol and return shares of the classification result to the SU. Our SDTE follows the idea of Tai et al., who propose to express a decision tree as a set of linear equations of path costs [17]. For every leaf node $c_i$ in the tree, we calculate the sum of the boolean results $b_i$ along the path from the root to the leaf $c_i$ as the path cost $p_{c_i}$. An example of the path costs of a tree is presented in Figure 3a. Note that $b_i$ is the comparison result between a decision node and the corresponding attribute in the query, which is 0 or 1. Therefore, for one specific query, exactly one $p_{c_i}$ is 0.
SDTEI. First, we introduce the SDTEI algorithm, which protects the query data and the classification result well but leaks the number of nodes to ESP. Before running SDTEI, ESP generates the public/private key pair of Paillier and reveals the public key to CSP. In the following, $n$ denotes the length of the query vector, $m$ denotes the number of decision nodes in the tree, and $m'$ denotes the number of leaf nodes in the tree. For each decision node, CSP and ESP compare its threshold with the corresponding query vector element. Specifically, they run the SC to get additive shares of the comparison result. Then, ESP encrypts the shares it holds and sends the encrypted shares to CSP. On receiving the encrypted shares, CSP calculates every path cost and blinds it with a random number. Note that exactly one path cost is zero. Then, CSP applies a permutation function $\pi$ to the blinded path costs and sends them to ESP. ESP decrypts the received data and generates a vector $\alpha$: if a decrypted result is 0, ESP sets the corresponding $\alpha_i = 1$; otherwise, it sets $\alpha_i = 0$. Then, ESP encrypts $\alpha$ and sends it to CSP. CSP applies the inverse permutation $\pi^{-1}$ to these $[\alpha_i]$. Next, CSP calculates $[c] \leftarrow \prod_{i=1}^{m'} [\alpha_i]^{c_i}$.
After that, CSP sends $c'$ to the SU and $[c'']$ to ESP. ESP decrypts $[c'']$ as $c''$ and then sends it to the SU. On receiving $c'$ and $c''$, the SU can run Rec to get the classification result.
Step 1 (CSP & ESP): For each node $t_i$ of tree $T$, CSP and ESP run an SC on its threshold and the corresponding feature $s_i$ in the query vector to get the comparison result $\langle b_i \rangle$. As can be seen from the SC, the comparison results are kept confidential from both CSP and ESP.
Step 6 (ESP): ESP decrypts $[c'']$ as $c''$ and then sends it to the SU.
Step 7 (SU): Receiving both $c'$ and $c''$, the SU calculates $c \leftarrow c' + c'' \bmod P$ as the classification result.
REMARK 1. For a full binary tree, exactly one path cost $p_{c_i}$ is 0 [17]. Therefore, in Step 4, once one of the decrypted results is 0, ESP can stop decrypting and set the remaining $\alpha_i = 0$. In Step 5, since exactly one $\alpha_i$ is 1 and the others are 0, $c$ is the corresponding classification label.
REMARK 2. For a full binary tree, the number of leaf nodes $m'$ and the number of decision nodes $m$ satisfy $m' = m + 1$. Since ESP already infers $m$ in Step 1 by counting the number of SC runs, the number of $[p_{c_i}]$ does not leak more information in Step 4.
REMARK 3. In Zheng et al.'s work [20], the two clouds blind the path costs together with the classification labels and send them to the SU. Therefore, in their scheme, the SU needs to process more blinded path costs to recover the classification result. Moreover, the depth of the tree can be inferred by the SU by counting the number of leaf nodes (note that, in their scheme, the tree is transformed into a complete binary tree, where $m' = 2^{d-1}$). In our scheme, however, the SU runs Rec just once and learns nothing about the tree model.
SDTEII. We note that in our SDTEI algorithm, ESP can infer the number of nodes in the tree, i.e., $m$, by counting the number of SC runs. To avoid this information leakage, CSP can add a proportional number of dummy nodes to the tree to disguise $m$. After receiving all the $[\langle b_i \rangle_B]$ from ESP, CSP can easily delete the comparison results of the dummy nodes. Moreover, in Step 3, CSP also needs to add dummy path costs to disguise the number of leaf nodes, and in Step 5, CSP deletes the dummy path costs. Since the number of dummy nodes is hard to determine for each evaluation, our SDTEII achieves statistical security for the protection of $m$.
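To make the path-cost idea concrete, the following plaintext Python sketch evaluates a small tree without any cryptography. It is an illustration under assumptions, not the protocol itself: the tree encoding (nested tuples), the example tree, and the edge-cost convention (a left edge costs 1 − b and a right edge costs b, so that only the path actually taken sums to 0) are chosen for the example. In the protocol, the final weighted sum corresponds to the homomorphic product over the encrypted indicator vector.

```python
def path_costs(node, x, cost=0):
    """Return a list of (path_cost, label) pairs, one per leaf.

    A decision node is a tuple (feature_index, threshold, left, right);
    anything else is treated as a leaf label.
    """
    if not isinstance(node, tuple):
        return [(cost, node)]
    feature, threshold, left, right = node
    b = 1 if x[feature] <= threshold else 0  # b = 1 means "go left"
    return (path_costs(left, x, cost + (1 - b)) +
            path_costs(right, x, cost + b))

def classify(tree, x):
    """Select the label whose path cost is zero via an indicator vector."""
    costs = path_costs(tree, x)
    alpha = [1 if c == 0 else 0 for c, _ in costs]
    assert sum(alpha) == 1  # exactly one path has cost zero
    # Plaintext analogue of combining the encrypted alpha_i with the labels.
    return sum(a * label for a, (_, label) in zip(alpha, costs))

# Hypothetical tree: root tests x[0] <= 5; its left child tests x[1] <= 3.
tree = (0, 5, (1, 3, 10, 20), 30)
assert classify(tree, [4, 7]) == 20
assert classify(tree, [9, 0]) == 30
```

The work is proportional to the number of nodes actually present in the tree, which is the property the scheme relies on for deep but sparse trees.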

Security Analysis
Security analysis of the proposed cryptographic building blocks is made first and then we analyze the security of our scheme.

Security of Cryptographic Blocks
In this section, we give the details of the security analysis of the proposed SC. Before that, we first introduce security definition under the semi-honest model [35].
Definition 1 (Security in the Semi-Honest Model [35]). Let $\pi$ denote a protocol. We use $a_i$ (resp. $b_i$) to represent the input (resp. output) of party $P_i$ in the protocol. Moreover, let $\Pi_i(\pi)$ and $\Pi_i^S(\pi)$ represent $P_i$'s execution image and simulated image of the protocol $\pi$, respectively. Then the protocol $\pi$ is secure if the distributions of the simulated image and the execution image are computationally indistinguishable (more details are given in [35]).
From Definition 1, we can conclude that a cryptographic protocol is secure if and only if its simulated image and its corresponding execution image are computationally indistinguishable. Note that the data exchanged and the information calculated while the protocol runs are also included in the execution image of the protocol. The following lemmas may also be adopted to prove the security of the proposed protocols. We refer the reader to reference [21] for the proof details of Lemmas 1 and 2.

Lemma 2.
If a uniformly random integer $r \in \mathbb{Z}_N$ is added to a variable $x \in \mathbb{Z}_N$, then $r + x$ is uniformly random and independent of $x$.
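The standard one-line argument behind this lemma is reproduced here for convenience; the full proof is in [21], and the symbol $z$ is introduced only for this illustration. For any fixed $x, z \in \mathbb{Z}_N$ and $r$ uniform on $\mathbb{Z}_N$:

```latex
\Pr[\, r + x \equiv z \pmod{N} \,]
  \;=\; \Pr[\, r \equiv z - x \pmod{N} \,]
  \;=\; \frac{1}{N}
```

The probability does not depend on $x$, so $r + x$ is uniform on $\mathbb{Z}_N$ and carries no information about $x$.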

Theorem 1.
The SC designed in Section 4.1 is secure under semi-honest model.

Proof.
Since the security of Mul has been proved in [26], in the following we only prove the security of the steps from line 1 to line 7.
Here, we use $\Pi_{CSP}(SC)$ to denote the execution image of CSP and $\Pi^S_{CSP}(SC)$ to denote its simulated image. Both $\alpha$ and its simulated counterpart $\alpha'$ are uniformly picked from $\{0, 1\}$. Therefore, $\alpha$ and $\alpha'$ are computationally indistinguishable. According to the above analysis, we can conclude that $\Pi_{CSP}(SC)$ is indistinguishable from $\Pi^S_{CSP}(SC)$. In a similar way, we can prove that $\Pi_{ESP}(SC)$ is computationally indistinguishable from $\Pi^S_{ESP}(SC)$. Since we have proved the security of the steps from line 1 to line 7 in SC, and Mul is secure as well, we can conclude that the proposed SC is secure under the semi-honest model according to Lemma 1.

Security of Privacy-Preserving Decision Tree Evaluation Scheme
Theorem 2. The designed privacy-preserving decision tree evaluation scheme is secure, and it also protects the query data and classification result from an outside active adversary.
Proof. Following the idea shown in Section 5.1, we can easily prove the security of our SDTE algorithm under the semi-honest model. Note that the calculations the SU makes in the query request issuing stage and in Step 7 of SDTE are performed locally, so this part is obviously secure.
In Step 1, CSP and ESP cooperate to run the SC to get the additive shares of the comparison results between the thresholds and the feature elements. The security of SC has been proven in Theorem 1. In Step 2, ESP just encrypts several integers with the Paillier cryptosystem and sends the ciphertexts to CSP. Since the Paillier cryptosystem is semantically secure, the ciphertexts are indistinguishable from random numbers in $\mathbb{Z}_{N^2}$. Therefore, the simulated image is indistinguishable from the actual execution image. Similarly, we can prove the security of Steps 3, 4, 5, and 6. Therefore, according to Lemma 1, we can conclude that the proposed SDTE algorithm is secure under the semi-honest model.
Next, we can show that our scheme is secure under the threat model in Section 3.2. For an active adversary A, the data obtained from eavesdropping the communication channel between CSP and ESP are all partial shares. These data are either random numbers or the addition results of random numbers. According to Lemma 2, they are both uniformly random to A. Thus, we can conclude that our scheme is secure under the threat model in Section 3.2.

Performance Analysis and Comparison
In this section, we first test the SC and then the whole scheme on five real-world UCI machine learning datasets. After that, we compare our work with the most related works in this area.

Experiment Analysis
We test our scheme on two personal computers running Ubuntu 16.04, each with an Intel Core i5-8300H CPU 2.30 GHz six-core processor and 8 GB of RAM. These two machines act as the CSP and ESP respectively. The system is implemented in C using the GNU MP library [36].
First, we test our SC with varied bit lengths. From Figure 4, we can see that both the communication and computation costs grow as the bit length increases. Moreover, we conduct several experiments with our scheme on five widely used datasets from the UCI repository (UC Irvine Machine Learning Repository [37]), i.e., breast-cancer, heart-disease, housing, credit-screening, and spambase. The application domains of these datasets include credit rating and breast cancer diagnosis. To start with, we use standard Matlab tools (classregtree and TreeBagger) on these datasets to train a model. The performance of the SUs in our scheme is the same as that of the clients in Zheng et al.'s work [20], since both just split the data into additive shares. Therefore, we mainly test the performance of the two clouds in our scheme. Here, n is the dimension of a query vector, d is the depth of a tree, and m is the number of nodes in a tree. In our experiments, we set the bit length of an integer to 64. Note that the same experimental datasets are used as in references [16,20,22]. The detailed experimental results are shown in Table 2.

Performance Comparison and Analysis
From the experimental results presented in Table 2, we can clearly see that both the communication and computational overheads of our scheme are much smaller than those of Wu et al.'s work [16]. Compared with the most recent work, i.e., Zheng et al.'s work [20], our scheme's computation and communication costs are larger on small datasets. However, when scaled up to the large dataset, i.e., spambase, both the computation and communication costs of our scheme are lower than theirs. Specifically, the computational cost of our scheme is nearly 1/11 of theirs, and the communication cost is almost 1/709 of theirs. The reason is that the communication and computation costs of their work are exponential in the depth of the tree, while ours grow linearly with the number of nodes in the tree. When dealing with a sparse but deep tree, too many dummy nodes are added in Zheng et al.'s scheme.
From the above experimental analysis, we can conclude that two main factors influence the performance of our scheme, i.e., the number of nodes m in a tree and the bit length l of an integer. In real-world settings, l can usually be smaller than 64. Moreover, since the m comparisons can be run in parallel, the performance of our scheme can be greatly improved further.
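The gap between the two growth rates can be illustrated with a back-of-the-envelope count. The Python sketch below is purely hypothetical: it does not reproduce any number from Table 2, it assumes that depth counts the levels of decision nodes, and it takes a maximally sparse tree (one decision node per level) as the extreme case.

```python
def complete_tree_decision_nodes(depth):
    """Decision nodes in a complete binary tree with `depth` decision levels."""
    return 2 ** depth - 1

def chain_tree_decision_nodes(depth):
    """Decision nodes in a maximally sparse tree: one node per level."""
    return depth

# Number of comparisons needed with and without padding to a complete tree.
for depth in (5, 10, 20):
    sparse = chain_tree_decision_nodes(depth)
    padded = complete_tree_decision_nodes(depth)
    print(f"depth={depth}: sparse={sparse}, padded={padded}")
```

A scheme whose cost follows the padded count pays for dummy nodes that a scheme linear in m never evaluates.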

Comparative Analysis
A detailed comparison between our scheme and the most related works on privacy-preserving decision tree evaluation is made in this section. Based on a fully homomorphic cryptosystem (FHE), Bost et al. [15] presented a privacy-preserving decision tree evaluation scheme. Such a scheme is not efficient because of the heavy computation cost of FHE. Wu et al. [16] [38], Tueno et al. [18] followed a similar idea to improve the efficiency. With the help of secret sharing (SS), Cock et al. [39] designed a privacy-preserving decision tree scheme. This work is efficient, but a trusted third party is needed. However, none of these works protects the tree model well: either m or d is leaked to the client. Moreover, these methods cannot support offline users. Most recently, Liang et al. [19] and Liu et al. [22] used searchable symmetric encryption (SSE) and SS with AHE, respectively, to design new privacy-preserving decision tree evaluation schemes. Unfortunately, the running time grows exponentially with m in [19]. As in the scheme in [39], a trusted third party is also needed in [22]. Zheng et al. [20] designed a decision tree evaluation scheme on the two-cloud model, based on additive secret sharing, which is quite similar to ours. In their scheme, the depth of the tree model is not protected, and both the computation and communication costs grow exponentially with d.
We propose two secure decision tree evaluation schemes. One of them leaks the number of decision nodes, and the other protects it with statistical security. For the works supporting offline users, i.e., [19,20], the computation cost of the servers or clouds is exponential in either d or m, meaning that their performance degrades quickly when dealing with deep or large trees. Moreover, we have shown that the building blocks in [22] are less secure than ours. The computation and communication costs of our SDTEI and SDTEII grow only linearly with the number of decision nodes m. We also note that our SDTEI leaks the number of nodes of the tree. However, in our scheme, this information is leaked only to ESP, not to the SU. This is different from the scheme in [20], where d is leaked to both the cloud and the SU. A more detailed comparison is shown in Table 3.

Related Work
This work is related to Secure Multiparty Computation (SMC) [25] and privacy-preserving data mining [14]. With SMC, several parties can cooperate to compute a function without revealing their private inputs. We also note that in SMC, no trusted third party is needed for secure computation. This kind of technology has been widely used in many secure outsourced computation areas [31,40,41]. Recently, SMC and homomorphic cryptosystems [23] have been adopted to build privacy-preserving data mining classifiers [15,20,42]. This work mainly focuses on privacy-preserving data mining issues, particularly decision trees, for the IoT.
The data security and privacy issues of decision trees were first raised in [13,14]. These earlier studies focus only on privacy-preserving decision tree training. Brickell et al. were the first to consider privacy-preserving decision tree evaluation [43]. They proposed a remote diagnosis system, which adopts both Garbled Circuits (GC) and Homomorphic Encryption (HE).
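As a minimal illustration of the idea behind SMC, the sketch below uses two-party additive secret sharing over the integers modulo 2^l: each private input is split into two random-looking shares, each party adds the shares it holds locally, and only the reconstructed sum is revealed. The ring size, the function names, and the example inputs are assumptions made for illustration; this is not the protocol of any cited work, and it omits everything a real deployment would need (secure channels, multiplication, comparison).

```python
import secrets
from typing import Tuple

L_BITS = 32                # assumed bit length l of the shared integers
MODULUS = 1 << L_BITS      # shares live in the ring of integers mod 2^l


def share(value: int) -> Tuple[int, int]:
    """Split `value` into two additive shares modulo 2^l."""
    first = secrets.randbelow(MODULUS)       # uniformly random share
    second = (value - first) % MODULUS       # chosen so the shares sum to value
    return first, second


def add_shares(a: int, b: int) -> int:
    """Local step: one party adds the two shares it holds."""
    return (a + b) % MODULUS


def reconstruct(first: int, second: int) -> int:
    """Combine the two parties' shares to recover the secret."""
    return (first + second) % MODULUS


if __name__ == "__main__":
    x0, x1 = share(25)     # hypothetical private input of one user
    y0, y1 = share(17)     # hypothetical private input of another user
    # Party 0 holds (x0, y0) and party 1 holds (x1, y1); each adds locally.
    z0, z1 = add_shares(x0, y0), add_shares(x1, y1)
    print(reconstruct(z0, z1))
```

Each share on its own is uniformly distributed, so a single party learns nothing about the inputs; addition needs no interaction, which is why additive sharing suits a two-server setting.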
Recently, Bost et al. [15] proposed to represent a decision tree as a polynomial whose evaluation result is the classification label. Based on this idea, they designed a privacy-preserving decision tree evaluation scheme through fully homomorphic encryption (FHE). Unfortunately, since FHE is not efficient, their scheme cannot be scaled up to a large dataset. Wu et al. [16] proposed another privacy-preserving decision tree evaluation scheme that adopts the more efficient additive homomorphic encryption (AHE) technology. In their scheme, at the end of the execution, clients are required to run an Oblivious Transfer (OT) to obtain the prediction results. Tai et al. [17] further improved the efficiency of Wu et al.'s work. In Tai et al.'s scheme, linear functions are used to represent the path costs of the decision tree, which avoids the computation of a high-degree polynomial. Based on the additive secret sharing (SS) technology, Cock et al. [39] designed an efficient privacy-preserving decision tree evaluation scheme in a commodity-based model. It is efficient on relatively small trees but cannot deal with large trees efficiently. Joye et al. [38] designed a scheme that is also based on the basic idea of [16]. In their scheme, a new secure comparison protocol is proposed, and the total number of comparisons needed in the evaluation process is reduced. Most recently, by representing decision trees as arrays, Tueno et al. [18] designed a decision tree evaluation system whose complexity is sub-linear in the tree size. Nevertheless, the works mentioned above cannot fully protect the tree model's information. For instance, the depth, the number of nodes, or both are revealed. In addition, offline clients are not supported in any of the above works. Clients in these schemes need frequent computation and communication with the server, which is unsuitable for resource-constrained users in the IoT.
Most recently, Liang et al. [19] and Liu et al. [22] respectively proposed new privacy-preserving decision tree evaluation schemes, which can fully protect the user's and the cloud's privacy. However, the computation cost of the scheme in [19] grows exponentially with m. Liu et al.'s SC cannot achieve perfect security against ESP. Moreover, in their scheme, a third party is needed for the distribution of the keys. Zheng et al. [20] proposed a similar decision tree evaluation scheme on the two-cloud model, based on additive secret sharing technology. Their setting is slightly different from ours: in their setting, the decision tree model is outsourced by some providers. Even though their scheme can support offline users, the tree model is not well protected, i.e., the depth of the tree is leaked to the cloud and the client. Moreover, when scaled up to large trees, their computation and communication costs are larger than ours. In this work, we propose two kinds of secure decision tree evaluation schemes. One of them leaks the number of decision nodes to ESP, while the other protects this information with statistical security. Moreover, in our proposed scheme, service users can be offline during the evaluation, which greatly reduces their computation and communication overheads. This is important for IoT users.
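The two representations mentioned above, a tree as a polynomial in the comparison bits and a tree as a set of linear path costs, can be illustrated in plaintext with the following sketch. It involves no encryption and is not the construction of [15] or [17]; the class names, the bit convention (b = 1 means "go right"), and the example tree are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: int


@dataclass
class Node:
    feature: int       # index into the query vector
    threshold: int
    left: "Tree"       # taken when the comparison bit is 0
    right: "Tree"      # taken when the comparison bit is 1


Tree = Union[Leaf, Node]


def bit(node: Node, x: List[int]) -> int:
    """Comparison bit (assumed convention): 1 if x[feature] >= threshold."""
    return 1 if x[node.feature] >= node.threshold else 0


def poly_eval(tree: Tree, x: List[int]) -> int:
    """Polynomial view: sum over leaves of label * product of path bits."""
    if isinstance(tree, Leaf):
        return tree.label
    b = bit(tree, x)
    return (1 - b) * poly_eval(tree.left, x) + b * poly_eval(tree.right, x)


def path_costs(tree: Tree, x: List[int], cost: int = 0) -> List[Tuple[int, int]]:
    """Path-cost view: (cost, label) per leaf; edge costs are b or 1 - b."""
    if isinstance(tree, Leaf):
        return [(cost, tree.label)]
    b = bit(tree, x)
    return (path_costs(tree.left, x, cost + b)
            + path_costs(tree.right, x, cost + (1 - b)))


def path_cost_eval(tree: Tree, x: List[int]) -> int:
    """The leaf whose path cost is zero carries the classification label."""
    zero = [label for cost, label in path_costs(tree, x) if cost == 0]
    if len(zero) != 1:
        raise ValueError("exactly one leaf should have zero path cost")
    return zero[0]


if __name__ == "__main__":
    # Hypothetical tree with two decision nodes and three leaves.
    tree = Node(0, 5, Leaf(10), Node(1, 3, Leaf(20), Leaf(30)))
    query = [7, 2]
    print(poly_eval(tree, query), path_cost_eval(tree, query))
```

The polynomial multiplies one factor per level, so its degree grows with the depth of the tree, while each path cost is only a sum of bits. This is the sense in which the linear-function view avoids a high-degree polynomial.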

Conclusions and Future Work
In this paper, we designed an efficient and highly secure decision tree evaluation scheme for the cloud, which is based on additive secret sharing technology and the Paillier cryptosystem. Our scheme is built on a widely used two-cloud model without a trusted third party. During the evaluation process, service users can be offline. After the evaluation, the clouds can infer nothing about the client query or the prediction results, and at the same time, service users cannot infer anything about the tree. According to the experimental results on the building blocks and the real-world datasets, we can conclude that our scheme is quite efficient.
In this work, we mainly focus on the privacy issues in decision tree evaluation. In our future work, we will try to further improve the efficiency of decision tree evaluation on large datasets. In addition, decision tree training is a more challenging task. We will try to design a privacy-preserving decision tree training scheme on the cloud that requires no help from the client. Moreover, we will also try to extend the designed building blocks to support more privacy-preserving data mining algorithms, such as the random forest.