On Using Linear Diophantine Equations for in-Parallel Hiding of Decision Tree Rules

Data sharing among organizations has become an increasingly common procedure in several areas such as advertising, marketing, electronic commerce, banking, and insurance sectors. However, any organization will most likely try to keep some patterns as hidden as possible once it shares its datasets with others. This paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach to hide critical classification rules in binary datasets. Such a hiding methodology is preferred over other heuristic solutions like output perturbation or cryptographic techniques, which limit the usability of the data, since the raw data itself is readily available for public use. We propose a look ahead technique using linear Diophantine equations to add the appropriate number of instances while maintaining the initial entropy of the nodes. This method can be used to hide one or more decision tree rules optimally.


Introduction
Privacy preserving data mining [1] is a relatively recent research area aimed at alleviating issues related to the use of data mining algorithms and the privacy of data subjects and the information or knowledge hidden in those data piles. Agrawal and Srikant [2] were the first to consider the induction of decision trees from anonymized data that were adequately corrupted with noise to survive from privacy attacks. The general strand of knowledge hiding research [3] has led to specific algorithms for hiding classification rules, such as the addition of noise, for example, by a data swap process [4].
A key element of privacy preserving data mining concerns individual data privacy and aims to shield the individual integrity of database records in order to prevent the re-identification of individuals or characteristic groups of data inference attacks. This forms the subject of this paper, which deals with the protection of sensitive patterns arising from the application of data mining techniques. Of course, all privacy preservation approaches strive to maintain data information quality.
The primary representative of statistical methods [5] adopts a parsimonious downgrading technique to determine whether additional confidentiality is worth the loss of functionality associated with not downgrading data. Reconstruction techniques involve the reconstruction of the public dataset [6,7] from the non-sensitive rules produced by C4.5 [8] and RIPPER [9] algorithms. Perturbation-based techniques involve the modification of transactions to only support non-sensitive rules [10], the removal of tuples associated with sensitive rules [11], the suppression of specific attribute values [12], and the redistribution of tuples supporting sensitive patterns to maintain the ordering of the rules [13]. Another interesting approach is machine learning classification over encrypted data [14], in which a private decision tree classifier allows the server to traverse a binary decision tree using the client's input, such that the server does not learn the input x and the client does not learn the structure of the tree and the thresholds at each node. A recent work [15] proposes privacy-preserving decision tree evaluation protocols which hide the sensitive inputs from the counterparty using an additively homomorphic encryption (AHE), which are similar to the ElGamal encryption procedure.
In our previously published works [16,17], we proposed a series of techniques to adequately protect the disclosure of sensitive patterns of knowledge in the mining of classification rules. We aimed to hide sensitive rules without compromising the information value of the whole dataset. After an expert selected the sensitive rules, the class labels at the tree node corresponding to the tail of the sensitive pattern were modified (Swap-and-Add pass) to eliminate the gain achieved by the information metric that caused the split. By preserving the class balance of every node across the sensitive path (the path from the root until the leaf that we want to hide), we could assure that we would not have any change in the hierarchy order of the particular path due to changes in entropy of the nodes along the path. We then set the values of non-class attributes, appropriately adding new instances along the path to the root if necessary, so that non-sensitive patterns remained as unaffected as possible. This approach is very important because the sanitized dataset can be subsequently published and, even, shared with the dataset owner's competitors, as can be the case with retail banking [18]. In this paper, we extend these works by formulating a generic look ahead technique that takes into account the structure of the decision tree all the way from an affected leaf to the root. The main contribution of this paper is to improve the Swap-and-Add pass by following a look ahead approach instead of a greedy approach, which was previously used. This technique can be accomplished by using linear Diophantine equations and, importantly, can handle in parallel any number of hiding requests by determining the overall minimum amount of added instances.
The rest of this paper consists of three sections: Section 2 describes the dataset operations we employ to hide a rule while attempting to affect the decision tree minimally; in Section 3 we present how the new look ahead approach can be applied in parallel hiding requests; and Section 4 discusses further research issues and concludes the paper.

Materials and Methods
We chose decision trees for our research since we are primarily interested in techniques that are applicable to "comprehensible" models, and this eventually leads to rules, trees, and other graphical models (Bayesian nets for example). However, the interpretability of rules and trees has to do with how widespread they are and with the scope to associate the masking of metric quality both regarding verboseness as well as the impact on understanding and the accuracy of concealment.
The transparency of the decision tree model is one of its great advantages. Unlike other decision-making models, the decision tree explicitly identifies all possible alternatives and traces each alternative in one view, allowing easy comparison between the different alternatives. One of the main benefits of decision tree analysis is its ability to assign specific values to the problems, decisions, and results of each decision. This reduces ambiguity related to decision-making. Every possible decision scenario finds representation through a clear fork and node, allowing all possible solutions to be seen in one view. A decision tree also allows data to be partitioned at a much deeper level, which is not easily achieved with other decision-making classifiers, such as logistic regression. The decision tree illustrates the problem graphically and various alternatives in a simple and easy-to-understand format that requires no explanation. Decision trees divide data into illustrations that are easy to understand based on rules easily understood. Simple mathematics based on entropy concepts can easily replicate the reasons for the rules produced in a decision tree. Figure 1 below shows a baseline problem, which assumes a binary decision tree representation, with binary-valued symbolic attributes (X, Y, and Z) and binary classes (C1 and C2). Hiding R3 implies that the splitting in node Z should be suppressed, hiding R2 as well.
A first approach to hide R3 would be to remove all the instances of the leaf corresponding to R3 from the training data and to retrain the tree from the resulting dataset. However, this action can lead to a significant restructuring of the tree which also affects other parts of the tree. Entropy 2019, 21 3 of 22 A first approach to hide R3 would be to remove all the instances of the leaf corresponding to R3 from the training data and to retrain the tree from the resulting dataset. However, this action can lead to a significant restructuring of the tree which also affects other parts of the tree.
Another approach would be to turn the direct parent of the R3 leaf into a new leaf. This would not, however, modify the actual dataset. An opponent could thus recover the original tree.
To achieve hiding by minimally modifying the original dataset, we may interpret "minimally" concerning dataset changes or whether the sanitized decision tree produced via hiding is syntactically close to the original. Measuring minimality in how one modifies decision trees has been studied regarding heuristics that guarantee or approximate the impact of changes [19][20][21].
Hiding in Z, however, changes the statistics along with the path from Z to the root. Since the splitting along this path depends on these statistics, the relative ranking of the attributes may change if we use the same induction algorithm on the modified dataset. To avoid ending with an entirely different tree, we first used a bottom-up pass (Swap-and-Add) to change the class label of instances at the leaves, and then added new instances on the root path to preserve the critical statistics on the intermediate nodes.
Subsequently, we use a top-down pass (Allocate-and-Set) to complete the newly added instance specification. These two passes help us firstly hide all sensitive rules and secondly keep the sanitized tree close to the structure of the original decision tree. We note that the above two procedures had been fully described in previously published works [16,17].

Adding Instances to Preserve the Class Balance Using Linear Diophantine Equations: A Proof of Concept and an Indicative Example
The Swap-and-Add pass aims to ensure that node statistics change without threatening classvalue balances in the rest of the tree. Using Figure 2 as an example, we show the original tree with class distributions of instances across edges.

Figure 1.
A binary decision tree, before (left) and after (right) hiding and the associated rule sets.
Another approach would be to turn the direct parent of the R3 leaf into a new leaf. This would not, however, modify the actual dataset. An opponent could thus recover the original tree.
To achieve hiding by minimally modifying the original dataset, we may interpret "minimally" concerning dataset changes or whether the sanitized decision tree produced via hiding is syntactically close to the original. Measuring minimality in how one modifies decision trees has been studied regarding heuristics that guarantee or approximate the impact of changes [19][20][21].
Hiding in Z, however, changes the statistics along with the path from Z to the root. Since the splitting along this path depends on these statistics, the relative ranking of the attributes may change if we use the same induction algorithm on the modified dataset. To avoid ending with an entirely different tree, we first used a bottom-up pass (Swap-and-Add) to change the class label of instances at the leaves, and then added new instances on the root path to preserve the critical statistics on the intermediate nodes.
Subsequently, we use a top-down pass (Allocate-and-Set) to complete the newly added instance specification. These two passes help us firstly hide all sensitive rules and secondly keep the sanitized tree close to the structure of the original decision tree. We note that the above two procedures had been fully described in previously published works [16,17].

Adding Instances to Preserve the Class Balance Using Linear Diophantine Equations: A Proof of Concept and an Indicative Example
The Swap-and-Add pass aims to ensure that node statistics change without threatening class-value balances in the rest of the tree. Using Figure 2 as an example, we show the original tree with class distributions of instances across edges.
We used the information gain as the splitting heuristic. In order to hide the leaf which corresponds to the nine positive instances (to the right of N0), we changed the nine positive instances to negative ones and denoted this operation by (−9p,+9n). As a result, the parent node N0 becomes a one-class node with minimum (zero) entropy. All nodes located upwards to node N0 until the root N4 also absorb the (−9p, +9n) operation ( Figure 3). This conversion would leave node N1 with 49p+46n instances. However, since its original 58p+37n distribution contributed to N1's splitting attribute AN1, which in turn created N0 (and then 9p), we should preserve the information gain of AN1, since the entropy of a node depends only on the ratio p:n of its instance classes (Lemma 1). We used the information gain as the splitting heuristic. In order to hide the leaf which corresponds to the nine positive instances (to the right of N0), we changed the nine positive instances to negative ones and denoted this operation by (−9p,+9n). As a result, the parent node N0 becomes a one-class node with minimum (zero) entropy. All nodes located upwards to node N0 until the root N4 also absorb the (−9p, +9n) operation ( Figure 3). This conversion would leave node Ν1 with 49p+46n instances. However, since its original 58p+37n distribution contributed to N1's splitting attribute AN1, which in turn created N0 (and then 9p), we should preserve the information gain of AN1, since the entropy of a node depends only on the ratio p:n of its instance classes (Lemma 1). To maintain the initial ratio of node N1 (58p:37n), an appropriate number of positive and negative instances should be added to node N1, and this addition process should be extended to the tree root by accumulating all instance requests from below at each node and by adding instances to maintain the node statistics locally, propagating these changes to the root.

Lemma 1. The entropy of a node only depends on the ratio of its instance classes. (The proof is in Appendix A)
In our previous published work [16,17], the above procedure was greedy, mostly solving the issue for only one (tree) level of nodes, which often resulted in a non-minimum number of added instances, whereas a look ahead based solution would be able to take into account all levels up to the  We used the information gain as the splitting heuristic. In order to hide the leaf which corresponds to the nine positive instances (to the right of N0), we changed the nine positive instances to negative ones and denoted this operation by (−9p,+9n). As a result, the parent node N0 becomes a one-class node with minimum (zero) entropy. All nodes located upwards to node N0 until the root N4 also absorb the (−9p, +9n) operation ( Figure 3). This conversion would leave node Ν1 with 49p+46n instances. However, since its original 58p+37n distribution contributed to N1's splitting attribute AN1, which in turn created N0 (and then 9p), we should preserve the information gain of AN1, since the entropy of a node depends only on the ratio p:n of its instance classes (Lemma 1). To maintain the initial ratio of node N1 (58p:37n), an appropriate number of positive and negative instances should be added to node N1, and this addition process should be extended to the tree root by accumulating all instance requests from below at each node and by adding instances to maintain the node statistics locally, propagating these changes to the root.

Lemma 1. The entropy of a node only depends on the ratio of its instance classes. (The proof is in Appendix A)
In our previous published work [16,17], the above procedure was greedy, mostly solving the issue for only one (tree) level of nodes, which often resulted in a non-minimum number of added instances, whereas a look ahead based solution would be able to take into account all levels up to the

Lemma 1. The entropy of a node only depends on the ratio of its instance classes. (The proof is in Appendix A)
To maintain the initial ratio of node N1 (58p:37n), an appropriate number of positive and negative instances should be added to node N1, and this addition process should be extended to the tree root by accumulating all instance requests from below at each node and by adding instances to maintain the node statistics locally, propagating these changes to the root.
In our previous published work [16,17], the above procedure was greedy, mostly solving the issue for only one (tree) level of nodes, which often resulted in a non-minimum number of added instances, whereas a look ahead based solution would be able to take into account all levels up to the root. Furthermore, the new ratios of the nodes (p:n) were not the same as they were before the change, thus propagating ratio changes, whose impact could only be quantified in a compound fashion by inspecting the final tree, hampering our ability to investigate the behaviour of this heuristic in a detailed fashion. Therefore, we used linear Diophantine equations as the formulation technique of the problem of determining how many instances to add; as we will show, this technique deals with both issues in one go.

Definition 1. A Diophantine equation is a polynomial equation where the coefficients are integers, and the solutions are integers. The most basic Diophantine equation is the linear case and is of the following form:
ax + by = c where a, b, c ∈ Z Referring to the example in Figure 3, let (x 1 , y 1 ) be the number of positive and negative instances, respectively, that must be added to node N1 (49p,46n) in order to maintain its initial ratio (58p,37n), as shown in Figure 2. This can be expressed with the following equation: The above equation is equivalent to the following linear Diophantine equation: Similarly, let (x 2 , y 2 ), (x 3 , y 3 ), (x 4 , y 4 ) be the corresponding number of positive and negative instances that have to be added to nodes N2, N3 and N4.
The corresponding linear Diophantine equations for nodes N2, N3 and N4 are: The general solutions of the above four (1-4) linear Diophantine equations are given below (k ∈ Z): From the infinite pairs of solutions for every linear Diophantine equation, we choose the pairs are the minimum natural numbers that satisfy the following condition. (C1): . Condition (C1) ensures that we have selected those pairs of solutions which have the minimum total sum of positive and negative instances that have to be added in each node, starting from the lowest node up to the root of the decision tree, given that every addition of instances propagates upwards in a consistent manner (i.e., if one adds some instances at a lower node, one cannot have added fewer instances in an ancestor node).
With this technique, we can determine precisely the minimum number of instances that must be added to each node to maintain the initial ratios of every node.
For this example, the pairs of solutions that are both minimum and satisfy the condition (C1) are Based on the above solutions, we must add 67 positive and 28 negative instances to N1, which leads to a ratio of ( 116p 74n ). These new instances propagate upwards; therefore, on N2 we do not need to add any positive instances, but we do need to add 100 (=128 − 28) negative instances, which leads to a ratio of ( 116p 274n ). Similarly, for N3 we should add 294 (=361 − 67) new positive and no negative instances. Finally, for N4 we should add 189 (=550 − 361) new positive and 322 (=450 − 128) new negative instances. With this look ahead technique, we know from the very beginning the minimum number of instances that we must add in order to maintain the exact values of initial ratios.
We note that the solutions of Diophantine Equation (4), corresponding to the node N4 (root), determine the total number of instances that should be added to our dataset to maintain the same ratios. If we change the N4 ratio slightly (in our case let be changed to ( 540p 460n ) instead of ( 541p 459n )), then we have a different Diophantine equation that results in a smaller number of additional instances. In our example, the new Diophantine Equation (5) and the corresponding set of solutions are shown below: For this example, the pairs of solutions that are both minimum and satisfy the condition C1 are Therefore, we have to add 700 new instances instead of 1000 that we had to add before. Of course, now we do not have the same ratio p:n for node N4, but something very close to it. In other words, the method of linear Diophantine equations helped us to make a holistic tree-wide trade-off between the number of added instances and the degree of precision to which we approximate the original node's ratio.

Fully Specifying Instances
For the newly added instances, setting the values of some attributes is only a partial instance specification because we did not fix those instance values for any other attribute, other than those present in the path from the root to the node where the addition of the instances occurred. Unspecified values must be set in such a way to ensure that competing attributes do not displace currently selected attributes at all nodes; this is what the Allocate-and-Set pass does.
Concerning Figure 2 and the 9n instances added due to N1 via the N2-N1 branch, these instances did not have their values set for A N1 and A N2 . These must be set accordingly to minimize the possibility that A N2 is displaced from N2, since (at N2) any of the attributes A N0 , A N1 or A N2 (or any other) can be selected. Those 9n instances were added to help guarantee the existence of N1.
As in the bottom-up pass, we needed the information gain of A N2 to be sufficiently large to prevent competition from A N0 or A N1 at node N2, but not too large to threaten A N3 . We started with the best possible allocation of values to attribute A N2 , and gradually explored directing some values along the N2-N1 branch, stopping when the information gain for A N2 was lower than the information gain for A N3 . We use the term two-level hold-back to refer to this technique because it covers two levels of the tree. This approach takes advantage of convexity property of the information gain difference function (Lemma 2).
The Allocate-and-Set pass examines all four distribution combinations of all positive and all negative instances to one branch, selecting the one that maximizes the information gain difference and then moving along the slope that reduces the information gain, until we do not exceed the parent's information gain. Following this, the recursive specification was performed all the way to the tree fringe.

Hiding in Parallel: Grouping of Hiding Requests
By processing hiding requests serially, each entails the full cost of updating the instance population. By knowing them in advance, we only consider every node once in the bottom-up and once in the top-down pass. We express that dealing with all hiding requests in parallel leads to the minimum number of new instances by The formula states that for a tree T, after a parallel (Tp) hiding process of all rules (leaves) in R, the number of instances (|T|) is the optimum along all possible orders of all serial (Ts) hiding requests drawn from R. A serial hiding request is carried out by selecting a leaf to be hidden, after which the remaining leaves are treated recursively. Lemma 3. When serially hiding two non-sibling leaves, the number of new instances to be added to maintain the max:min ratios is larger or equal to the number of instances that would have been added if the hiding requests were handled in parallel. (The proof is in Appendix A)

Results
In this section we demonstrate an example in which two hiding requests were handled in parallel, using the proposed look ahead technique of linear Diophantine equations. In Figure 4, we show the original tree with class distributions on nodes.
We used the information gain as the splitting heuristic. To hide the leaf which corresponds to the ten positive instances (to the left of N0), we changed the ten positive instances to negative ones and denoted this operation by (−10p, +10n). As a result, the parent node N0 became a one-class node with minimum (zero) entropy. All nodes located upwards of node N0 until the root N4 also absorbed the (−10p, +10n) operation ( Figure 5). This conversion left N1 with 48p + 47n instances. But, as its initial 58p + 37n distribution contributed to N1's splitting attribute, AN1, which in turn created N0 (and then 10p), we preserved the information gain of AN1, since the entropy of a node only depends on the ratio p:n of its instance classes.
To hide the leaf which corresponds to the five negative instances (to the right of N0'), we changed the five negative instances to positive ones and denote this operation by (+5p, −5n). As a result, the parent node N0' became a one-class node with minimum (zero) entropy. All nodes located upwards to node N0' until the root N4 also absorbed the (−10p, +10n) operation ( Figure 5). The intersection node N3 and the root N4 were affected by (−5p, +5n), which is the total outcome of the two operations from the two subtrees below N3. To hide the leaf which corresponds to the five negative instances (to the right of N0'), we changed the five negative instances to positive ones and denote this operation by (+5p, −5n). As a result, the parent node N0' became a one-class node with minimum (zero) entropy. All nodes located upwards to node N0' until the root N4 also absorbed the (−10p, +10n) operation ( Figure 5). The intersection node N3 and the root N4 were affected by (−5p, +5n), which is the total outcome of the two operations from the two subtrees below N3.
This conversion left Ν1' with 125p + 45n instances. But, as its initial 120p + 50n distribution contributed to N1's splitting attribute, AN1', which in turn created N0' (and then 5n), we preserved the information gain of AN1', since the entropy of a node only depends on the ratio p:n of its instance classes.
In order to maintain the ratio of nodes N1 and N1', we had to add an appropriate number of positive and negative instances to N1, N1' and extend this addition process to the tree root, by accumulating at each node all instance requests from below and by adding instances locally to maintain the node statistics, propagating these changes to the tree root.   To hide the leaf which corresponds to the five negative instances (to the right of N0'), we changed the five negative instances to positive ones and denote this operation by (+5p, −5n). As a result, the parent node N0' became a one-class node with minimum (zero) entropy. All nodes located upwards to node N0' until the root N4 also absorbed the (−10p, +10n) operation ( Figure 5). The intersection node N3 and the root N4 were affected by (−5p, +5n), which is the total outcome of the two operations from the two subtrees below N3.
This conversion left Ν1' with 125p + 45n instances. But, as its initial 120p + 50n distribution contributed to N1's splitting attribute, AN1', which in turn created N0' (and then 5n), we preserved the information gain of AN1', since the entropy of a node only depends on the ratio p:n of its instance classes.
In order to maintain the ratio of nodes N1 and N1', we had to add an appropriate number of positive and negative instances to N1, N1' and extend this addition process to the tree root, by accumulating at each node all instance requests from below and by adding instances locally to maintain the node statistics, propagating these changes to the tree root.  This conversion left N1' with 125p + 45n instances. But, as its initial 120p + 50n distribution contributed to N1's splitting attribute, AN1', which in turn created N0' (and then 5n), we preserved the information gain of AN1', since the entropy of a node only depends on the ratio p:n of its instance classes.
In order to maintain the ratio of nodes N1 and N1', we had to add an appropriate number of positive and negative instances to N1, N1' and extend this addition process to the tree root, by accumulating at each node all instance requests from below and by adding instances locally to maintain the node statistics, propagating these changes to the tree root.
Let (x 1 , y 1 ) be the number of positive and negative instances, respectively, that should be added to node N1 to maintain its initial ratio. This can be expressed with the following equation: The above equation is equivalent to the following linear Diophantine equation: Similarly, let (x 2 , y 2 ), x 1 , y 1 , (x 2 , y 2 ), (x 3 , y 3 ), (x 4 , y 4 ) be the corresponding number of positive and negative instances that should be added to nodes N2, N1', N2', N3 and N4.
(C1): x 1 ≤ x 2 and y 1 ≤ y 2 and x 1 ≤ x 2 and y 1 ≤ y 2 (C2): x 2 + x 2 ≤ x 3 ≤ x 4 and y 2 + y 2 ≤ y 3 ≤ y 4 Condition (C1) ensures that we have selected the optimum path from the leaves up to the intersection node N3 of the decision tree.
Condition (C2) ensures that we have selected the optimum path from one level below the intersection node N3 (N2, N2') up to the root. Based on the above solutions, we should add to N1 68 positive and 27 negative instances. These new instances propagate upwards; therefore, at N2 we did not need to add any positive instances, but we needed to add 100 (=127 − 27) negative instances. In the same manner, we needed to add to N1' seven positive and ten negative instances. These new instances propagate upwards; therefore, on N2' we needed to add 86 (=93 − 7) positive instances and 26 (=36 − 10) negative instances.
Similarly, for N3, which is an intersection node, we should add the corresponding instances from its children ( Based on this example, we observe that by using this technique we can handle more than one hiding request without any increase in the number of instances that need to be added. An algorithm that processes in parallel n hiding requests is described in detail in Appendix B. We have also developed a prototype to demonstrate the validity of our arguments on a small scale example [22]. This implementation demonstrates the use of Diophantine equations on our technique and consists of only one part of our proposed method. For that reason, there is no need at this stage to use real datasets.

Brief Discussion and Conclusions
We have presented a new look ahead technique for deciding how many instances to add to a decision tree to hide a specific rule. By using linear Diophantine equations instead of the previously used greedy approach, our heuristic allows one to specify which decision tree leaves should be hidden, and then intelligently add instances to the original dataset so that the next time one tries to build the decision tree (with the same induction algorithm), the to-be-hidden nodes will have disappeared, as the instances corresponding to these nodes will have been absorbed by their neighbors.
Two fully-fledged examples of the proposed approach have been presented: One for a single request and the other for two parallel hiding requests. Also, we have introduced an algorithm for n parallel hiding requests for this look ahead technique.
With regards to performance aspects, besides speed, the issue of assessing the similarity of the original tree to the one produced after the above procedure has been applied must also be considered. Another issue of substantial importance is syntactic similarity [23] (comparing the data structures-or parts thereof-themselves) or semantic similarity (comparing against reference datasets), which will also help settle questions of which heuristics work better and which not.
As the number of instances to be added is a primary index of the heuristic's quality, a reasonable direction for investigation is to determine the appropriate ratio values, which result in smaller integer solutions of the corresponding linear Diophantine equations, but at the same time do not deviate too much from the structure of the original tree. This suggests the adoption of approximate ratios instead of exact ones, and also raises the potential to investigate the trade-off between dataset increase and tree similarity further.
The medium-term development goal is to have the proposed technique implemented as a standard data engineering service to accommodate hiding requests, coupled with an appropriate environment where one could specify the importance of each hiding request. On research and development aspects, the most pressing questions are related to the ability to handle multi-valued (also numeric) attributes and multi-class trees. From a research perspective, the most important issue relates to whether the problem can be framed in terms of (integer) constraints, so that we may be able to turn to the repertoire of constraint satisfaction techniques, and to whether one can devise metrics of tree similarity (syntactic or semantic) that utilize the locality of the operations employed by the proposed method. The apparent rise in interest of privacy-preserving solutions suggests that systems with theoretical backing can be expected to appear increasingly often.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A Lemma 1. The entropy of a node only depends on the ratio of its instance classes.
Proof. Let E be the entropy of a node with p positive and n negative instances, with p : n = a. We assume that a ≥ 1.
p i : The probability of class i E = − ∑ p i log 2 p i = − n p + n log 2 n p + n − p p + n log 2 p p + n = p=an = − n an + n log 2 n an + n − an an + n log 2 an an + n =

Lemma 2. Distributing new class instances along only one branch maximizes information gain.
Proof. Let G(i) be the function that represents the information gain after the addition of k new positive instances at a node, and the distribution of i of these nodes to the left child and of k − i to the right child.
Taking the first derivative of G(i) we have Then we find the second derivative of G(i) The proof of Lemma 2 is completed due to a theorem that states "A convex function on a closed bounded interval attains its maximum at one of its endpoints." Lemma 3. When serially hiding two non-sibling leaves, the number of new instances to be added to maintain the max:min ratios is larger or equal to the number of instances that would have been added if the hiding requests were handled in parallel.
Proof. Let a parent node have p positive and n negative instances, with p : n = r ≥ 1, and let two hiding requests, each from a different branch, propagate to that node, requiring that p L , n L (from the left child) and p R , n R (from the right child) instances be added. Now assume that to maintain the p:n ratio, we must add (parallel) p X or n X instances to parent node instead of adding first p 1 or n 1 (in order to control the change due to left child) and then p 2 or n 2 (to control the change due to right child). First, we create a Table A1 with all (32) possible cases and then we have to prove Lemma 3 for each of these cases. Table A1. All possible cases to compare parallel and l serially addition for new instances.

Parallel Serially
The reason that we selected option (I), i.e., a+p X I b is that p X I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I I ≥ p X I ( * ). If we had selected option (I I), i.e., b a+p X I I , we would have case (II-2).
If p X I ≥ p 1 , then this case is impossible because all terms in the left-hand side in the above equation are positive. If p X I < p 1 then we have p 1 + n 2 > p X I Q.E.D.
The reason that we selected option (I), i.e., a+p X I b is that p X I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I I ≥ p X I ( * ). If we had selected option (I I), i.e., b a+p X I I , we would have case (II-4), for which it had been proved that p 1 + n 2 > p X I I ( * * ) From (*), (**) above, we have p 1 + n 2 > p X I Q.E.D.
• Case (I-5): a + p X I b = a b + n 1 + n 2 ⇔ ⇔ ab + a(n 1 + n 2 ) + bp X I + p X I (n 1 + n 2 ) = ab ⇔ ⇔ a(n 1 + n 2 ) + bp X I + p X I (n 1 + n 2 ) = 0 This case is impossible because all the terms in the left-hand side in the above equation are positive. • Case (I-6): The reason that we selected option (I), i.e., a+p X I b is that p X I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I = min p X I , p X I I , n X I I I , n X IV , meaning that n X IV ≥ p X I ( * ). If we had selected option (IV), i.e., b+n X IV a we would have case (IV-6), i.e. b + n X IV a = b + n 1 + n 2 a ⇔ n X IV = n 1 + n 2 ( * * ) From (*), (**) above, we have n 1 + n 2 > p X I Q.E.D.

•
Case (I-7): a + p X I b = a + p 2 b + n 1 ⇔ ⇔ ab + an 1 + bp X I + n 1 p X I = ab + bp 2 ⇔ ⇔ an 1 + b p X I − p 2 + n 1 p X I = 0 If p X I ≥ p 2 then this case is impossible, since the left-hand side in the above equation is positive.
If p X I < p 2 then we have p 2 + n 1 > p X I Q.E.D.
The reason that we selected option (I), i.e., a+p X I b is that p X I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I I ≥ p X I ( * ). If we had selected option (I I), i.e., b a+p X I I we would have case (II-8), for which it had been proved that The reason that we selected option (I I), i.e., b a+p X I I is that p X I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I ≥ p X I I ( * ). If we had selected option (I), i.e., we would have the case (I-1), and for that it had been proven that From (*), (**) above, we have p 1 + p 2 ≥ p X I I Q.E.D.
The reason that we selected option (I I), i.e., b a+p X I I is that p X I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I ≥ p X I I ( * ). If we had selected option (I), i.e., we would have the case (I-3), and for that it had been proved that From (*), (**) above, we have p 1 + n 2 ≥ p X I I Q.E.D. • Case (II-4): b a + p X I I = b + n 2 a + p 1 ⇔ ⇔ ab + bp 1 = ab + an 2 + bp X I I + n 2 p X I I ⇔ ⇔ an 2 + n 2 p X I I + b p X I I − p 1 = 0 If p X I I ≥ p 1 then this case is impossible, since the left-hand side in the above equation is positive.
If p X I I < p 1 then we have p 1 + n 2 > p X I I Q.E.D.
The reason that we selected option (I I), i.e., b a+p X I I is that p X I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I I = min p X I , p X I I , n X I I I , n X IV , meaning that n X I I I ≥ p X I I ( * ). If we had selected option (I I I), i.e., a b+n X I I I we would have case (III-5), and for that it had been proved that n 1 + n 2 = n X I I I ( * * ) From (*), (**) above, we have n 1 + n 2 ≥ p X I I Q.E.D.
• Case (II-6): b a + p X I I = b + n 1 + n 2 a ⇔ ⇔ ab = ab + a(n 1 + n 2 ) + bp X I I + p X I I (n 1 + n 2 ) ⇔ ⇔ a(n 1 + n 2 ) + bp X I I + p X I I (n 1 + n 2 ) = 0 This case is impossible because all the terms in the left-hand side in the above equation are positive.
The reason that we selected option (I I), i.e., b a+p X I I is that p X I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, p X I I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I ≥ p X I I ( * ). If we had selected option (I), i.e., we would have the case (I-7), and for that it had been proved that p 2 + n 1 > p X I ( * * ) From (*), (**) above, we have p 2 + n 1 ≥ p X I I Q.E.D.
⇔ ab + bp 2 = ab + an 1 + bp X I I + n 1 p X I I ⇔ ⇔ an 1 + n 1 p X I I + b p X I I − p 2 = 0 If p X I I ≥ p 2 then this case is impossible, since the left-hand side in the above equation is positive.
If p X I I < p 2 then we have p 2 + n 1 > p X I I Q.E.D. • Case (III-1): a b + n X I I I = a + p 1 + p 2 b ⇔ ⇔ ab = ab + b(p 1 + p 2 ) + an X I I I + n X I I I (p 1 + p 2 ) ⇔ ⇔ b(p 1 + p 2 ) + an X I I I + n X I I I (p 1 + p 2 ) = 0 This case is impossible because all the terms in the left-hand side in the above equation are positive. • Case (III-2): a b + n X I I I = b a + p 1 + p 2 The reason that we selected option (I I I), i.e., a b+n X I I I is that n X I I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X I I I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I I ≥ n X I I I ( * ). If we had selected option (I I), i.e., b a+p X I I we would have case (II-2), and for that it had been proved that p X I I = p 1 + p 2 ( * * ) From (*), (**) above, we have p 1 + p 2 ≥ p X I I Q.E.D.
• Case (III-3): a b + n X I I I = a + p 1 b + n 2 ⇔ ⇔ ab + an 2 = ab + bp 1 + an X I I I + p 1 n X I I I ⇔ ⇔ bp 1 + p 1 n X I I I + a n X I I I − n 2 = 0 If n X I I I ≥ n 2 then this case is impossible, since the left-hand side in the above equation is positive.
If n X I I I < n 2 then we have n 2 + p 1 > n X I I I Q.E.D.
The reason that we selected option (I I I), i.e., a b+n X I I I is that n X I I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X I I I = min p X I , p X I I , n X I I I , n X IV , meaning that p X I ≥ n X I I I ( * ). If we had selected option (I), i.e., a+p X I b we would have case (I-4), and for that it had been proved that p 1 + n 2 > p X I ( * * ) From (*), (**) above, we have p 1 + n 2 ≥ n X I I I Q.E.D. • Case (III-5): a b + n X I I I = a b + n 1 + n 2 ⇔ n X I I I = n 1 + n 2 Q.E.D.
• Case (III-6): The reason that we selected option (I I I), i.e., a b+n X I I I is that n X I I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X I I I = min p X I , p X I I , n X I I I , n X IV meaning that n X IV ≥ n X I I I ( * ). If we had selected option (IV), i.e., b+n X IV a we would have the case (IV-6), and for that it had been proved that n 1 + n 2 = n X IV ( * * ) From (*), (**) above, we have n 1 + n 2 ≥ n X I I I Q.E.D.
• Case (III-7): a b + n X I I I = a + p 2 b + n 1 ⇔ ⇔ ab + an 1 = ab + bp 2 + an X I I I + p 2 n X I I I ⇔ ⇔ bp 2 + p 2 n X I I I + a n X I I I − n 1 = 0 If n X I I I ≥ n 1 then this case is impossible, since the left-hand side in the above equation is positive.
If n X I I I < n 1 then we have n 1 + p 2 > n X I I I Q.E.D.
The reason that we selected option (I I I), i.e., a b+n X I I I , is that n X I I I is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X I I I = min p X I , p X I I , n X I I I , n X IV meaning that p X I I ≥ n X I I I ( * ). If we had selected option (I I), i.e., b a+p X I I we would have case (II-8), and for that it had been proved that p 2 + n 1 > p X I I ( * * ) From (*), (**) above, we have p 2 + n 1 > n X I I I Q.E.D.
The reason that we selected option (IV), i.e., b+n X IV a , is that n X IV is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X IV = min p X I , p X I I , n X I I I , n X IV meaning that p X I I ≥ n X IV ( * ). If we had selected option (I I), i.e., b a+p X I I we would have case (II-1), and for that it had been proved that p 1 + p 2 ≥ p X I I ( * * ) From (*), (**) above, we have p 1 + p 2 ≥ n X I I I Q.E.D. • Case (IV-2): b + n X IV a = b a + p 1 + p 2 ⇔ ⇔ ab = ab + b(p 1 + p 2 ) + an X IV + n X IV (p 1 + p 2 ) ⇔ ⇔ b(p 1 + p 2 ) + an X IV + n X IV (p 1 + p 2 ) = 0 This case is impossible because all the terms in the left-hand side in the above equation are positive.
The reason that we selected option (IV), i.e., b+n X IV a is that n X IV is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore,n X IV = min p X I , p X I I , n X I I I , n X IV , meaning that p X I ≥ n X IV ( * ). If we had selected option (I), i.e., a+p X I b we would have case (I-3), and for that it had been proved that p 1 + n 2 > p X I ( * * ) From (*), (**) above, we have p 1 + n 2 ≥ n X IV Q.E.D. • Case (IV-4): b + n X IV a = b + n 2 a + p 1 ⇔ ⇔ ab + an 2 = ab + bp 1 + an X IV + p 1 n X IV ⇔ ⇔ bp 1 + p 1 n X IV + a n X IV − n 2 = 0 If n X IV ≥ n 2 then this case is impossible, since the left-hand side in the above equation is positive.
If n X IV < n 2 then we have n 2 + p 1 > n X IV Q.E.D. • Case (IV-5): b + n X IV a = a b + n 1 + n 2 The reason that we selected option (IV), i.e., b+n X IV a is that n X IV is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X IV = min p X I , p X I I , n X I I I , n X IV meaning that n X I I I ≥ n X IV ( * ). If we had selected option (I I I), i.e., a+p X I b we would have case (III-5), and for that it had been proved that n X I I I = n 1 + n 2 ( * * ) From (*), (**) above, we have n 1 + n 2 ≥ n X IV Q.E.D. • Case (IV-6): b + n X IV a = b + n 1 + n 2 a ⇔ n X IV = n 1 + n 2 Q.E.D.
• Case (IV-7): b + n X IV a = a + p 2 b + n 1 The reason that we selected option (IV), i.e., b+n X IV a is that n X IV is the minimum number of instances to add in order to maintain the ratio in the parent node. Therefore, n X IV = min p X I , p X I I , n X I I I , n X IV , meaning that p X I ≥ n X IV ( * ). If we had selected option (I), i.e., a+p X I b we would have case (I-7), and for that it had been proven that p 2 + n 1 > p X I ( * * ) From (*), (**) above, we have p 2 + n 1 > p X I ≥ n X IV Q.E.D. • Case (IV-8): b + n X IV a = b + n 1 a + p 2 ⇔ ⇔ ab + an 1 = ab + bp 2 + an X IV + p 2 n X IV ⇔ ⇔ bp 2 + p 2 n X IV + a n X IV − n 1 = 0 If n X IV ≥ n 1 then this case is impossible, since the left-hand side in the above equation is positive.
If n X IV < n 1 then we have n 1 + p 2 > n X IV Q.E.D.
An algorithm that handles, in parallel, n hiding requests We present an algorithm in this section which implements the look ahead technique by using linear Diophantine equations in the Swap-and-Add pass.
For every node N, we initialize a flag indicating what needs to be done as follows: N.affected:= DO_NOTHING We also initialize a hiding set with all leaves selected for hiding as follows: For all L in hiding-set L.affected:= HIDE Other possible values for the affected flag are: MAKE_LEAF (if possible, for parents of leaves which will be hidden), and ADJUST_RATIO (for internal nodes on the path from the hidden lead to the root).
The algorithm traverses the decision tree in a depth-first-search fashion. It pushes nodes into the classic DFS stack, so that all nodes (including leaves to be hidden) are popped and processed before their ancestors; this allows us to propagate the need for change upwards.