XOR-Based Codes for Private Information Retrieval with Private Side Information

We consider the problem of Private Information Retrieval with Private Side Information (PIR-PSI), wherein a user wants to retrieve a file from replication-based non-colluding databases by using prior knowledge of a subset of the files stored on the databases. The PIR-PSI framework ensures that the privacy of the demand and the side information are jointly preserved, which makes it applicable when multiple files have to be downloaded at different time instants. Although the capacity of the PIR-PSI setting is known, we observe that the underlying capacity-achieving code construction uses Maximum Distance Separable (MDS) codes, which leads to high computational complexity when retrieving the demand. To address this drawback of MDS-based PIR-PSI codes, we propose XOR-based PIR-PSI codes for a simple yet non-trivial setting of two non-colluding databases and two side information files at the user. While our codes offer a substantial reduction in complexity compared to MDS-based codes, the code rate falls marginally short of the capacity of the PIR-PSI setting. Nevertheless, we show that our code rate is strictly higher than that of XOR-based codes for PIR with no side information, which implies that our codes can be useful when downloading multiple files sequentially, instead of applying XOR-based PIR codes to each file.


I. Introduction
Private Information Retrieval (PIR) deals with the design of queries to a database so as to provide privacy to the messages downloaded by the user. A trivial way of achieving information-theoretic privacy is to download all the messages stored in the database so that the identity of the demand, i.e., the message the user wants, remains unknown. However, it is well known that this increases the download cost substantially. Ever since the problem of PIR was first introduced in [1], various methods have been proposed to retrieve the demand efficiently and privately, including code constructions that achieve the capacity of the classic PIR problem [2].
Since the contribution of [2], numerous variants of the PIR problem have gained attention [3]-[5]. Among them, an important variant is the problem of PIR with side information (PIR-SI). In this setting, the user already knows one or more messages in the database, and she uses this information to reduce the download cost compared to that without side information. The work on PIR with side information began with the cache-aided PIR in [7]. Other important works on PIR-SI followed in [9]-[11]. A prominent variant of PIR with side information is the problem of PIR with private side information (PIR-PSI), wherein the privacy of both the demand and the side information are jointly preserved, i.e., the database should learn neither which message is being queried by the user nor which side information is available to her. PIR-PSI has practical significance when users query multiple messages privately from a database. Suppose a user wants to query three messages A, B, C sequentially with full privacy. One way is to query these three messages separately using the capacity-achieving fully private scheme of [2]. Alternatively, after retrieving A using [2], the user can use A as side information to download B with a reduced download cost compared to downloading B without side information. Subsequently, with A and B as side information, she can further reduce the download cost for retrieving C. This way, as the number of side information messages increases, the download cost for a particular demand can potentially decrease. For these applications, a PIR-PSI protocol ensures that the demand is downloaded with the help of the side information, while jointly preserving the privacy of both the demand and the side information.
A solution for PIR-PSI was first developed by Kadhe et al. in [8] for the single database setting. Chen et al. further extended it in [12] to multiple databases in colluding and non-colluding environments. Both schemes achieve capacity; however, both rely on Maximum Distance Separable (MDS) codes. From the viewpoint of implementation, it is well known that an MDS-coded scheme has high computational complexity compared to a counterpart code constructed using XOR bit additions. For example, consider the code for two non-colluding databases with three messages A, B, C, with a length of eight bits per message. Let A be the demand and C be the side information for a particular user. As mentioned in [12, Section 4.1.1], the query would initially be in the form of Sun-Jafar's capacity-achieving scheme of [2], with four bits of A retrieved out of seven bits queried from a database. This query of seven bits would be converted to a (7,13) systematic MDS code, e.g., a Reed-Solomon code, and the six bits of the non-systematic part would be downloaded. With the help of the side information C, the user can retrieve four demand bits from these six bits, thereby improving the rate. Here, the encoding operation for the Reed-Solomon code would require 91 finite field multiplications and 78 finite field additions, which are computation-intensive compared to the four XOR bit additions of Sun-Jafar's capacity-achieving scheme. In general, any MDS code for PIR-PSI would depend on the message length $L$, as seen above, and $L$ itself depends on $N^K$ (where $N$ is the number of databases and $K$ is the number of messages). Therefore, the computational complexity increases exponentially with the number of messages.
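The operation counts quoted above can be reproduced with a simple cost model. The sketch below assumes that encoding is a dense vector-matrix product of a length-7 input with a 7 x 13 generator matrix over a finite field; the function name `dense_encoding_cost` and this cost model are illustrative assumptions, not taken from [12].

```python
def dense_encoding_cost(k, n):
    """Count field operations for a length-k vector times a dense k x n matrix.

    Assumption: no structure of the code is exploited, so every output
    symbol costs k multiplications and k - 1 additions.
    """
    multiplications = k * n
    additions = (k - 1) * n
    return multiplications, additions


mults, adds = dense_encoding_cost(7, 13)
print(mults, adds)  # 91 78
```

Under this model the counts grow with the product of the input and output lengths, whereas an XOR-based query only needs bit-wise additions.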
The capacity-achieving techniques of [8] and [12] are essential to the PIR-PSI problem. However, from a practical viewpoint, for a large number of messages the number of computations becomes prohibitively large and introduces a large latency in downloading the demand. Therefore, existing PIR-PSI codes are not amenable to implementation in applications that demand low-latency downloads. On the other hand, using a fully private low-complexity scheme such as [2] is not attractive either, since its rate suffers from not exploiting the side information. This motivates the construction of PIR-PSI codes based only on XOR computations that provide rates strictly higher than [2], preferably achieving the capacity of the PIR-PSI problem.

A. Contributions
Inspired by the problem statement discussed above, we make the following contributions in this paper:
• We present the first XOR-based code construction for the PIR-PSI problem. Our code construction is applicable when $K$ messages are replicated across $N = 2$ non-colluding databases, and the user wishes to retrieve a message with $M = 2$ side information messages available to her.
• Owing to the use of XOR-based queries, we show that our codes offer a substantial reduction in decoding complexity compared to the MDS-based counterparts of [12]. However, the rate of our codes falls marginally short of the capacity of the PIR-PSI problem for the setting of $N = 2$ non-colluding databases with $M = 2$ side information messages at the user. On the other hand, the rates of our codes are strictly higher than those of the fully private codes of [2], which makes them a preferred choice when sequentially downloading multiple messages from the databases. For $N = 2$ and $M = 2$, a comparison between existing solutions for PIR-PSI and our solution is captured in Table I.
• For the proposed code construction, we prove the joint privacy property by explicitly showing that the query to one of the databases can be fixed, and the query to the other database can then be modified in such a way that any combination of side information can be used to retrieve any demand from the two databases.

Table I: Comparison between existing solutions and the proposed codes for $N = 2$ and $M = 2$.
MDS coding: high complexity (exponential as $K$ increases; finite field multiplication involved); rate higher than our code.
XOR coding: low complexity (bit-wise XOR involved); the fully private XOR-based scheme has a rate lower than our code.

The rest of the paper is organized as follows. Section II presents the problem statement and related notations. Section III presents the code construction, whereas Section IV provides the proof for joint privacy of the XOR-based codes for arbitrary values of $K$. Section V exemplifies the privacy proof for the case when $K = 7$. Finally, some directions for future research are presented in Section VI.

II. Problem Statement
Consider two replicated non-colluding databases, $N_1$ and $N_2$, with $K$ messages, namely $X_1, X_2, \ldots, X_{K-1}, X_K$. Each of the $K$ messages consists of $L$ bits, which are independent and uniformly distributed. We denote the $i$-th file as $X_i = [X_{i,1}, X_{i,2}, \ldots, X_{i,L}]$. Among the $K$ messages, let $X_\gamma$, for $\gamma \in [K]$, be the demand message for a user who already has $M = 2$ side information messages $X_\alpha$ and $X_\beta$, such that $\alpha, \beta \in [K] \setminus \{\gamma\}$ with $\alpha \neq \beta$. In this model, the messages other than $X_\gamma$, $X_\alpha$ and $X_\beta$ are referred to as byproducts. The user wishes to leverage the knowledge of $X_\alpha$ and $X_\beta$ to retrieve $X_\gamma$ by downloading fewer bits from each database compared to retrieving $X_\gamma$ using the scheme in [2] without the side information. At the same time, the user also wants to jointly preserve the privacy of the demand $X_\gamma$ and the side information messages $X_\alpha$ and $X_\beta$, i.e., the indexes $\gamma$, $\alpha$ and $\beta$ should be unknown to both databases. In the context of this work, the code for retrieving $X_\gamma$ should be such that each database provides the user with a set of XOR additions of the bits of the $K$ messages. This way, the user is able to retrieve $X_\gamma$ by performing simple XOR bit additions on the downloaded bits while using the prior knowledge of $X_\alpha$ and $X_\beta$.
Towards constructing an XOR-based code for PIR-PSI, we make the following definitions. In order to retrieve the demand $X_\gamma$ using the side information $X_\alpha$ and $X_\beta$, every combination of message bits submitted to a database is referred to as a codeword, the set of codewords submitted to a database is called a query, and the union of the queries submitted to $N_1$ and $N_2$ is referred to as the code. When designing the codewords of a query, it is important to find the right set of XOR combinations of the $K$ message indexes, and then to choose the appropriate bit locations of the message indexes in the XOR combinations. To formally describe the XOR combinations of the message indexes, we introduce the following definitions.
Definition 1: For a finite set $\mathcal{M} = \{M_1, M_2, \ldots, M_P\}$ containing $P$ distinct variables, we define a mapping $\Phi$ such that $\Phi(\mathcal{M}) = \sum_{i=1}^{P} M_i$ if $P \geq 1$, and $\Phi(\mathcal{M}) = \phi$ if $\mathcal{M}$ is the empty set.
Definition 2: With $\mathcal{X} = \{X_1, X_2, \ldots, X_K\}$, we define a power set based skeleton structure $\mathcal{P}(\mathcal{X}) = \{\Phi(\mathcal{V}) \mid \forall\, \mathcal{V} \in PS(\mathcal{X})\}$, where $\Phi(\cdot)$ is as introduced in Definition 1, and $PS(\mathcal{X})$ is the power set of $\mathcal{X}$.
From Definition 2, it is clear that the queries (without specifying the bit locations of each message) submitted to $N_1$ and $N_2$ must be subsets of $\mathcal{P}(\mathcal{X}) \setminus \{\phi\}$, wherein the $+$ operator in $\Phi$ is treated as the XOR operation. Henceforth, throughout the paper, the queries submitted to $N_1$ and $N_2$ to retrieve $X_\gamma$ using the side information $X_\alpha$ and $X_\beta$ are denoted by $C_1$ and $C_2$, respectively, and the overall code is denoted by $(C_1, C_2)$. To propose formal design criteria for choosing the subsets of $\mathcal{P}(\mathcal{X}) \setminus \{\phi\}$ as queries, we define a singleton block in a query as the set of codewords that consist of only a single message bit. Similarly, we define an $n$-tuple sum block in a query, for $2 \leq n < K$, as the set of codewords that consist of XOR bit additions of $n$ different messages.
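The skeleton structure of Definition 2 and its grouping into blocks can be sketched as follows. This is a minimal illustration in which each formal sum is represented by a `frozenset` of message names and the empty set plays the role of $\phi$; the function names `skeleton` and `blocks` are hypothetical helpers, not part of the paper.

```python
from itertools import combinations


def skeleton(messages):
    """Return P(X): one formal XOR sum per subset of the messages."""
    out = []
    for n in range(len(messages) + 1):
        for subset in combinations(messages, n):
            out.append(frozenset(subset))  # frozenset() stands for phi
    return out


def blocks(query):
    """Group codewords by how many messages they combine (1 = singleton)."""
    grouped = {}
    for codeword in query:
        grouped.setdefault(len(codeword), []).append(codeword)
    return grouped


P = skeleton(["A", "B", "C"])
nonempty = [c for c in P if c]
sizes = {n: len(cws) for n, cws in blocks(nonempty).items()}
copies = {m: sum(m in c for c in nonempty) for m in "ABC"}
print(len(P), sizes, copies)
```

For three messages this gives eight elements including $\phi$, with three singletons, three 2-tuple sums and one 3-tuple sum, and each message appears in four codewords.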

A. Design Criteria for XOR-based PIR-PSI Codes
For $N = 2$, $M = 2$ and any $K \geq 3$, let $(C_1, C_2)$ be an XOR-based code to retrieve $X_\gamma$ using the side information $X_\alpha$ and $X_\beta$. The code $(C_1, C_2)$ is said to be an XOR-based PIR-PSI code if, keeping the query to $N_1$ (or $N_2$) unchanged, it is possible to design a query to $N_2$ (or $N_1$) such that:
1) Condition 1: any demand can be retrieved from $N_1$ and $N_2$ using any two side information messages.
2) Condition 2: the structure of the new query to $N_2$ (or $N_1$) is the same as that of $C_2$ (or $C_1$), namely: (i) the number of singleton and $n$-tuple sum blocks is the same, for any $2 \leq n \leq K - 1$, and (ii) the frequency distribution of the message bits across all the codewords in the query is the same as that of $C_2$ (or $C_1$).
We take a two-step approach to design codes satisfying the above criteria. First, we present the queries of the code for a given demand $X_\gamma$ and a pair of side information messages $X_\alpha$ and $X_\beta$, and prove the correctness of the construction in retrieving $X_\gamma$. Subsequently, we present a rigorous proof to show that, keeping the query to $N_1$ (or $N_2$) unchanged, new queries to $N_2$ (or $N_1$) can be constructed that satisfy Condition 1 and Condition 2. We show that the rate of the code is higher than that of [2], implying that our codes can be used when sequentially downloading multiple files. The construction procedure for the XOR-based PIR-PSI code is explained in the next section.
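Condition 2 can be checked mechanically. The sketch below is one possible reading of the condition: it compares the number of codewords per block size and the multiset of per-message occurrence counts. The seven-codeword query used as input is the $K = 4$ query that the construction of Section III appears to yield; it is an assumption reconstructed from the running example, and `structure` and `relabel` are hypothetical helper names.

```python
from collections import Counter


def structure(query):
    """Summarise a query: block sizes and sorted per-message counts."""
    block_sizes = Counter(len(cw) for cw in query)
    per_message = Counter(m for cw in query for m in cw)
    return block_sizes, sorted(per_message.values())


def relabel(query, mapping):
    """Rename messages in every codeword according to the mapping."""
    return [frozenset(mapping.get(m, m) for m in cw) for cw in query]


# Assumed K = 4 query to N1 (demand A, side information B and C).
query_n1 = [frozenset(s) for s in ("A", "AB", "BC", "BD", "CD", "ACD", "ABCD")]

# A candidate query obtained by exchanging the roles of B and C.
candidate = relabel(query_n1, {"B": "C", "C": "B"})
same_structure = structure(query_n1) == structure(candidate)
print(same_structure)
```

A relabelling of messages leaves both summaries unchanged, which is why transformations of this kind are natural candidates when designing the second query.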

III. XOR-Based PIR Codes with Private Side Information
Among the $K$ messages, let $X_1$ be the demand, and let $X_2$ and $X_3$ be the side information. With $M = 2$, our construction is applicable only for $K \geq 3$. For $K = 3$, the code construction is trivial with rate one, since an XOR version of all the files can be downloaded. For $K > 3$, the ingredients and the instructions provided in the next two sections must be followed to obtain the queries for $N_1$ and $N_2$. Although the individual bits of the $K$ files are used to retrieve the $L$ bits of $X_1$, our construction first provides a way to place the file indexes in the query, and then describes a way to choose the specific bits of each file in the query. Along with the steps for the ingredients and the code construction, a running example for $K = 4$ with messages $A, B, C, D$ is also presented, wherein $A$ plays the role of the demand, i.e., $X_1$, and $B$ and $C$ play the role of the side information, i.e., $X_2$ and $X_3$.

A. Ingredients and Construction Strategy
With $\mathcal{X} = \{X_1, X_2, \ldots, X_{K-1}\}$, we construct $\mathcal{P}(\mathcal{X}) = \{\Phi(\mathcal{V}) \mid \forall\, \mathcal{V} \in PS(\mathcal{X})\}$, where $\Phi(\cdot)$ is as introduced in Definition 1 and $PS(\mathcal{X})$ is the power set of $\mathcal{X}$. In the context of the running example with $K = 4$, we have $\mathcal{X} = \{A, B, C\}$, and therefore, $\mathcal{P}(\{A, B, C\})$ is given in Table II.
Owing to the use of the power set, the elements of $\mathcal{P}(\mathcal{X})$ are unique, generating a total of $2^{K-1}$ elements by using $2^{K-2}$ copies of each message. Towards converting $\mathcal{P}(\mathcal{X})$ into a query, we allocate distinct indexes to each copy of a message, thereby resulting in $2^{K-2}$ unique bits of a message. This ensures that the query at this stage contains $K - 1$ messages, each contributing $L = 2^{K-2}$ bits. In order to prepare the desired query with $K$ messages, we need to add $2^{K-2}$ copies of the message $X_K$ to the existing elements of $\mathcal{P}(\mathcal{X})$. In the context of the example, as seen in Table II, 4 bits each of $A$, $B$, $C$ are present in $\mathcal{P}(\{A, B, C\})$. Therefore, 4 copies of $D$ should be added at different positions. In the next section, we provide a set of instructions to add $X_K$.
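, where the operator can be defined on two sets $\mathcal{M}_1$ and $\mathcal{M}_2$ as the set of all pairwise sums $m_1 + m_2$ with $m_1 \in \mathcal{M}_1$ and $m_2 \in \mathcal{M}_2$, such that $\phi + \phi = \phi$, $\alpha + \phi = \alpha$ and $\phi + \beta = \beta$. For the example with $K = 4$, it is straightforward to verify that applying the operator to $\mathcal{P}(\{A, B\})$ and $\mathcal{P}(\{C\})$, as given in Table III, gives $\mathcal{P}(\{A, B, C\})$ given in Table II.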
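The $K = 4$ verification mentioned above can be sketched in a few lines. The sketch assumes that the operator forms all pairwise sums of elements from the two sets, which is the reading consistent with the example; sums over GF(2) are modelled as symmetric differences of `frozenset`s, and `pairwise_sum` is a hypothetical name.

```python
from itertools import combinations


def skeleton(messages):
    """P(X) as a set of frozensets; frozenset() stands for phi."""
    return {frozenset(c)
            for n in range(len(messages) + 1)
            for c in combinations(messages, n)}


def pairwise_sum(m1, m2):
    """All XOR sums a + b with a in m1 and b in m2 (phi acts as identity)."""
    return {a ^ b for a in m1 for b in m2}


combined = pairwise_sum(skeleton("AB"), skeleton("C"))
print(combined == skeleton("ABC"))
```

Because the two message sets are disjoint, every pairwise sum is distinct, so the four elements of $\mathcal{P}(\{A, B\})$ and the two elements of $\mathcal{P}(\{C\})$ produce all eight elements of $\mathcal{P}(\{A, B, C\})$.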

Table IV
Column 1    Column 2
$C$         $C$

3. Add $X_K$ to all the entries of Column 1. Leave Column 2 unaltered. Through this step, out of the $2^{K-2}$ copies, $2^{K-3} - 1$ copies of $X_K$ are added. For the example, the two columns are as shown in Table V.

5. Skip this step if $K \leq 5$, since the largest value of $n$ for the $n$-tuple sum block is 2. This is already addressed in the previous step.
• If $K > 7$, perform the set operation defined earlier between $\{X_1, X_2\}$ and all the entries in the $n$-tuple sum block, for $n \in \{4, 5, \ldots, K-3\}$, of Column 1, and append the result to Column 1'. Similarly, perform the operation between $\{\phi, X_1 + X_2\}$ and all the entries in the $n$-tuple sum block, for $n \in \{3, 4, \ldots, K-4\}$, of Column 2, and append the result to Column 2'. However, for the $(K-2)$-tuple sum block in Column 1, perform the operation with $\{\phi, X_1 + X_2\}$ and append the result to Column 1'. Similarly, for the $(K-3)$-tuple sum block in Column 2, perform the operation with $\{X_1, X_2\}$ and append the result to Column 2'. This step is not applicable to the running example since $K = 4$.
At the end of this step, both Column 1' and Column 2' contain $2^{K-2} - 2$ elements each, owing to the operation. With this, we highlight that the union of Column 1' and Column 2' has generated $2^{K-1} - 4$ elements of $\mathcal{P}(\{X_1, X_2, \ldots, X_{K-1}\})$. This does not include $\{\phi, X_1, X_2, X_1 + X_2\}$, since $\phi$ of $\mathcal{P}(\{X_3, X_4, \ldots, X_{K-1}\})$ was excluded when constructing Column 1 and Column 2 in Step 2. Furthermore, since $X_K$ was already added to Column 1 (containing $2^{K-3} - 1$ elements), at the end of Step 5, Column 1' contains $2^{K-2} - 2$ copies of $X_K$. This implies that only two more copies of $X_K$ are to be added to ensure that each message has $2^{K-2}$ copies. In the example, Step 3 added one copy of $D$, and Step 4 made it two copies. Two more copies remain to be added.
6. From $\{\phi, X_1, X_2, X_1 + X_2\}$, omit $\phi$, and add $X_K$ to $X_2$. This generates $\{X_1, X_2 + X_K, X_1 + X_2\}$. Place these elements in a new column, referred to as Column 3. Note that one more copy of $X_K$ must still be added to reach $2^{K-2}$ copies. In the example, Column 3 is given in Table VII.

Table VII
Column 3
$A$
$B + D$
$A + B$

7. Form the query to $N_1$ by taking the union of all the elements in Column 1', Column 2' and Column 3. By construction, this set contains $X_1 + X_3$ coming from Column 2'. Therefore, add the last remaining copy of $X_K$ to $X_1 + X_3$, and update it to $X_1 + X_3 + X_K$. Thus, we have a total of $2^{K-1} - 1$ elements in this query, constructed by adding $2^{K-2}$ copies of each of the $K$ files $\{X_1, X_2, \ldots, X_K\}$. Finally, in the query to $N_1$, provide distinct indexes to every copy of a message, thereby ensuring that every bit of a message is used only once. Overall, the query is a set of linear combinations of the bits of the $K$ messages. In the example, the query to $N_1$ obtained by following the above steps is given in Table VIII (without indexes) and Table IX (with indexes).
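The steps for the running example with $K = 4$ can be traced programmatically. The sketch below is a reconstruction under stated assumptions: Step 4 is not spelled out in this text, so the pairing of Column 1 with $\{\phi, A + B\}$ and of Column 2 with $\{A, B\}$ is inferred from Step 5 and from the remark in Step 7 that $X_1 + X_3$ comes from Column 2'. The helper names `fs` and `pair_sum` are hypothetical.

```python
def fs(*names):
    """A formal XOR sum of the named messages."""
    return frozenset(names)


def pair_sum(m1, m2):
    """All pairwise XOR sums between two sets of formal sums."""
    return {a ^ b for a in m1 for b in m2}


PHI = frozenset()

col1 = {fs("C", "D")}          # Step 3: D added to the entry C of Column 1
col2 = {fs("C")}               # Column 2 is left unaltered

# Assumed Step 4 for K = 4 (inferred, see lead-in).
col1_prime = pair_sum(col1, {PHI, fs("A", "B")})
col2_prime = pair_sum(col2, {fs("A"), fs("B")})

col3 = {fs("A"), fs("B", "D"), fs("A", "B")}   # Step 6 (Table VII)

# Step 7: take the union and add the last copy of D to A + C.
query = col1_prime | col2_prime | col3
query.remove(fs("A", "C"))
query.add(fs("A", "C", "D"))

copies = {m: sum(m in cw for cw in query) for m in "ABCD"}
print(len(query), copies)
```

Under these assumptions the query has $2^{K-1} - 1 = 7$ codewords and every message appears $2^{K-2} = 4$ times, in agreement with the counts stated in Step 7.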

Table VIII: Query to $N_1$ without indexes
Table IX: Query to $N_1$ with indexes

Before we present the procedure to construct the query to $N_2$, we introduce two special structures in the query, namely the known byproduct combination and the unknown byproduct combination, and then present some results on their structure. Known byproduct combinations are the bit combinations of byproducts in a codeword that does not contain the demand index. Formally, if $X_1$ is the demand, $X_2$, $X_3$ are the side information, and the codeword is of the form $H + W$, where $H \in \mathcal{P}(\{X_2, X_3\})$ and $W \in \mathcal{P}(\{X_4, X_5, \ldots, X_K\}) \setminus \{\phi\}$, then $W$ is the known byproduct combination. Informally, this is the combination of byproduct messages that can be retrieved from the database. Unknown byproduct combinations are the bit combinations of the byproducts in a codeword that contains the demand index, with or without the side information message bits. Formally, if $X_1$ is the demand, $X_2$, $X_3$ are the side information, and the codeword is of the form $X_1 + H + W$, where $H \in \mathcal{P}(\{X_2, X_3\})$ and $W \in \mathcal{P}(\{X_4, X_5, \ldots, X_K\}) \setminus \{\phi\}$, then $W$ is the unknown byproduct combination. Informally, this is the combination of byproduct messages that cannot be retrieved from the database because of the unknown demand bit that accompanies it.
Proposition 1: If the message combination $W \in \mathcal{P}(\{X_4, X_5, \ldots, X_K\}) \setminus \{\phi\}$ appears as an unknown byproduct in the query to $N_1$, then it also exists as a known byproduct, however, with different index values on each message.
Proof: By definition, if $W$ is an unknown byproduct, then it appears in the query along with the demand $X_1$. Since $W$ may appear alone or along with the side information messages, we denote the codeword associated with the unknown byproduct as $U = X_1 + H + W$, where $H \in \mathcal{P}(\{X_2, X_3\})$. From the code construction, $U$ must be equal to $X_1 + X_3 + X_K$, which was added in Step 7, or it must belong to either Column 1' or Column 2' of Step 4 and Step 5. If $U = X_1 + X_3 + X_K$, then the corresponding known byproduct is available in Column 3 of Step 6. However, if $U$ is available in Column 1', then the corresponding known byproduct is also in Column 1', because the elements of Column 1' are generated by performing the operation with either $\{X_1, X_2\}$ or $\{\phi, X_1 + X_2\}$. The same argument is also applicable if $U$ is available in Column 2'. Finally, the index values used on the known byproduct are different from those of the unknown byproduct, since every copy of a message is assigned a different index as per Step 7. This completes the proof.
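The symmetry asserted by Proposition 1 can be illustrated on the running example. The sketch below uses the $K = 4$ query to $N_1$ as reconstructed from the construction steps (an assumption, since Table VIII is not reproduced here), with demand $A$ and side information $B$, $C$; `byproduct_part` is a hypothetical helper.

```python
from collections import Counter

DEMAND, SIDE = "A", {"B", "C"}

# Assumed K = 4 query to N1, without bit indexes.
query_n1 = [frozenset(s) for s in ("A", "AB", "BC", "BD", "CD", "ACD", "ABCD")]


def byproduct_part(codeword):
    """Strip the demand and side information, leaving the byproducts W."""
    return frozenset(codeword - SIDE - {DEMAND})


# Known combinations: codewords without the demand and with W != phi.
known = Counter(byproduct_part(cw) for cw in query_n1
                if DEMAND not in cw and byproduct_part(cw))

# Unknown combinations: codewords with the demand and with W != phi.
unknown = Counter(byproduct_part(cw) for cw in query_n1
                  if DEMAND in cw and byproduct_part(cw))

print(known == unknown)
```

Here the only byproduct is $D$, which occurs twice as a known combination (in $B + D$ and $C + D$) and twice as an unknown combination (in $A + C + D$ and $A + B + C + D$).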

8. To generate the query to $N_2$, the following instructions must be followed. Copy the structure of the query to $N_1$ as it is, without the indexes of the bits. For the demand $X_1$, the first $2^{K-2}$ bits are already queried from $N_1$. Therefore, use the next $2^{K-2}$ numbers as the indexes of $X_1$ in $N_2$. Use the same index number on each message of the side information as in the query to $N_1$. From Proposition 1, the query to $N_1$ produces a symmetric sequence of known and unknown byproduct combinations. List the unknown and known byproduct combinations in two separate columns, in ascending order of $n$-tuple length, following the lexicographical order and the ascending order of the bit indexes. For a given byproduct combination of messages in the unknown column, an identical byproduct combination exists in the known column, with the difference that the index values used by the unknown combination differ from those of the known combination. To assign index values to the byproduct messages of $N_2$, for a given byproduct combination, use the index values of the unknown combination of $N_1$ on the known combination of $N_2$, and vice versa. This ensures that all the byproduct messages can be indexed using a one-to-one mapping between the unknown and the known columns.
For the given example, in the fourth and fifth bits of Table IX, $D_1$ and $D_2$ appear with side information, and therefore, they become the known byproduct bits. Similarly, in the sixth and seventh bits of the query to $N_1$, $D_3$ and $D_4$ appear with the demand, and therefore, they become the unknown byproduct bits, as listed in Table X. To generate the query for $N_2$, the structure of the query to $N_1$ is copied without the indexes. The indexes of the demand are $\{5, 6, 7, 8\}$ in $N_2$. The indexes of $B$, $C$ are maintained as they are. Based on the one-to-one mapping in Table X, $D_3$ and $D_1$ are swapped, and so are $D_4$ and $D_2$. Finally, the query for $N_2$ is as shown in Table XI. This completes the code construction.
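The retrieval in the running example can be simulated end to end. In the sketch below, the two queries follow the description above (demand indexes 1 to 4 on $N_1$ and 5 to 8 on $N_2$, side information indexes unchanged, $D_1 \leftrightarrow D_3$ and $D_2 \leftrightarrow D_4$ swapped), but the exact index placement within each codeword is an assumption because Tables IX and XI are not reproduced here. The decoder is a generic peeling loop and is only one possible way to organise the XOR decoding.

```python
import random

# Codewords as lists of (message, bit index); placement is assumed.
N1 = [[("A", 1)], [("A", 2), ("B", 1)], [("B", 2), ("C", 1)],
      [("B", 3), ("D", 1)], [("C", 2), ("D", 2)],
      [("A", 3), ("C", 3), ("D", 3)],
      [("A", 4), ("B", 4), ("C", 4), ("D", 4)]]
N2 = [[("A", 5)], [("A", 6), ("B", 1)], [("B", 2), ("C", 1)],
      [("B", 3), ("D", 3)], [("C", 2), ("D", 4)],
      [("A", 7), ("C", 3), ("D", 1)],
      [("A", 8), ("B", 4), ("C", 4), ("D", 2)]]


def answer(codeword, bits):
    """Database response: XOR of the requested bits."""
    out = 0
    for msg, idx in codeword:
        out ^= bits[msg][idx]
    return out


def retrieve(queries, answers, side_bits):
    """Peel codewords that contain exactly one still-unknown bit."""
    known = dict(side_bits)
    progress = True
    while progress:
        progress = False
        for cw, ans in zip(queries, answers):
            missing = [t for t in cw if t not in known]
            if len(missing) == 1:
                value = ans
                for t in cw:
                    if t in known:
                        value ^= known[t]
                known[missing[0]] = value
                progress = True
    return known


random.seed(0)
bits = {m: {i: random.randint(0, 1) for i in range(1, 9)} for m in "ABCD"}
queries = N1 + N2
answers = [answer(cw, bits) for cw in queries]
side = {(m, i): bits[m][i] for m in "BC" for i in range(1, 9)}
known = retrieve(queries, answers, side)
recovered = [known.get(("A", i)) for i in range(1, 9)]
print(recovered == [bits["A"][i] for i in range(1, 9)])
```

With these queries, eight bits of $A$ are recovered from 14 downloaded bits: $D_1$, $D_2$ are learned from $N_1$ and cancel the unknown byproducts on $N_2$, and $D_3$, $D_4$ are learned from $N_2$ and cancel those on $N_1$.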
Theorem 1: For $N = 2$ and any $K > 3$, with the knowledge of side information $X_2$ and $X_3$, the proposed code construction can retrieve $2^{K-2}$ bits of $X_1$ per database by downloading $2^{K-1} - 1$ bits per database. Thus, the rate of the code is
$$R = \frac{2^{K-2}}{2^{K-1} - 1}. \qquad (1)$$
Proof: From the code construction, the queries to $N_1$ and $N_2$ are obtained by adding $2^{K-2}$ bits of $X_K$ to the power set structure $\mathcal{P}(\{X_1, X_2, \ldots, X_{K-1}\}) \setminus \{\phi\}$. Since the cardinality of $\mathcal{P}(\{X_1, X_2, \ldots, X_{K-1}\}) \setminus \{\phi\}$ is $2^{K-1} - 1$, and every message in $\mathcal{P}(\{X_1, X_2, \ldots, X_{K-1}\}) \setminus \{\phi\}$ appears $2^{K-2}$ times, the rate of the code is as given in (1). In the rest of the proof, we show that every bit of $X_1$ can be retrieved from $N_1$ and $N_2$ using the side information. If a bit of $X_1$ appears in the form $X_1 + f(X_2, X_3)$, then this bit can be retrieved since $X_2$ and $X_3$ are known. On the other hand, if a bit of $X_1$ appears in the form $X_1 + f(X_2, X_3) + W$ on database $N_1$ (or $N_2$), where $W \in \mathcal{P}(\{X_4, X_5, \ldots, X_K\}) \setminus \{\phi\}$, then from Step 8 of the code construction this bit can also be retrieved by using the side information and the corresponding known byproduct of $W$, which is downloaded from $N_2$ (or $N_1$). This completes the proof.
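The rate of Theorem 1 can be compared numerically with the two reference points discussed in the paper. The sketch below assumes the standard closed forms for the capacity of PIR from [2], $(1 + 1/N + \cdots + 1/N^{K-1})^{-1}$, and of PIR-PSI from [12], $(1 + 1/N + \cdots + 1/N^{K-M-1})^{-1}$; the function names are hypothetical.

```python
from fractions import Fraction


def xor_psi_rate(K):
    """Rate of the proposed code for N = 2, M = 2 (Theorem 1)."""
    return Fraction(2 ** (K - 2), 2 ** (K - 1) - 1)


def geometric_capacity(N, terms):
    """Inverse of 1 + 1/N + ... + 1/N^(terms - 1)."""
    return 1 / sum(Fraction(1, N ** i) for i in range(terms))


def pir_capacity(N, K):
    """Assumed capacity of PIR without side information, as in [2]."""
    return geometric_capacity(N, K)


def pir_psi_capacity(N, K, M):
    """Assumed capacity of PIR-PSI with M side information messages, as in [12]."""
    return geometric_capacity(N, K - M)


for K in (4, 7):
    print(K, pir_capacity(2, K), xor_psi_rate(K), pir_psi_capacity(2, K, 2))
```

For $K = 7$ this evaluates to $64/127$, $32/63$ and $16/31$, respectively, so the proposed rate lies strictly between the rate of the fully private scheme and the PIR-PSI capacity.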
Although we provided a running example for the construction with $K = 4$, we present another example with $K = 7$ in the next section. The example for $K = 7$ is required later for the purpose of proving joint privacy.

C. Example for K=7
Consider 7 messages $A, B, C, D, E, F, G$. Let $A$ be the demand, let $B$ and $C$ be the known side information, and let $D$, $E$, $F$ and $G$ be the byproducts. The step-by-step construction of the queries to retrieve $A$ from $N_1$ and $N_2$ is shown below. Construct the skeleton power set structure $\mathcal{P}(\{A, B, C, D, E, F\})$ for $K - 1 = 6$ messages. A total of $2^{K-1} = 2^6 = 64$ elements are generated.
7) Form the query to the database $N_1$ by taking the union of all the elements in Column 1', Column 2' and Column 3. Add $G$ to $A + C$, and update it to $A + C + G$. In the query to $N_1$, provide distinct indexes to every copy of a message as seen below. Arranging the query in lexicographical order, we get the following query.
Table XIX: Query to $N_1$ with index on each message
8) The unknown byproduct combinations and the known byproduct combinations from database $N_1$ are listed in Table XX. As per Proposition 1, observe that their structures are symmetric. To generate the query for $N_2$, the structure of the query to $N_1$ is copied without the indexes. The indexes of $A$ in $N_1$ are from 1 to 32. For the indexes of $A$ in $N_2$, use the indexes from 33 to 64. The indexes of $B$, $C$ are maintained as they are. Based on the one-to-one mapping in Table XX, the unknown and known byproduct combinations are swapped. Finally, the query for $N_2$ is as shown in Table XXI. Overall, 64 bits of $A$ are retrieved from both databases by downloading a total of 126 bits. Thus, the rate of the code is $\frac{32}{63}$. Note that this rate lies between those of [12] and [2], which have rates of $\frac{16}{31}$ and $\frac{64}{127}$, respectively, for the same parameters.

unknown byproduct combinations using the existing query at $N_1$. Note that these byproduct combinations will involve the bits of messages other than those of $X_K$, $X_i$, $X_j$. Subsequently, we pick the query submitted to $N_1$, and then apply a suitable transformation between $\{X_1, X_2, X_3\}$ and $\{X_1, X_2, \ldots, X_{K-1}\}$ such that the known and the unknown byproduct combinations of the modified query are symmetric. Finally, if a known (or unknown) byproduct combination in the query to $N_1$ is available as an unknown (or known) byproduct in the modified query, we swap their indexes; otherwise, we perform suitable modifications to the modified query such that the demand $X_K$ can be retrieved. Given that the proposed code construction is applicable when $X_1$ is the demand and $X_2$, $X_3$ are the side information, the same set of transformations is applicable on the query to $N_1$ for the side information messages within the following classes, namely: (i) the side information is one of {X

For each of these cases, the set of transformations that must be carried out on the query submitted to $N_1$ is presented in the third column of Table XXII, whereas the set of manipulations that must be applied after the transformation is listed in the fourth column of Table XXII. It is important to note that after applying the transformations in the third column, a subset of the unknown and known byproduct combinations of the query to $N_1$ is symmetric with that of the transformed query. On those subsets, the known and unknown byproduct combinations must be swapped, similar to the case when $X_K$ plays the role of a side information message. Finally, when $X_K$ is a byproduct, for a given demand and side information messages, we propose a sequence of transformations between the messages in the query to $N_1$ to ensure that the known and the unknown byproduct combinations match those of the query to $N_1$ as closely as possible. For the cases when the known and unknown byproduct combinations do not match, we propose modifications to the transformed query so as to retrieve all the bits of the demand. Table XXIII lists the instructions to obtain the query at $N_2$ for the case when $X_K$ is one of the byproducts.
In both Table XXII and Table XXIII, the operator ⇋ represents swapping the elements on either side of it, i.e., every occurrence of the element on the left-hand side of ⇋ is exchanged with the element on its right-hand side in the given query. For example, $A$ ⇋ $B$ means that all positions of $A$ and $B$ are swapped in the given query. If the ⇋ operator has two juxtaposed elements on either side, the first and second elements of the left side are swapped with the first and second elements of the right side, respectively. Similarly, the operator ⇒ represents the replacement of the codeword on the left of the ⇒ operator with the one on the right. For example, $A + B$ ⇒ $C + D$ means that the codeword $A + B$ is replaced with $C + D$ in the given query. In summary, the first row of Table XXII shows that when $X_K$ is a side information message, any demand can be retrieved with any side information pair by keeping the query at $N_1$ unaltered and making the necessary changes in $N_2$ without altering the structure of the code. Similarly, Table XXII and Table XXIII show that the same is possible when $X_K$ is the demand and when $X_K$ is one of the byproducts, respectively. This shows that the query obtained from the code construction algorithm in Section III can be used to retrieve any demand with any pair of side information messages by modifying the query at $N_2$ while keeping the query at $N_1$ unaltered.
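The two operators described above can be sketched as small functions on index-free queries. The representation (codewords as `frozenset`s of single-letter message names, operator arguments as strings of letters) and the names `swap` and `replace` are illustrative assumptions.

```python
def swap(query, left, right):
    """The swap operator: exchange each message in `left` with its partner in `right`.

    `left` and `right` are strings of equal length, so that two juxtaposed
    letters are swapped position by position.
    """
    mapping = dict(zip(left, right))
    mapping.update(zip(right, left))
    return [frozenset(mapping.get(m, m) for m in cw) for cw in query]


def replace(query, old, new):
    """The replacement operator: substitute codeword `old` by codeword `new`."""
    return [frozenset(new) if cw == frozenset(old) else cw for cw in query]


query = [frozenset("A"), frozenset("AB"), frozenset("BC")]
swapped = swap(query, "B", "C")          # B and C exchange all positions
double = swap([frozenset("AC")], "AC", "BD")   # A with B, and C with D
replaced = replace(query, "AB", "CD")    # A + B becomes C + D
print(swapped, double, replaced)
```

A swap is a relabelling of messages, so it preserves the number of codewords in every block; a replacement changes a single codeword and therefore has to be chosen so that the overall structure is still preserved.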

A. G as a side information
Recall that the queries for $K > 3$, $M = 2$ are formed by adding $2^{K-2}$ copies of $X_K$ (in this case $G$ for the example in Section III-C) to $\mathcal{P}(\{X_1, X_2, \ldots, X_{K-1}\}) \setminus \{\phi\}$ (in this case the power set structure $\mathcal{P}(\{A, B, C, D, E, F\}) \setminus \{\phi\}$). Since $G$ is the side information, it can be removed from the downloaded bits, and therefore, the privacy proof of the code depends directly on the structure of $\mathcal{P}(\{A, B, C, D, E, F\}) \setminus \{\phi\}$. For any given demand and any side information from the remaining five messages, the structure of $\mathcal{P}(\{A, B, C, D, E, F\}) \setminus \{\phi\}$ guarantees that the known byproduct combinations and the unknown byproduct combinations in the query to $N_1$ are symmetric. Hence, the query submitted to $N_2$ follows the same structure as that of $N_1$ except that:
• The index values of the demand run from 33 to 64 instead of from 1 to 32.
• The known and unknown byproduct combinations are swapped similar to the proposed construction.
• The index values of all the side information can be retained as they were in N 1 .
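The key observation of this subsection, namely that removing $X_K$ leaves the plain power set structure, can be checked on the smaller running example. The sketch below uses the $K = 4$ query to $N_1$ as reconstructed from the construction steps (an assumption), with $D$ playing the role that $G$ plays for $K = 7$.

```python
from itertools import combinations

# Assumed K = 4 query to N1, without bit indexes; D is X_K here.
query_n1 = [frozenset(s) for s in ("A", "AB", "BC", "BD", "CD", "ACD", "ABCD")]

# When D is side information, its bits can be cancelled from every codeword.
stripped = {cw - {"D"} for cw in query_n1}

# Non-empty part of the skeleton P({A, B, C}).
skeleton = {frozenset(c) for n in range(1, 4) for c in combinations("ABC", n)}

print(stripped == skeleton)
```

After cancelling $D$, the seven codewords are exactly the seven non-empty XOR combinations of $A$, $B$, $C$, which is the symmetric structure the argument above relies on.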

B. G as the demand
When $G$ is the demand, the two side information messages can come from $\{A, B, \ldots, E, F\}$ in $\binom{K-1}{2} = \binom{6}{2} = 15$ ways. The 15 combinations are $\{AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF\}$, where the two juxtaposed letters denote the side information messages. These 15 combinations are grouped into five different types, namely: $\{AC, BC\}$, $\{AB, AD, AE, AF\}$, $\{BD, BE, BF\}$, $\{CD, CE, CF\}$ and $\{DE, DF, EF\}$. The reason for this classification is that the query to $N_1$ was constructed assuming $A$ as the demand and $B$, $C$ as the side information, and therefore, the new query to $N_2$ must be designed to match the existing query to $N_1$.
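The enumeration and grouping above can be reproduced as follows. The rule inside `pair_type` is only a compact way to regenerate the five groups listed in the text; it is an illustrative assumption and not the criterion used in the proof.

```python
from itertools import combinations
from math import comb


def pair_type(pair):
    """Map a side information pair to one of the five types in the text."""
    if pair in ("AC", "BC"):
        return 1
    for rank, letter in enumerate("ABC", start=2):
        if letter in pair:
            return rank
    return 5


pairs = ["".join(p) for p in combinations("ABCDEF", 2)]

groups = {}
for p in pairs:
    groups.setdefault(pair_type(p), []).append(p)

print(len(pairs), comb(6, 2))
for t in sorted(groups):
    print(t, groups[t])
```

The five groups have sizes 2, 4, 3, 3 and 3, which add up to the 15 possible side information pairs.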

1) One of {AC, BC} as side information and G as the demand
When either $AC$ or $BC$ are the side information messages, the known and unknown byproduct combinations are symmetric. With the side information $AC$, the pattern of unknown and known byproduct combinations without the indexes is given in Table XXIV. Therefore, the demand $G$ can be retrieved by swapping the indexes of the known and unknown byproduct combinations. Similar to the code construction, when submitting the query to $N_2$, the indexes on the side information must be the same, whereas the indexes of the demand on $N_2$ must be new. In the case when $BC$ are the side information messages, the pattern of the known and the unknown byproduct combinations remains symmetric, and therefore, the query to $N_2$ is similar to that when $AC$ are the side information messages. The only difference is that the roles of $A$ and $B$ are swapped, as $A$ becomes a byproduct and $B$ becomes a side information message. Thus, all the bits of $G$ can be retrieved from both databases without altering the structure of the query to $N_1$ when one of $\{AC, BC\}$ are the side information messages.

2) One of {AB, AD, AE, AF} as side information and G as the demand
When one of {AB, AD, AE, AF} is the side information, the known and unknown byproduct combinations are almost symmetric, with the one exception that a singleton bit of byproduct C moves from the known side to the unknown side. For instance, when AB are the side information, the known and unknown byproduct combinations are as shown in the first two columns of Table XXV. Therefore, the query to N 2 must accommodate this extra unknown bit without changing the structure. To construct the query to N 2, we start from the query to N 1 and modify it as discussed hereafter. For exposition, we take the case when AB are the side information.

We already know that the query to N 1 gives symmetric known-unknown byproduct combinations for side information AC and demand G. We take the query submitted to N 1 and apply the simple transformation B ⇋ C, i.e., we swap all positions of B with C, to generate a new query. Due to the transformation, this new query gives symmetric known-unknown byproduct combinations when G is the demand and AB are the side information. This pattern of unknown and known byproduct combinations is shown in the third and fourth columns of Table XXV. We now modify the new query obtained after the transformation B ⇋ C so that it can be used as the query to N 2. To this end, the original query to N 1 for K = 7 (without indexes), with a number assigned to each codeword, is given in Table XXVI.

Since one bit of C has moved from the known side to the unknown side relative to the existing query to N 1, we take the new query (after the transformation) and swap the singleton bit A (at bit number 1) with the bit C of the two-tuple sum C + G (bit number 10). This ensures that the extra unknown bit C is retrieved in bit number 1, while the demand bit G can still be retrieved in bit number 10 since A is side information. This sequence of operations, namely taking a copy of the query to N 1, applying the transformation B ⇋ C, and performing the final exchange in bit positions 1 and 10, is displayed in the last three columns of Table XXVII. The rows of Table XXVII indicated in red are the ones that get modified. Thus, the last column of this table forms the query to N 2 when AB are the side information. Of course, the indexes of the known-unknown byproducts must be swapped wherever they are symmetric.

In general, when handling the other cases of {AD, AE, AF} as the side information, the procedure for obtaining the query to N 2 is similar to that when AB are the side information. We take the query submitted to N 1 and perform the transformation of C with D, E, F for the cases AD, AE, AF, respectively. The modifications made to the new query after the transformation are the same as when AB are the side information. Finally, the bit numbers change according to the new position of the codeword C + G.
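The two operations used above, a global symbol swap followed by a local exchange at two bit positions, can be sketched in a few lines. The sketch below is illustrative only: the names `swap_symbols` and `replace_symbol` and the four-codeword `toy_query` are assumptions made for this example and are not the query of Table XXVI. Each codeword is modelled as the set of messages whose bits are summed.

```python
from typing import FrozenSet, List

# One codeword is modelled as the set of messages that are XOR-ed together,
# e.g. frozenset({"C", "G"}) stands for the two-tuple sum C + G.
Codeword = FrozenSet[str]


def swap_symbols(query: List[Codeword], x: str, y: str) -> List[Codeword]:
    """Apply the transformation x <-> y to every codeword of the query."""
    mapping = {x: y, y: x}
    return [frozenset(mapping.get(s, s) for s in cw) for cw in query]


def replace_symbol(query: List[Codeword], position: int,
                   old: str, new: str) -> List[Codeword]:
    """Replace message `old` by `new` in the codeword at a 1-indexed position."""
    out = list(query)
    cw = out[position - 1]
    if old not in cw:
        raise ValueError(f"{old} does not appear at bit number {position}")
    out[position - 1] = (cw - {old}) | {new}
    return out


# Hypothetical toy query (NOT the query of Table XXVI): A, B, C, B + G.
toy_query = [frozenset("A"), frozenset("B"), frozenset("C"), frozenset("BG")]

# Step 1: the transformation B <-> C gives A, C, B, C + G.
after_swap = swap_symbols(toy_query, "B", "C")

# Step 2: exchange the singleton A (position 1) with the C of C + G (position 4).
final_query = replace_symbol(after_swap, 1, "A", "C")
final_query = replace_symbol(final_query, 4, "C", "A")
```

In this toy example the final query is C, C, B, A + G: the extra unknown bit of C is requested in position 1, and G remains recoverable from A + G when A is side information, mirroring the exchange described for bit numbers 1 and 10.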

3) One of {BD, BE, BF} as side information and G as the demand
When one of {BD, BE, BF} is the side information, the known and unknown byproduct combinations are almost symmetric, with two exceptions: a two-tuple sum of the byproduct combination A + C moves from the known side to the unknown side, while a bit of byproduct A moves from the unknown side to the known side. For instance, when BD are the side information, the known and unknown byproduct combinations are as shown in the first two columns of Table XXVIII. Therefore, the query to N 2 must accommodate these deviations without changing the structure. To construct the query to N 2, we start from the query to N 1 and modify it as discussed hereafter. For exposition, we take the case when BD are the side information.

We already know that the query to N 1 gives symmetric known-unknown byproduct combinations for side information AC and demand G. We take the query submitted to N 1 and perform the transformations A ⇋ B and C ⇋ D to generate a new query. Due to the transformation, this new query gives symmetric known-unknown byproduct combinations when G is the demand and BD are the side information. This pattern of unknown and known byproduct combinations is shown in the third and fourth columns of Table XXVIII.

Since a bit of A + C has moved from the known side to the unknown side relative to the existing query to N 1, there is only one known bit of A + C available to retrieve G. However, the query after the transformation uses 2 bits of A + C (bit numbers 39 and 45) to retrieve 2 bits of G, as seen in the third column of Table XXIX. Since one extra bit of A is available, the extra bit of A + C is obtained by replacing the bit C + G in bit number 12 with the bit B + G. G is still retrievable since B is side information. The bit of C freed from bit number 12, along with the extra bit of A, contributes to the second A + C bit. Since one more B is added to the query, the singleton bit of B is replaced with the demand bit G, as seen in bit number 1 in the table below. Since one extra demand bit is retrieved, bit number 38 is changed so that the extra unknown bit of A + C is also acquired. Since one bit of C was removed in bit number 12, this change restores the uniform distribution of bits while keeping the structure of the query at N 2 the same as that at N 1.

This sequence of operations, namely taking a copy of the query to N 1, applying the transformations A ⇋ B and C ⇋ D, and performing the final exchange in bit positions 1, 12, and 38, is displayed in the last three columns of Table XXIX. Thus, the last column of this table forms the query to N 2 when BD are the side information. Of course, the indexes of the known-unknown byproducts must be swapped wherever they are symmetric.

In general, when handling the other cases of {BE, BF} as the side information, the procedure for obtaining the query to N 2 is similar to that when BD are the side information. We take the query submitted to N 1 and perform the transformation between A and B, and then between C and E (or F), for BE (or BF) as the side information. The modifications made to the new query after the transformation when BD are the side information can be reproduced to retrieve G from N 2.
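The argument that G remains retrievable from B + G when B is side information, and that a known A + C can be assembled from separate known bits of A and C, rests only on the fact that XOR is its own inverse. A minimal sketch follows; the helper name `xor_bits` and the exhaustive check over bit values are assumptions for illustration, not part of the construction.

```python
from itertools import product


def xor_bits(*bits: int) -> int:
    """Return the XOR (sum over GF(2)) of the given bits."""
    result = 0
    for b in bits:
        result ^= b
    return result


recovered_ok = True
combined_ok = True
for a_bit, b_bit, c_bit, g_bit in product((0, 1), repeat=4):
    # The database answers the codeword B + G; the user knows B as side information.
    answer = xor_bits(b_bit, g_bit)
    recovered_ok &= xor_bits(answer, b_bit) == g_bit

    # A known bit of A and a known bit of C together act as a known bit of A + C,
    # which can then be removed from a codeword A + C + G to expose G.
    known_a_plus_c = xor_bits(a_bit, c_bit)
    answer = xor_bits(a_bit, c_bit, g_bit)
    combined_ok &= xor_bits(answer, known_a_plus_c) == g_bit
```

Both flags remain `True` for every assignment of bit values, which is why replacing C + G by B + G at a bit position does not affect decodability as long as B is known to the user.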

4) One of {CD, CE, CF} as side information and G as the demand
When one of {CD, CE, CF} is the side information, there is a drastic change in the known and unknown byproduct combinations compared to the previous cases in which G is the demand. For instance, when CD are the side information, the singleton bits of byproducts A and B are unknown only once, whereas they are known three times. The two-tuple sum block combinations of byproducts, except A + B, are unknown only once but known three times, whereas A + B is unknown three times but known only once. All the three-tuple sum block combinations of byproducts are unknown three times, while they are known only once. The four-tuple sum block combinations of byproducts are known three times, while they are unknown only once. The known and unknown byproduct combinations are as shown in Table XXX. The query to N 2 must accommodate these deviations without changing the structure. To construct the query to N 2, we start from the query to N 1 and modify it as discussed hereafter. For exposition, we take the case when CD are the side information.

We already know that the query to N 1 gives symmetric known-unknown byproduct combinations for side information AC and demand G. We take the query submitted to N 1 and perform the transformation A ⇋ D to generate a new query. Due to the transformation, this new query gives symmetric known-unknown byproduct combinations when G is the demand and CD are the side information.

Since one bit of A + B moves from the known side to the unknown side, there is only one bit of A + B available among the known bits to retrieve G. However, the new query after the transformation uses 2 bits of A + B (bit numbers 39 and 43) to retrieve 2 bits of G, as seen in the third column of Table XXXI. This is rectified by swapping the bit G in bit number 39 with F, thereby retrieving the extra bit of A + B + F required on the unknown side while removing the need for a second known bit of A + B. This increments and decrements the bit count (total number of bits of a message) of F and G, respectively, by 1. The extra bit of A + B required is obtained by swapping the bit E in bit number 19 with B, since all the two-tuple sum combinations except A + B are unknown only once. This increments and decrements the bit count of B and E, respectively, by 1.

Since the bit A + F is unknown only once, bit number 29, which is the second query for A + F, is changed to B + D + G so as to use the extra known bit B along with the side information D to retrieve G. This neutralises the differences that occurred in the bit counts of F and G, increments the bit counts of B and D by 1, and decrements the bit counts of A and C by 1. B now has a net increment of 2 in the bit count. The increment of D and the decrement of C are neutralised in bit number 56 by swapping D with C.

Now the four-tuple sum bit A + B + E + F is known three times and is unknown only once. Therefore, B and the side information C are swapped in bit numbers 49 and 62. Bit number 49, which was initially querying a bit of A + B + E + F, will now query the extra unknown bit of A + E + F. Bit number 62 was supposed to be the second query for a G bit using A + E + F, but all the three-tuple sum bits have only one known bit. This A + E + F is replaced by the extra known bit of A + B + E + F. The extra unknown bit of the three-tuple sum B + E + F is retrieved by swapping G with F in bit number 40. This increments and decrements the bit count of F and G, respectively, by 1.

Bit numbers 15 and 23, which query the second bit of A and of B + E, respectively, are together used to retrieve the extra unknown bit of A + B + E, since A and B + E are unknown only once. Bit numbers 57-59 use the three-tuple sums A + B + E, A + B + F and B + E + F to retrieve G for the second time, but these sums are known only once. Therefore, these bits are obtained by combining the bits {A, B + E}, {A, B + F} and {B, E + F}, since A, B and all the two-tuple sum blocks except A + B are known one extra time.

B + D of bit number 38 is replaced with A + E, and B of bit number 41 is replaced with A, so that the extra two-tuple sum bits A + E and A + F are used to retrieve G. This neutralises the bit counts of B and E, increments the bit count of A by 2, and decrements the bit count of D by 1. Since the bit count of A was already lagging 1 bit behind, this step increases it by 2 bits to give a net increment of 1 for A. C of bit number 33 is replaced with B, and C + A of bit number 28 is replaced with B + G, so that the extra two-tuple sum bits B + E and B + F are used to retrieve G. This neutralises the bit counts of G and A and decrements the bit count of D by 1. It also increments and decrements the bit count of B and C, respectively, by 2.

The B from bit number 10 is swapped with the side information D, while D itself in bit number 1 is swapped with G. The G bit and the B bit of bit numbers 14 and 2 are swapped with C to obtain the second unknown bit of F, thereby neutralising B, C and G. B is swapped with E in both bit numbers 6 and 9 to retrieve the extra unknown bit of E and the only unknown bit of E + F. This increments and decrements the bit count of E and B, respectively, by 2. E in bit numbers 21 and 31 is swapped with G and B, respectively, and F in bit number 31 is swapped with D to retrieve the extra unknown bit of F and the only unknown bit of B. This neutralises D, E and F and increments B and G once, leaving B with a net decrement of 1 bit. G of bit number 13 is swapped with A to retrieve the only known bit of A + E, while A of bit number 30 is swapped with the last known bit of B to retrieve G. This neutralises the bit counts of B and G and achieves the uniform distribution of bits while keeping the structure of the query at N 2 the same as that at N 1.

This sequence of operations, namely taking a copy of the query to N 1, applying the transformation A ⇋ D, and performing the final exchanges in bit positions, is displayed in the last three columns of Table XXXI. Thus, the last column of this table forms the query to N 2 when CD are the side information. Of course, the indexes of the known-unknown byproducts must be swapped wherever they are symmetric.

In general, when handling the other cases of {CE, CF} as the side information, the procedure for obtaining the query to N 2 is similar to that when CD are the side information. We take the query submitted to N 1 and perform the transformation between A and D. The modifications made to the new query after the transformation when CD are the side information can be reproduced to retrieve G from N 2.
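The bookkeeping above tracks the bit count of each message, i.e., how many codewords of the query contain it, and each modification is chosen so that the counts end up uniform again. A small sketch of that check is given below; the function names `bit_counts` and `is_uniform` and the six-codeword example are hypothetical and much smaller than the queries in Table XXXI.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List

Codeword = FrozenSet[str]  # set of messages XOR-ed together in one query bit


def bit_counts(query: List[Codeword]) -> Counter:
    """Count, for every message, the number of codewords in which it appears."""
    counts: Counter = Counter()
    for cw in query:
        counts.update(cw)
    return counts


def is_uniform(query: List[Codeword], messages: Iterable[str]) -> bool:
    """Return True if every listed message has the same bit count."""
    counts = bit_counts(query)
    return len({counts[m] for m in messages}) == 1


# Hypothetical toy query: A, B, C, A + B, B + C, A + C (each message appears 3 times).
balanced = [frozenset(s) for s in ("A", "B", "C", "AB", "BC", "AC")]

# Replacing C by B in the last codeword increments B and decrements C by 1,
# so a further compensating swap elsewhere would be needed.
unbalanced = balanced[:-1] + [frozenset("AB")]

balanced_counts = bit_counts(balanced)
unbalanced_counts = bit_counts(unbalanced)
```

Here `balanced_counts` assigns 3 to each of A, B and C, whereas `unbalanced_counts` gives B a count of 4 and C a count of 2, which is the kind of imbalance that the later swaps in the text are designed to neutralise.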

5) One of {DE, DF, EF} is the side information and G is the demand
Compared to the previous case (Section V-B4), this case has one more singleton bit of A moving from the unknown side to the known side and one bit of A + C moving from the known side to the unknown side, for every side information pair in {DE, DF, EF}. A minor modification to the previous code for N 2, seen in Table XXXI, can incorporate the required change. Since the fourth column of Table XXXI gives the query to N 2 for demand G and side information CD, we swap CD with the desired side information from {DE, DF, EF} to obtain a new query. Since one more A moves to the known side, all 4 bits of A are known. Therefore, the side information in bit number 15 in the fourth column of Table XXXI, which queries the only unknown A, can be replaced with C so that this bit retrieves A + C. Bit number 38, which retrieves G using A + C, can now be used to retrieve G using the fourth known A bit by swapping C with the side information swapped before. Therefore, all bits of G can be retrieved from both databases when one of {DE, DF, EF} is the side information, without altering the structure of the query from N 1.

C. G as a byproduct
G can be a byproduct in $6 \times \binom{5}{2} = 6 \times 10 = 60$ ways: the demand can come from {A, B, C, D, E, F}, and the two side information files are chosen from the remaining five files. This is subdivided into 4 different cases.

1) A is the demand
When A is the demand and BC are the side information, the known and unknown byproduct combinations are symmetric. The demand can be retrieved similarly to the operation mentioned in step 8 of the code construction in Section III.

When one of {BD, BE, BF} is the side information, there is a small deviation from symmetry. On the unknown side, one singleton bit each of C and G is removed (it does not go to the known side), and one bit of C + G is added to the unknown side (it is not removed from the known side). The structure of the code remains the same as that of N 1, since the C and G of the extra unknown C + G bit can be obtained from the individual queries themselves, because one singleton bit each of C and G is removed from the unknown side.

When one of {CD, CE, CF} is the side information, the known and unknown byproduct combinations are similar to the case when G was the demand and one of {CD, CE, CF} was the side information (Section V-B4). A simple transformation A ⇋ G in the code structure provided in Section V-B4 provides the necessary query to N 2.

When one of {DE, DF, EF} is the side information, the known and unknown byproduct combinations are similar to the case of {CD, CE, CF}, with a small difference. The difference is the same as in the {BD, BE, BF} case: on the unknown side, one singleton bit each of C and G is removed (it does not go to the known side), and one bit of C + G is added to the unknown side (it is not removed from the known side). The structure of the code remains the same as for {CD, CE, CF}, since the C and G of the extra unknown C + G bit can be obtained from the individual queries themselves, because one singleton bit each of C and G is removed from the unknown side.

This completes all side information cases for A as the demand and G as one of the byproducts. Therefore, the demand is retrievable in all cases with A as the demand.
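The count of 60 cases stated at the start of this subsection can be checked by direct enumeration. The sketch below is only a sanity check; the variable names are assumptions made for the example, and the files are labelled A to G as in the K = 7 setting.

```python
from itertools import combinations

files = "ABCDEFG"

# G is a byproduct: the demand is one of A..F, and the two side information
# files are drawn from the five files that are neither the demand nor G.
cases = [
    (demand, side_info)
    for demand in files
    if demand != "G"
    for side_info in combinations(sorted(set(files) - {demand, "G"}), 2)
]

total_cases = len(cases)
cases_with_demand_a = [si for d, si in cases if d == "A"]
```

Each of the six possible demands contributes ten side information pairs; for instance, `cases_with_demand_a` contains the pairs BC, BD, BE, BF, CD, CE, CF, DE, DF and EF, which are exactly the four groups discussed above for A as the demand.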
2) B is the demand
When B is the demand, the cases are similar to those with A as the demand, except for the side information set {DE, DF, EF}. When B is the demand and AC are the side information, the known and unknown byproduct combinations are symmetric. The demand can be retrieved similarly to the operation mentioned in step 8 of the code construction in Section III.

When one of {AD, AE, AF} is the side information, there is a small deviation from symmetry. On the unknown side, one singleton bit each of C and G is removed (it does not go to the known side), and one bit of C + G is added to the unknown side (it is not removed from the known side). The structure of the code remains the same as that of N 1, since the C and G of the extra unknown C + G bit can be obtained from the individual queries themselves, because one singleton bit each of C and G is removed from the unknown side.

When one of {CD, CE, CF} is the side information, the known and unknown byproduct combinations are similar to the case when G was the demand and one of {CD, CE, CF} was the side information (Section V-B4). A simple transformation B ⇋ G in the code structure provided in Section V-B4 provides the necessary query to N 2.

When one of {DE, DF, EF} is the side information, the known and unknown byproduct combinations are similar to the case when G was the demand and one of {DE, DF, EF} was the side information (Section V-B5), with a small difference. The unknown bit A + C is replaced with just A, while the known bit A + G is replaced with A + C + G. When G was the demand and one of {CD, CE, CF} was the side information, we had to combine individual bits of A and B + E to obtain the second known bit of A + B + E, as explained in Section V-B4. A similar operation was performed for A + B + C in Section V-B5. Since a second A + C + G bit (which is the analogue of A + B + C from Section V-B5) is available as known, we can use it directly to query the demand B, thereby freeing individual bits of A and C + G. This free known bit C + G is now used in place of the singleton G when retrieving B. This frees a singleton G, which, along with the singleton A freed before, is used to retrieve B with the bit A + G (since the only known A + G was converted to A + C + G). Finally, the query bit that retrieved A + C is now used to retrieve the new unknown A bit.

This completes all side information cases for B as the demand and G as one of the byproducts. Therefore, the demand is retrievable in all cases with B as the demand.
3) C is the demand
When C is the demand and one of {AB, AD, AE, AF} is the side information, the known and unknown byproduct combinations are similar to the case when G was the demand and one of {AB, AD, AE, AF} was the side information (Section V-B2). The query to N 2 can be obtained as in that case, with a transformation C ⇋ G. When one of {BD, BE, BF} is the side information, the known and unknown byproduct combinations are similar to the case when G was the demand and one of {BD, BE, BF} was the side information (Section V-B3). The query to N 2 can be obtained as in that case, with a transformation C ⇋ G.

When one of {DE, DF, EF} is the side information, the known and unknown byproduct combinations are similar to those when B was the demand and one of {DE, DF, EF} was the side information (Section V-C2), with some minor differences. On the unknown side, one bit of A + G is removed, while one bit of A and 2 bits of G are included. On the known side, 2 bits of B + G are replaced by just B, while one bit of A + B is replaced by A + B + G. For instance, when EF are the side information, the known and unknown byproduct combinations are as shown in the first two columns of Table XXXII. The query to N 2 must accommodate these deviations without changing the structure. For exposition, we take the case when EF are the side information.

To construct the query to N 2, we start from the query to N 2 in Table XXXI and modify it as discussed hereafter. To this query we apply the transformation CD ⇋ EF to generate a new query. To this query we apply the modifications described for the case of G as the demand and one of {DE, DF, EF} as the side information to obtain a new query. To this query we apply the transformation B ⇋ G to obtain a new query. To this query we apply the modifications described for the case of B as the demand and one of {DE, DF, EF} as the side information to obtain a new query. Note that this query is exactly the query to N 2 for the case of B as the demand and EF as the side information. To this query we apply the transformation B ⇋ C, as seen in the third column of Table XXXIII. Note that the third column of Table XXXIII is not obtained by a direct transformation B ⇋ C of the query to N 1. Rather, it is obtained after executing the various steps discussed above.

Since one bit of A + G is removed from the unknown side, bit number 7, which was initially querying A + G, can now be used to query one extra bit of G. Bit number 2, which was just a combination of the side information E + F, can be used to query the second extra bit of G. Bit numbers 8 and 28 are modified to query C with the two extra known singleton B bits, since one B + G bit in the known set is removed. Since bit number 8, which queried the B + G required to obtain the third bit of the unknown A + B + G, is modified, the third bit of A + B + G is queried by modifying bit number 38. This modification removes the retrieval of one bit of the demand using A + D. This is neutralised by modifying bit number 41 to remove the query of the demand with B + G (since 2 bits of B + G were removed from the known set) and replace it with A + D. Since this A + B + G is queried directly, the A retrieved in bit number 11, which was supposed to be the A in A + B + G, can now be used to retrieve the extra singleton bit of A that moved to the unknown side. Finally, bit number 53, which used A + B to query C, can now be used to query A + B + D + G (since one bit of A + C is removed from the known side), while bit number 61, which was initially supposed to query A + B + D + G, is now used to retrieve C with the extra A + B + G bit that is available in the known set.

In general, when handling the other cases of {DE, DF} as the side information, the procedure for obtaining the query to N 2 is similar to that when EF are the side information. We take the query submitted to N 2 for the case of B as the demand and DE (or DF) as the side information, respectively, and perform the transformation between B and C. The modifications made to the new query after the transformation when EF are the side information can be reproduced to retrieve C from N 2.
4) One of {D, E, F} is the demand
When one of {D, E, F} is the demand, the known and unknown byproduct combinations are symmetric if either A or B is one of the side information files. The demand can be retrieved similarly to the operation mentioned in step 8 of the code construction in Section III. For the other side information pairs, which contain neither A nor B, the known and unknown byproduct combinations are the same as in the previous case where C was the demand and one of {DE, DF, EF} was the side information, with two exceptions. One bit of A goes to

Table XII :
Power set structure with K = 7.
With the message G playing the role of X K as per the construction in Section III, the following are the steps to add 32 copies of G to the existing elements of P({A, B, C, D, E, F}).

Table XIV :
Outcome of Step 2 with K = 7.
3) Add G to all the entries of Column 1. Leave Column 2 unaltered.
4) Perform operation between {φ, X 1 + X 2 } = {φ, A + B} and all the entries in the two-tuple sum block and the three-tuple sum block of Column 1, and place the result in Column 1'. Similarly, perform operation between {X 1 , X 2 } = {A, B} and all the entries in the singleton block and the two-tuple sum block of Column 2, and place the result in Column 2'.

Table XVI :
Outcome of Step 4 with K = 7.
5) Since K = 7 in this case, perform operation between {A, B} and all the entries in the n-tuple sum block, for n ∈ {4, 5, . . ., K − 2}, of Column 1, and append the result to Column 1'. Similarly, perform operation between {φ, A + B} and all the entries in the n-tuple sum block, for n ∈ {3, 4, . . ., K − 3}, of Column 2, and append the result to Column 2'.

Table XVII :
Outcome of Step 5 with K = 7.
6) From {φ, A, B, A + B}, omit φ, and add G to B. This generates {A, B + G, A + B}. Place these elements in a new column, referred to as Column 3.
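As a rough illustration of the objects these steps operate on, the sketch below enumerates the power set P({A, B, C, D, E, F}) grouped into n-tuple sum blocks and reproduces the small set produced in step 6. It does not reproduce the column assignment of the construction, which is defined in Section III; the names `blocks`, `block_sizes` and `column3` are assumptions made for this example.

```python
from itertools import combinations

base = "ABCDEF"  # the K - 1 = 6 messages other than G

# Group the power set by the number of messages in each sum:
# n = 0 is the empty element (phi), n = 1 the singleton block,
# n = 2 the two-tuple sum block, and so on.
blocks = {
    n: [frozenset(c) for c in combinations(base, n)]
    for n in range(len(base) + 1)
}
block_sizes = {n: len(entries) for n, entries in blocks.items()}
total_elements = sum(block_sizes.values())

# Step 6: from {phi, A, B, A + B}, omit phi and add G to B.
seed = [frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")]
column3 = [
    cw | {"G"} if cw == frozenset("B") else cw
    for cw in seed
    if cw  # the empty frozenset (phi) is falsy and is therefore dropped
]
```

The block sizes are the binomial coefficients 1, 6, 15, 20, 15, 6, 1, for a total of 64 elements, so the 32 copies of G mentioned in the text amount to half the size of the power set. The list `column3` contains A, B + G and A + B, matching the outcome stated in step 6.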

Table XXVII :
Final query to N 2 given in the fourth column.

Table XXVIII :
Pattern of unknown and known byproducts for SI BD and after A ⇋ B and C ⇋ D.

Table XXIX :
Final query to N 2 given in the fourth column.

Table XXXI :
Final query to N 2 given in the fourth column.

Table XXXII :
Pattern of unknown and known byproducts for SI EF.

Table XXXIII :
Query to N 1 , the transformation of B ⇋ C to the query to N 2 for the case of B as the demand and EF as the side information. Final query to N 2 given in the fourth column.