PKCHD: Towards A Probabilistic Knapsack Public-Key Cryptosystem with High Density

Abstract: By introducing an easy knapsack-type problem, a probabilistic knapsack-type public key cryptosystem (PKCHD) is proposed. It uses the Chinese remainder theorem to disguise the easy knapsack sequence. Hence, to recover the trapdoor information, the attacker has to solve at least two hard number-theoretic problems, namely integer factorization and simultaneous Diophantine approximation. In PKCHD, the encryption function is nonlinear in the message vector. Under the re-linearization attack model, PKCHD attains a high density and is secure against low-density subset sum attacks, and the probability that an attacker recovers the message vector with a single call to a lattice oracle is negligible. The infeasibility of other attacks on the proposed PKCHD is also investigated. Moreover, if density is taken as the measure of the hardness of a knapsack instance, PKCHD can use the hardest knapsack vector as the public key. Furthermore, PKCHD performs only quadratic bit operations, which confirms the efficiency of encrypting a message and deciphering a given cipher-text.


Introduction
A public key cryptosystem (PKC), a concept introduced by Diffie and Hellman in their landmark paper [1], is a critical cryptographic primitive in the area of network and information security. Traditional PKCs such as RSA [2] and ElGamal [3] suffer from the same drawback of relatively low speed, which hampers further applications of public-key cryptography and motivates cryptographers to design faster PKCs. Among the first public-key schemes, knapsack-type cryptosystems were invented as fast PKCs. Due to their high encryption and decryption speed and the NP-completeness of the underlying problem, they were long considered the most attractive and most promising candidates. However, some attacks lowered the initial enthusiasm and even announced the premature death of trapdoor knapsacks.
Three reasons explain the insecurity of the additive knapsack-type cryptosystems. Firstly, as observed in [21], these systems are basically linear. Secondly, for some of them, the trapdoor information is easy to recover. In particular, some systems use size conditions to disguise an easy knapsack problem, which makes them vulnerable to simultaneous Diophantine approximation attacks [22]. Thirdly, the densities of some systems are not high enough. Coster et al. [20] showed that, if the density is < 0.9408..., a single call to a lattice oracle will lead to polynomial time solutions.
As noted above, to design a secure knapsack-type PKC, we must ensure that
• the encryption function of the system is nonlinear about the message vector;
• size conditions are not used to disguise the easy knapsack problem;
• the encryption function is non-injective: a cipher-text must have so many preimages that it is computationally infeasible for the attacker to list all of them.
It is believed in [23] that, if someone invents a knapsack cryptosystem that fully exploits the difficulty of the knapsack problem, with a high density and a difficult-to-discover trapdoor, then it will be a system better than those based on integer factorization and discrete logarithms. Can a knapsack-type PKC satisfying the requirements above be developed, or, in other words, have any efficient yet straightforward constructions been overlooked? In this paper, we will try to provide an affirmative answer.
Based on a new easy knapsack-type problem, a probabilistic knapsack public-key cryptosystem with high density (PKCHD) is proposed, which has the following properties:
• PKCHD is a probabilistic knapsack-type PKC.
• The multivariate polynomial encryption function is nonlinear about the message vector, and its degrees are controlled by randomly chosen small integers.
• The secret key is disguised via the Chinese remainder theorem (CRT) rather than size conditions. Thus, PKCHD is secure against simultaneous Diophantine approximation attacks.
• The density of PKCHD is sufficiently high under the relinearization attack model. A cipher-text has too many plaintexts for the attacker to enumerate all of them in polynomial time.
• If density is taken as the measure of the hardness of a knapsack instance, PKCHD can always use the hardest knapsack vector as the public key.
• The attacker has to solve at least two hard number-theoretic problems, namely integer factorization and simultaneous Diophantine approximation, to recover the trapdoor information.
• PKCHD is more efficient than RSA [2] and ElGamal [3]. The encryption and the decryption of the system only perform $O(n^2)$ bit operations.
The rest of the paper is organized as follows. In Section 2, we give some preliminaries on concepts and definitions about lattices, low-density subset sum attacks, and simultaneous Diophantine approximation. The easy knapsack-type problems are presented in Section 3, together with several examples that make the problems easier to understand. The detailed description of the proposed PKCHD is given in Section 4. Section 5 discusses performance-related issues and specifies the parameter selection. Section 6 examines the security of the system against several attacks, including key-recovery attacks, low-density attacks, and simultaneous Diophantine approximation attacks. Section 7 gives some concluding remarks.

Preliminaries
Throughout this paper, the following notations will be used:
- $\mathbb{R}$, the field of real numbers.
- $\mathbb{Z}$, the ring of integers; $\mathbb{Z}^+$, the set of all positive integers.
- $\mathbb{Z}_n = \{0, \dots, n-1\}$, the complete system of least nonnegative residues modulo n; $\mathbb{Z}_n^*$, the reduced residue system modulo n.
- $|A|$, the cardinality of a set A.
- $|a|_2$, the binary length of an integer a.
- $\lceil r \rceil$, the smallest integer greater than or equal to r.
Throughout this paper, we also adopt some customary parlance. For example, when we say a value is negligible, we mean that the value is a negligible function $v(k): \mathbb{N} \to [0, 1]$, i.e., for any polynomial $p(\cdot)$, there exists $k_0 \geq 1$ such that $v(k) < 1/p(k)$ for any $k > k_0$. The length of a vector means its norm ($L_1$, $L_2$ or $L_\infty$ norm).

Lattice
A lattice is a discrete additive subgroup of $\mathbb{R}^n$. An equivalent definition is that a lattice consists of all integral linear combinations of a set of linearly independent vectors $b_1, \dots, b_m \in \mathbb{R}^n$, i.e., $L = \{\sum_{i=1}^{m} z_i b_i : z_i \in \mathbb{Z}\}$. In lattice theory, three important algorithmic problems are the shortest vector problem (SVP), the closest vector problem (CVP) and the smallest basis problem (SBP). The SVP asks for the shortest non-zero vector in a given lattice L. Given a lattice L and a vector v, the CVP is to find a lattice vector s minimizing the length of the vector v − s. The SBP aims at finding a lattice basis minimizing the maximum of the lengths of its elements. These problems are of special significance in complexity theory and cryptology. The SVP can be approximated by solving the SBP. No polynomial-time algorithm is known for any of the three problems. The best polynomial-time algorithms for the SVP achieve only slightly sub-exponential approximation factors, and are based on the LLL algorithm [25].
Before 1996, lattice theory was applied only to cryptanalysis [14, 18–22, 26–29], especially in breaking some knapsack cryptosystems. However, positive applications of lattice theory in cryptology [30–33] have been witnessed in the last ten years. Some cryptographers even count the knapsack cryptosystems among the lattice-based cryptosystems, owing to the role of lattice reduction algorithms in breaking the knapsack-type cryptosystems. For example, Sakurai [34] viewed the lattice-based cryptosystems as the revival of the knapsack trapdoors. More negative and positive applications of lattice theory in cryptology can be found in [34,35].
The SVP and CVP are widely believed to be difficult problems. Interestingly, however, experimental results showed that lattice reduction algorithms behave much better than expected from the worst-case proved bounds, especially in low-dimensional (<300) lattices. When the dimension of a lattice is low, the lattice reduction algorithms can serve as a lattice oracle (SVP or CVP oracle). Therefore, to make a PKC invulnerable to lattice attacks, the dimension is generally required to be sufficiently high (>500) without reducing the practicability, e.g., NTRU [32]. In this paper, a new method of constructing knapsack-type cryptosystems is presented. The dimension of the lattice underlying the cryptosystem is low (about 150), and it is still secure against lattice attacks under some reasonable assumptions.

Low-Density Subset Sum Attacks
Given a cargo vector $A = (a_1, \dots, a_n)$ and an integer s, the 0-1 knapsack problem, or more precisely the subset-sum problem, is to determine a binary vector $X = (x_1, \dots, x_n)$ such that the scalar product of A and X is s. More generally, we define the general knapsack problem or compact knapsack problem as finding a vector $X = (x_1, \dots, x_n)$ with integer entries $x_i \in [0, k]$ such that $\sum_{i=1}^{n} a_i x_i = s$. (1) Note that Equation (1) is linear about the variable X. However, when the linearity restriction is removed and a new function f quadratic about X is defined such that $f(X) = s$, i.e., $X A X^T = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j = s$, we call it a matrix cover problem. In particular, when the matrix A is diagonal, $A = \mathrm{diag}(a_1, \dots, a_n)$, the matrix cover problem reduces to finding the vector X such that $\sum_{i=1}^{n} a_i x_i^2 = s$. This problem is called a quadratic knapsack problem. These problems have been used to construct knapsack-type PKCs [4,7,12].
In a compact knapsack cryptosystem, the public key of the system is a cargo vector $A = (a_1, \dots, a_n)$, and a message $M = (m_1, \dots, m_n)$ with $m_i \in [0, k]$ is encrypted as $c = \sum_{i=1}^{n} a_i m_i$. (2) An important characteristic of a knapsack cryptosystem is its density. A cryptosystem's density has a great effect on its security against lattice-based attacks such as the low-density subset-sum attack, and on whether it can be used to generate digital signatures for data origin authentication purposes. In a high-density cryptosystem, almost all the messages can be signed. Informally, the density of a knapsack cryptosystem is defined as the fraction of the signable messages among all the messages [36]; alternatively, the density is approximately the information rate, which is the ratio of the number of bits in the plaintext message to the average number of bits in the cipher-text [23]. Now, we provide the formal definition of density.
Definition 1 (Density [37]). The density d of the compact knapsack problem (2) is defined by $d = \frac{\sum_{i=1}^{n} e_i}{\log_2 C_{\max}}$, (3) where $C_{\max} = k \sum_{i=1}^{n} a_i$ is the maximum value of the cipher-text in the system and $e_i = |m_i|_2 = \lceil \log_2 (k + 1) \rceil$.
We give two remarks about the definition here. Firstly, $\lceil \log_2 (k + 1) \rceil$ bits are needed to represent the k + 1 integers in [0, k]. Thus, we set $e_i = \lceil \log_2 (k + 1) \rceil$. Secondly, some different definitions can be found in the literature. For example, Orton [7] defined the density of Equation (2) differently; however, Ref. [37] gave a smaller density definition than that given in [7]. Thus, we adopt the smaller definition.
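As a rough numerical illustration of Definition 1, the following sketch evaluates the density under the assumption that it takes the form $d = (\sum_i e_i)/\log_2 C_{\max}$ with $e_i = \lceil \log_2 (k+1) \rceil$ and $C_{\max} = k \sum_i a_i$. The function name `density` and the toy cargo vectors are illustrative and do not come from the paper.

```python
from math import ceil, log2

def density(cargo, k):
    """Density of a compact knapsack whose message entries lie in [0, k].

    Assumed form: d = (sum of e_i) / log2(C_max), where
    e_i = ceil(log2(k + 1)) and C_max = k * sum(cargo).
    """
    e = ceil(log2(k + 1))      # bits needed per message block
    c_max = k * sum(cargo)     # largest possible cipher-text
    return len(cargo) * e / log2(c_max)

# A toy 0-1 knapsack (k = 1) with small weights has density above 1 ...
print(density([1, 2, 4, 8], 1))
# ... while equally many large weights give a low density.
print(density([2**20] * 4, 1))
```

Under this reading, enlarging the cargo entries lowers the density, which is the regime in which the lattice attacks discussed next become effective.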
When the density d of a knapsack problem is too low, there exists an efficient reduction from the knapsack problem to the SVP over a lattice. Coster et al. [20] showed that, if d < 0.9408..., which improves the earlier bound 0.6463... [19], then the knapsack problem can be solved with non-negligible probability by a single call to a lattice oracle. Given a knapsack system $A = (a_1, \dots, a_n)$ and a sum $s = \sum_{i=1}^{n} a_i x_i$, the basic idea of the low-density attack [20] runs as follows. The attacker first constructs a basis matrix from the public key, scaled by an integer $N > \sqrt{n}/2$. The integral combinations of its row vectors form a lattice containing a vector f that carries enough information for the attacker to recover a solution to $s = \sum_{i=1}^{n} a_i x_i$. The length of f is relatively small. The short vector f can be found with non-negligible probability by using lattice basis reduction algorithms.
In fact, even if we design a knapsack system with density close to 1 and > 0.9408..., we cannot claim that it is secure against low-density subset sum attacks. Let the length of the message vectors be bounded by r and let N(n, r) be the number of integral lattice points with length at most r in the n-dimensional sphere of radius r centered at the origin. Assume that the lattice points in the sphere have the same length and that the lattice reduction algorithms can find a lattice point in the sphere. Then the lattice point output by the lattice reduction algorithm is exactly the message vector with probability Pr = 1/N(n, r). However, if the density is only slightly greater than 0.9408..., N(n, r) is bounded by a constant O(1) or a polynomial function O(p(n)). In such a case, the probability Pr = 1/N(n, r) is non-negligible. This is why Omura et al. [26] showed that the low-density attack can be applied to the Chor-Rivest [5] and Okamoto-Tanaka-Uchiyama cryptosystems [38].
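To make the construction concrete, the sketch below builds the basis in the form usually attributed to Coster et al. [20]: rows $b_i = (0, \dots, 1, \dots, 0, N a_i)$ for $i = 1, \dots, n$ and $b_{n+1} = (1/2, \dots, 1/2, N s)$. This layout is an assumption drawn from the standard presentation of that attack, since the matrix itself is not reproduced here; the helper names are hypothetical. No lattice reduction is performed: the code only checks that the solution x yields the short lattice vector $f = (x_1 - 1/2, \dots, x_n - 1/2, 0)$.

```python
from fractions import Fraction
from math import isqrt

def low_density_basis(cargo, s):
    """Rows b_1..b_{n+1} of the (assumed) low-density attack lattice."""
    n = len(cargo)
    N = isqrt(n) // 2 + 1          # an integer with N > sqrt(n) / 2
    rows = []
    for i, a in enumerate(cargo):
        row = [Fraction(0)] * (n + 1)
        row[i] = Fraction(1)       # unit entry marks coordinate i
        row[n] = Fraction(N * a)   # scaled cargo entry in the last column
        rows.append(row)
    rows.append([Fraction(1, 2)] * n + [Fraction(N * s)])
    return rows

def target_vector(rows, x):
    """The lattice vector sum_i x_i * b_i - b_{n+1}."""
    n = len(x)
    return [sum(x[i] * rows[i][j] for i in range(n)) - rows[n][j]
            for j in range(n + 1)]

cargo, x = [3, 5, 9], [1, 0, 1]
s = sum(a * xi for a, xi in zip(cargo, x))
f = target_vector(low_density_basis(cargo, s), x)
print(f)   # entries x_i - 1/2, followed by 0 in the last coordinate
```

Every entry of f has absolute value 1/2 except the last, which vanishes exactly when x solves the knapsack; this is what makes f a candidate for a shortest-vector search.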

Simultaneous Diophantine Approximation
The simultaneous Diophantine approximation problem is a basic problem in Diophantine approximation theory, which has found uses both in cryptanalysis [22,28] and cryptography [39].The problem is defined as follows.
Informally speaking, this problem asks for a set of fractions with a common and relatively small denominator approximating the given set of real numbers. There is a solution to the simultaneous Diophantine approximation problem if $Q \geq \varepsilon^{-n}$, but no efficient algorithm is known. However, when viewed as a problem involving lattices, the problem can be approximated by lattice basis reduction algorithms. Note that the integral linear combinations of the row vectors of a matrix built from the given numbers $r_i$ form a lattice L. Lattice basis reduction algorithms can be applied to the lattice L to output a reduced basis. The shortest vector b in the reduced basis can be used to approximate the simultaneous Diophantine approximation problem. Since $b \in L$, there exist integers $p_1, \dots, p_n$ and q such that the coordinates of b are expressed through the differences $p_i - q r_i$. Since b is short, each $p_i - q r_i$ is small, which is equivalent to saying that $|r_i - p_i/q|$ is also small. Thus, $\{p_i/q\}$ is a set of fractions, with a common denominator q, approximating $\{r_i\}$. This informal demonstration reveals the relation between lattice reduction algorithms and the simultaneous Diophantine approximation problem.

Easy Knapsack-Type Problems
Knapsack-type PKCs always follow a common design morphology [9], that is:
• Construct an easy instance P[easy] from an intractable problem P.
• Shuffle P[easy] to make the resultant problem P[shuffle] seemingly hard and indistinguishable from P.
• P[shuffle] is published as the encryption key. The information s by means of which P[shuffle] is reduced to P[easy] is kept as the secret key.
• The authorized receiver knowing s solves P[easy] to recover a message, whereas the task for the attacker is to solve P[shuffle].
In knapsack public-key cryptography, several kinds of easy knapsack problems have been considered, e.g., super-increasing sequences [4], the cargo vectors used in the Graham-Shamir cryptosystem [40] and the knapsack sequences [41] used for attacking a knapsack-type cryptosystem [16] based on Diophantine equations. In this section, we propose several new easy knapsack problems, which can be viewed as generalizations of the problems presented in [42,43].

An Easy Compact Knapsack Problem
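The simultaneous compact knapsack problem is considered in this section: given the sums $(s_1, s_2) \in (\mathbb{Z}^+)^2$ and two cargo vectors $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_n)$, find a vector $X = (x_1, \dots, x_n)$ of bounded nonnegative integers such that $\sum_{i=1}^{n} a_i x_i = s_1$ (4) and $\sum_{i=1}^{n} b_i x_i = s_2$. (5) Without loss of generality, in this paper, we always assume that $\gcd(a_1, \dots, a_n) = \gcd(b_1, \dots, b_n) = 1$. The following theorem gives an easy instance of the simultaneous compact knapsack problem.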

Theorem 1. Given two cargo vectors
Denote by c i and d i the gcd of the first i components of A and B, respectively, i.e., c i = gcd(a can be solved in polynomial (in n) time.Furthermore, the problem has at most one solution.
. Thus, we can invert a n and obtain x n ≡ s 1 a −1 n (mod c n−1 ) .Similarly, we get x n ≡ s 2 b −1 n (mod d n−1 ).Then, we can determine a unique x n ∈ Z λ n according to CRT, where If the unique x n obtained is greater than k − 1, we can conclude that the simultaneous compact knapsack problem has no solutions.Otherwise, we determine an Suppose that the values of Note that Equation ( 6) modulo c i−1 gives It is easy to verify that gcd(a i , otherwise, the simultaneous compact knapsack problems (4) and ( 5) have no solutions.By inverting a i /c i , we obtain according to Equation ( 8) Similarly, we can deduce that problems (4) and ( 5) have no solutions or have a congruence From ( 9) and ( 10), we can determine a unique x i ∈ Z λ i according to the CRT, where 4) and ( 5) have solutions, we can determine a unique With the determined values of x 2 , • • • , x n , we get and If a 1 |r 1 and b 1 |r 2 , respectively, and the two quotients are identical, i.e., we set x 1 = r; otherwise, we deduce that the problems (4) and ( 5) have no solutions.Even if the unique values of x 1 , • • • , x n have been determined, we cannot claim that they are the solutions to (4) and (5).We need to verify whether x 1 , • • • , x n satisfy (4) and ( 5).If yes, then 4) and ( 5); otherwise, (4) and ( 5) have no solutions.
To determine each $x_i$, we need to solve two modular equations and combine them by using CRT. The whole problem can therefore be solved by computing only 2n modular equations. Thus, the simultaneous compact knapsack problems (4) and (5) can be solved in polynomial (in n) time. If the problem has solutions, each $x_i$ is uniquely determined according to CRT. Thus, the simultaneous compact knapsack problem has at most one solution.
However, a high-density knapsack-type cryptosystem cannot be designed based on this easy knapsack problem. It should be generalized in some other way.
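The back-substitution described in the proof can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's exact procedure: entries are taken to lie in $\{0, \dots, k-1\}$, the quotients $c_{i-1}/c_i$ and $d_{i-1}/d_i$ are assumed coprime (so that the two-modulus CRT formula applies) with product at least k, and the names `prefix_gcds`, `crt_pair` and `solve_simultaneous` are hypothetical. It needs Python 3.8+ for `pow(a, -1, m)`.

```python
from math import gcd

def prefix_gcds(vec):
    """c_i = gcd of the first i components of vec."""
    out, g = [], 0
    for v in vec:
        g = gcd(g, v)
        out.append(g)
    return out

def crt_pair(r1, m1, r2, m2):
    """x mod m1*m2 with x = r1 (mod m1) and x = r2 (mod m2); m1, m2 coprime."""
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return (r1 + m1 * t) % (m1 * m2)

def solve_simultaneous(A, B, s1, s2, k):
    """Solve sum a_i x_i = s1 and sum b_i x_i = s2 with 0 <= x_i < k, or None."""
    n = len(A)
    c, d = prefix_gcds(A), prefix_gcds(B)
    x, r1, r2 = [0] * n, s1, s2
    for i in range(n - 1, 0, -1):          # recover x_n, ..., x_2 in turn
        if r1 % c[i] or r2 % d[i]:
            return None
        u, v = c[i - 1] // c[i], d[i - 1] // d[i]
        xu = (r1 // c[i]) * pow(A[i] // c[i], -1, u) % u
        xv = (r2 // d[i]) * pow(B[i] // d[i], -1, v) % v
        xi = crt_pair(xu, u, xv, v)
        if xi >= k:
            return None
        x[i] = xi
        r1, r2 = r1 - A[i] * xi, r2 - B[i] * xi
    if r1 % A[0] or r2 % B[0] or r1 // A[0] != r2 // B[0]:
        return None
    x[0] = r1 // A[0]
    # Final check, as in the proof: the candidate must satisfy both equations.
    ok = (0 <= x[0] < k
          and sum(a * t for a, t in zip(A, x)) == s1
          and sum(b * t for b, t in zip(B, x)) == s2)
    return x if ok else None

print(solve_simultaneous([6, 15, 7], [15, 10, 4], 113, 118, 6))
```

In the toy instance the prefix gcds are (6, 3, 1) and (15, 5, 1), so the moduli pairs are (2, 3) and (3, 5), and the sums 113 and 118 come from X = (4, 5, 2).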

Generalization of the Simultaneous Compact Knapsack Problem
Before generalizing the simultaneous compact knapsack problem, we first introduce some useful notations to make the discussion more convenient. Given $I \subset \mathbb{Z}$, $K \subset \mathbb{Z}^+$ and $J = \{j = (j_1, j_2) \mid j_1, j_2 \in \mathbb{Z}^+\}$, we use $I^K$ to denote the set $\{i^k \mid i \in I, k \in K\}$. For every $j = (j_1, j_2) \in J$, $I^K \bmod j$ represents the set $\{i^k \bmod j = (i^k \bmod j_1, i^k \bmod j_2) \mid i \in I, k \in K\}$. Generally speaking, we have the following inequalities: $|I^K \bmod j| \leq |I^K| \leq |I| \times |K|$. The second "≤" holds in that it is possible for different $i_1, i_2$ and $k_1, k_2$ to give an identical $i_1^{k_1} = i_2^{k_2}$; the first holds because distinct $i_1^{k_1}$ and $i_2^{k_2}$ can also give rise to the same value modulo j.
Definition 3. If $\forall j \in J$, $|I^K \bmod j| = |I^K| = |I| \times |K|$, we call the set I truly-distinguishable (T-DIST) modulo the set J under the indices of K; if $\forall j \in J$, $|I^K \bmod j| = |I^K| < |I| \times |K|$, we call the set I pseudo-distinguishable (P-DIST) modulo the set J under the indices of K; if $\exists j \in J$, $|I^K \bmod j| < |I^K|$, we call the set I indistinguishable (IND) modulo the set J under the indices of K. If different $(i_1, k_1)$ and $(i_2, k_2)$ result in the same $i_1^{k_1} \equiv i_2^{k_2} \pmod j$, we call the 3-tuple $((i_1, k_1), (i_2, k_2), j)$ a collision. In particular, the collisions in the case of P-DIST are called trivial collisions; the collisions in the case of IND are called non-trivial collisions.
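The three cases of Definition 3 can be checked mechanically by comparing cardinalities. The sketch below does so for finite I, K and J; the function name `classify` is illustrative, and the first call uses I = {0, ..., 7}, K = {1, 2, 3} and the pair (1, 51) as they appear in the implementation example later in the paper.

```python
def classify(I, K, J):
    """Return 'T-DIST', 'P-DIST' or 'IND' for I modulo J under the indices of K."""
    powers = {i ** k for i in I for k in K}           # the set I^K
    full = len(set(I)) * len(set(K))                  # |I| * |K|
    for j1, j2 in J:
        residues = {(y % j1, y % j2) for y in powers} # the set I^K mod j
        if len(residues) < len(powers):
            return "IND"                              # a non-trivial collision exists
    return "T-DIST" if len(powers) == full else "P-DIST"

print(classify(range(8), {1, 2, 3}, [(1, 51)]))   # P-DIST
print(classify({2, 3}, {1, 2}, [(1, 11)]))        # T-DIST
print(classify(range(8), {1, 2, 3}, [(1, 2)]))    # IND
```

In the first call $|I^K| = 19 < 24 = |I| \times |K|$ because of trivial collisions such as $2^2 = 4^1$, yet the 19 powers stay distinct modulo 51, which is exactly the P-DIST case.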
Consider the definitions, in the case of T-DIST, no collisions occur.Thus, given the i k mod j, we can uniquely determine the corresponding (i, k).In the case of P-DIST, when a collision occurs, we only can determine a unique value r from i k mod j.However, there exist at least two integer pairs . Let c i and d i respectively denote the gcd of the first i components of A and B, and with x i ∈ I and k i ∈ K, can be solved in polynomial (in n) time.Furthermore, the problem has at most one solution in Proof.Note that |I|, |K| = O(1), and we can construct a table of I Modulo J under the Indices of K in polynomial time.Its query operations can be carried out in polynomial time.
The proof of the theorem is analogous to that of Theorem 1. The only distinction is the following: in Theorem 1, we use CRT to determine a unique $x_i \in \mathbb{Z}_{\lambda_i}$; whereas, in Theorem 3, when we obtain a unique pair of residues $(l_{1i}, l_{2i})$, we look up the table to determine a unique $x_i$ and $x_i^{k_i}$. It can be concluded that, if the simultaneous Diophantine equations have solutions, there exists only one solution. The problem can be solved in polynomial (in n) time.
Algorithm 1 formalizes the computational method of solving the simultaneous Diophantine Equation (11).
2) If no, output "No Solutions" and exit; 3) Otherwise, determine and store the values of x n and x k n n .
If no entries in T match (l 1i , l 2i ), exit with "No Solutions"; Otherwise, determine and store the unique x i and Otherwise, output "No Solutions" and exit.
3) Solve x 1 from x k 1  1 , and store x 1 and x k 1 1 .5 Decide whether ∑ n i=1 a i x k i i = s 1 and ∑ n i=1 b i x k i i = s 2 .1) If yes, output X = (x 1 , • • • , x n ) and exit; 2) Otherwise, output "No Solutions" and exit.
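The table T consulted in Algorithm 1 can be sketched as a dictionary keyed by residue pairs. This is an illustration under assumptions: the layout of the table and the name `build_table` are hypothetical, and each entry stores the power $x^k$ together with every pair (x, k) that produces it, so that trivial collisions remain visible.

```python
def build_table(I, K, j):
    """Map (y mod u, y mod v) -> (y, [(i, k), ...]) for all y = i**k."""
    u, v = j
    table = {}
    for i in I:
        for k in K:
            y = i ** k
            entry = table.setdefault((y % u, y % v), (y, []))
            if entry[0] != y:
                # Two different powers share a residue pair: the IND case.
                raise ValueError("I is indistinguishable modulo j")
            entry[1].append((i, k))
    return table

T = build_table(range(8), (1, 2, 3), (1, 51))
print(len(T))          # number of distinct powers
print(T[(0, 13)])      # residue pair of 64 = 4**3
print(T[(0, 4)])       # 4 arises both as 2**2 and as 4**1
```

A lookup then replaces the CRT step of Theorem 1: the residue pair computed at step i selects a unique power, while the list of preimages shows when the base is not uniquely determined.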
The requirement "T-DIST" is not necessary. In fact, if I is P-DIST modulo the set J under the indices of K, Theorem 3 and hence Algorithm 1 also work. In such a case, each $x_i^{k_i}$ is uniquely determined, whereas some values of $x_i$ are not uniquely determined. Now, we give the following theorem.

Theorem 4. Given two cargo vectors
. Denote by c i and d i the gcd of the first i components of A and B, respectively.
with x i ∈ I and k i ∈ K, can be solved in polynomial (in n) time.Furthermore, it has at most one solution in x

The Proposed PKCHD Cryptosystem
This section derives the proposed PKCHD, a probabilistic knapsack-type cryptosystem.The public information consists of two sets I, K ⊂ Z + , |I|, |K| = O(1), and n ∈ Z + , the dimension of a message vector.Let The cryptographic algorithm consists of three sub-algorithms: key generation, encryption and decryption.

Randomly choose two cargo vectors
n , and denote by $c_i$ and $d_i$ the gcd of the first i components of A and B, respectively. Let The randomly-chosen A and B must satisfy the following condition: Randomly choose two prime numbers $p \neq q$ such that Let $N = pq$. Compute the vector $E = (e_1, \dots, e_n)$ according to CRT, Compute $w = e_n^{-1} \pmod N$. The public encrypting vector is The secret key consists of p, q and $e_n$. For decryption, the receiver also stores the values of $c_i$, $d_i$.

Decryption
To decipher a cipher-text c, the receiver firstly computes s p and s q by From Equations ( 12) and ( 13), we know that According to the key generation algorithm and Theorem 3, we know that Equation ( 18) are easy simultaneous Diophantine equations.The receiver can recover the message M by solving Equation (18) according to Algorithm 1.
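The first decryption step can be illustrated with a toy example. Because the key-generation and encryption formulas are not reproduced above, the sketch rests on an assumed reading: $e_i \equiv a_i \pmod p$ and $e_i \equiv b_i \pmod q$ via CRT, public entries $f_i = w e_i \bmod N$ for $i < n$, cipher-text $c = \sum_{i<n} f_i m_i^{g_i} + m_n^{g_n}$, and $s_p = c\,e_n \bmod p$, $s_q = c\,e_n \bmod q$. All function names and the small parameters are hypothetical.

```python
def crt_pair(r1, m1, r2, m2):
    """x mod m1*m2 with x = r1 (mod m1) and x = r2 (mod m2); m1, m2 coprime."""
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return (r1 + m1 * t) % (m1 * m2)

def keygen(A, B, p, q):
    N = p * q
    E = [crt_pair(a % p, p, b % q, q) for a, b in zip(A, B)]
    w = pow(E[-1], -1, N)                # w = e_n^{-1} mod N
    F = [w * e % N for e in E[:-1]]      # assumed public vector, n - 1 entries
    return F, (p, q, E[-1])

def encrypt(F, Y):
    """Y[i] stands for m_i ** g_i with a randomly chosen exponent g_i."""
    return sum(f * y for f, y in zip(F, Y[:-1])) + Y[-1]

def recover_sums(c, secret):
    p, q, e_n = secret
    t = c * e_n % (p * q)                # a single modular multiplication
    return t % p, t % q

A, B = [6, 15, 7], [15, 10, 4]
# p and q exceed 343 * sum(A) = 9604 and 343 * sum(B) = 9947, respectively.
F, secret = keygen(A, B, 10007, 10009)
Y = [4 ** 3, 5 ** 1, 2 ** 3]             # powers m_i ** g_i of a toy message
s_p, s_q = recover_sums(encrypt(F, Y), secret)
print(s_p, s_q)
```

Under these assumptions the receiver obtains $s_p = \sum a_i m_i^{g_i}$ and $s_q = \sum b_i m_i^{g_i}$ exactly, because both sums are smaller than p and q; solving this pair of equations is then the task of Algorithm 1.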

Remarks
Even though the parameter N is not an RSA integer, the system works. The "T-DIST" requirement for the cargo vectors A and B in Con is not necessary. In fact, it suffices that A and B meet the following requirement, Con*: I is P-DIST modulo the set J under the indices of K.
In that case, however, the cipher-text will not be uniquely deciphered. The sender can add some redundant information to the message vector so that the receiver can pick out the exact message from all the plaintexts he deciphers.
Alternatively, the two parties can agree on an encoding method by means of which the messages are encoded as plaintext vectors so that no collision occurs among the encoded plaintext vectors.
Proof.According to Theorem 2, we only need to show that I is P-DIST modulo the set W under the indices of K, which can be proved by verifying that for every (w 1 , w 2 ) ∈ W, Take (1,51) as an example, 16,25,27,36,49,13,23,12, 37}.
In fact, J gives all the 48 integer pairs j = (u, v) with uv < 100 such that I is P-DIST modulo the set {(u, v)} under the indices of K = {1, 2, 3}.
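The exhaustive search behind such a set J can be sketched as below, assuming I = {0, ..., 7} and K = {1, 2, 3} as in this implementation example. The function name `distinguishing_pairs` is illustrative, and the sketch enumerates ordered pairs; whether its count reproduces the 48 pairs quoted above depends on conventions that are not recoverable here, so no count is asserted.

```python
def distinguishing_pairs(I, K, bound=100):
    """All (u, v) with u*v < bound under which the powers I^K stay distinct."""
    powers = {i ** k for i in I for k in K}
    pairs = []
    for u in range(1, bound):
        for v in range(1, bound):
            if u * v >= bound:
                break                      # larger v only increases the product
            residues = {(y % u, y % v) for y in powers}
            if len(residues) == len(powers):
                pairs.append((u, v))
    return pairs

J = distinguishing_pairs(range(8), (1, 2, 3))
print((1, 51) in J, (1, 2) in J)
```

Since $|I^K| = 19 < |I| \times |K|$ for these sets, every pair returned corresponds to the P-DIST case rather than T-DIST.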
We randomly choose two cargo vectors where According to Theorem 5, the generated vectors A and B meet the requirement of Con*. We also generate RSA integers N = pq with p, q primes and $p \geq 343 \sum_{i=1}^{n} a_i$, $q \geq 343 \sum_{i=1}^{n} b_i$. We compute the public vector F according to Equations (14) and (15). The message M is split into n = 150 blocks with each block $m_i \in I$. When generating $G = (g_1, \dots, g_n)$, we should note that, if $m_i = 2$, the corresponding $g_i \neq 2$. The cipher-text is computed as The decryption is the same as Equations (17) and (18). However, if we compute $m_i^{g_i} = 4$, we should decipher $m_i$ into 4 rather than 2. When confronted with some $m_i^{g_i} = 0$ or 1, we can uniquely determine $m_i = 0$ or 1 (of course, $g_i$ is not uniquely determined). Thus, the message can be uniquely recovered.
One observation that we also want to point out here is that the proposed implementation can be modified into a deterministic encryption algorithm. We can develop an encoding algorithm which encodes messages into an n-dimensional vector $Y \in (M^G)^n$. In such a case, the decryption also works. After deciphering a cipher-text into a $Y \in (M^G)^n$, the receiver can decode Y to recover the message. Of course, the modification is of no special significance for either efficiency or security. However, it will be very useful for discussing the low-density attacks on our system.
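The decoding rule for the collision $2^2 = 4^1$ can be checked with a small table. The sketch assumes I = {0, ..., 7} and K = {1, 2, 3}, and reads the restriction on the exponents as "the pair (m, g) = (2, 2) is never used"; the function name `decoding_table` is hypothetical.

```python
def decoding_table(I=range(8), K=(1, 2, 3)):
    """Map each admissible power m**g back to its unique base m."""
    table = {}
    for m in I:
        for g in K:
            if m == 2 and g == 2:
                continue               # excluded, so that 4 always decodes to m = 4
            y = m ** g
            if table.setdefault(y, m) != m:
                raise ValueError(f"ambiguous power {y}")
    return table

T = decoding_table()
print(T[4], T[8], T[64], len(T))
```

With that single exclusion every power has one base, so the message blocks are recovered uniquely even though the exponents of 0 and 1 remain undetermined.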

Performance and Parameter Specifications
This section specifies the parameter selection, analyzes the performance related issues, i.e., the key generation, the computational complexity of the encryption and decryption algorithms, the public key size and the information rate.

Parameter Specifications
p and q should be slightly greater than $\mu \sum_{i=1}^{n} a_i$ and $\mu \sum_{i=1}^{n} b_i$, respectively. When generating the public and secret keys, $|I|, |K| = O(1)$ is not necessarily required. However, this requirement does improve the efficiency of decryption. To decrypt a cipher-text, n table-query operations are needed by the receiver. If $|I|, |K| = O(1)$, the table only includes $|I| \times |K| = O(1)$ rows, which makes the table-query operations more efficient. In order to keep the data sizes of the public and secret keys acceptable, we should require that $\forall i \in I, k \in K$, $|i|_2, |k|_2 = O(1)$. From Equations (12) and (13), we know that, if the lengths of i and k are relatively large, then the length of N and hence the lengths of the public and secret keys will be very large. This would make the proposed PKCHD system impractical.
If factoring the generated modulus N is hard, N can be published without compromising the security.However, if the sender knows N, he can encrypt a message vector M by which results in the reduction of the bit-length of the cipher-text.The public vector F can be permuted and re-indexed for increased security.
Remark. The public key size of the proposed system is about $(n - 1)|N|_2$. Thus, the considerable public data size may be a burden for realizing the PKC. In practice, the public key of a PKC is stored in a certificate issued by a trusted third party. If the public key is too large, the certificate can hold a hashed value instead of the public key itself. To encrypt a message, the sender asks the intended receiver for the public key F. If the public key F sent by the receiver matches the hashed value stored in the receiver's certificate, the sender is convinced that the vector F is exactly the public key of the receiver and then uses it to encrypt the message. This method is suggested in [4] to compress the public key data size.

Algorithm 2. Generating the secret cargo vectors A, B
1 Given I and K, compute a set $J \subset (\mathbb{Z}^+)^2$ such that I is P-DIST modulo J under the indices of K.
Given I and K, the set J consisting of integer pairs can be generated by exhaustive computation over all the integer pairs (u, v) with the product uv bounded by a small constant (for example, 100). On the basis of Theorem 6, the generated vectors A and B satisfy the requirement of Con*. Theorem 6. Generated by Algorithm 2, the secret cargo vectors A and B are subject to Con*.
Proof. Let $c_i$ and $d_i$ denote the gcd of the first i components of A and B, respectively. To prove that A and B are subject to Con*, we only need to show that, for each $i = 2, \dots, n$, the pair $(c_{i-1}/c_i, d_{i-1}/d_i)$ belongs to the generated set J.

It is easy to verify that
Similarly, as desired.
In Algorithm 2, s i and t i should be carefully chosen to guarantee that the generated a i and b i are not too large and always have the same binary length.For example, we can choose those s i and t i with lengths and Thus, Note that p and q are slightly greater than µ ∑ n i=1 a i = 343 ∑ n i=1 a i and µ ∑ n i=1 b i = 343 ∑ n i=1 b i , and that u i v i < 100.Then, for each f i , the length is The two estimations from Equations ( 22) and ( 23) are critical for examining the effects of the low-density subset sum attacks on the implementation of the proposed cryptosystem.
To defend against multiple transmission attacks, one way is to change the secret/public keys frequently. However, since the proposed PKCHD cryptosystem requires an RSA modulus, we prefer a slight modification to it in practical use. Here, we can randomly choose two coprime numbers p and q, calculate the modulus N = pq and keep it secret. Notice that p and q are not necessarily primes.

Computational Complexity
In this section, we evaluate the computational complexity of the proposed PKCHD cryptosystem by analyzing the costs of encrypting a message and decrypting a cipher-text. Since the length of $f_i$ is bounded by O(n) (see Equation (22)), encrypting a message (Equation (16)) needs n − 1 multiplications and additions, and n exponentiations. (1) Generally, the computation for the n − 1 additions is inexpensive; (2) as pointed out earlier, the lengths of $m_i \in I$ and $g_i \in K$ are bounded by O(1), so it takes O(n) bit operations to perform the n exponentiations. Naturally, the binary length of each $m_i^{g_i}$ is also bounded by O(1). Thus, the computational complexity for carrying out the n − 1 multiplications is given by $O(n^2)$. Consequently, the computational complexity for message encryption is $O(n^2)$.
To decrypt a cipher-text, the receiver should do a modular multiplication in (17) and solve the easy simultaneous Diophantine equations in (18). For the modular multiplication, $O(|N|_2^2) = O(n^2)$ bit operations are required. To solve the Diophantine Equations (18) for M, the receiver only needs O(n) division, subtraction, multiplication and table-query operations. Generally, the O(n) divisions and multiplications are the most costly. The bit lengths of the two integers involved in a division (or a multiplication) are respectively bounded by O(n) and O(1). Thus, the computational complexity for doing the O(n) division, subtraction, multiplication and table-query operations is $O(n^2)$. Hence, the computational complexity of the decryption algorithm is also $O(n^2)$.
Compared with the traditional asymmetric encryption primitives RSA [2] and ElGamal [3], the proposed PKCHD cryptosystem is more efficient. Both the encryption and decryption of the proposed PKCHD cryptosystem are only of quadratic bit complexity, whereas RSA [2] and ElGamal [3] are cubic in the security parameter. (If the length of the encryption exponent e of RSA is bounded by O(1), for example, e = 3 or $2^{17} + 1$, the encryption only performs $O(\log_2^2 N)$ bit operations.) To make the comparison more concrete, we take the encryption of the proposed implementation as an example. If n = 150, then by (23) about $149 \cdot 963 \cdot 9 \approx 1.3 \times 10^6$ bit operations are required to finish the encryption. The computational cost is only about $1.3 \times 10^6 / 1024^2 \approx 1.24$ times that of a standard RSA-1024 modular multiplication.
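The arithmetic behind this comparison can be replayed directly. The sketch assumes that 963 is the bit length of each public entry $f_i$ and that 9 is the bit length of each power $m_i^{g_i}$, which is how the product 149 · 963 · 9 is read here; the variable names are illustrative.

```python
n = 150
f_bits = 963        # assumed bit length of each f_i
y_bits = 9          # assumed bit length of each power m_i ** g_i

# n - 1 schoolbook multiplications of an f_bits-bit by a y_bits-bit integer.
ops = (n - 1) * f_bits * y_bits
rsa_mult = 1024 ** 2   # rough cost of one 1024-bit modular multiplication

print(ops)
print(ops / rsa_mult)
```

The exact product is slightly below the rounded figure of $1.3 \times 10^6$, so the ratio comes out marginally under the quoted 1.24.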

Information Rate
The information rate ρ of a cryptosystem is defined as the ratio of the binary length of the message to that of the cipher-text. In the proposed PKCHD cryptosystem, the information rate turns out to be $\rho = 3n / \log_2 C_{\max}$.
We need to evaluate the binary length of C max .Note that Thus, the information rate is evaluated by .
When n = 150, the information rate ρ is about 0.46.
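The quoted value can be reproduced approximately under an assumed bound on $C_{\max}$. The sketch takes each public entry to have 963 bits (the figure used in the complexity estimate) and bounds $C_{\max}$ by roughly $343 \cdot (n - 1) \cdot 2^{963}$; this bound is an assumption standing in for the estimate that is not reproduced above, and the function name is hypothetical.

```python
from math import log2

def information_rate(n, f_bits=963, mu=343, block_bits=3):
    """rho = block_bits * n / log2(C_max), with C_max ~ mu * (n - 1) * 2**f_bits."""
    log_c_max = log2(mu) + log2(n - 1) + f_bits
    return block_bits * n / log_c_max

print(round(information_rate(150), 2))
```

Each message block carries 3 bits because the blocks range over I = {0, ..., 7}, which is where the factor 3n in the numerator comes from.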

Security Analysis
Suppose that the attacker is trying to cryptanalyze the proposed PKCHD cryptosystem. Given a ciphertext c, the attacker has two ways to attack the proposed cryptosystem. One is to solve the cracking problem [44], that is, to determine the unique message vector $M = (m_1, \dots, m_n)$ from his knowledge of the public information and the enciphering function (16) such that (16) is satisfied for some small integers $g_1, \dots, g_n$. The other is to solve the trapdoor problem, that is, to reverse the basic mathematical construction of the trapdoor in a PKC. If the attacker finds an efficient algorithm for the trapdoor problem, he will also have an algorithm for the cracking problem. This section investigates how hard it is for the attacker to solve the cracking problem and the trapdoor problem. To make our discussion more concrete, we only consider attacks on the implementation described in Section 4.

Brute Force Attacks
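One straightforward way to attack the system is to solve (19) directly. To determine whether (19) has a solution, and if so to find it, the attacker can compute all the sums $\sum_{i=1}^{n} f_i m_i^{g_i}$ and compare them with $c$. However, note that $|M_G| = 19$, so this brute force attack takes on the order of $19^n$ steps. A better method is to compute and sort each of the sets
$$S_1 = \Big\{ \sum_{i=1}^{n/2} f_i y_i : y_i \in M_G \Big\}, \qquad S_2 = \Big\{ c - \sum_{i=n/2+1}^{n} f_i y_i : y_i \in M_G \Big\},$$
and then scan $S_1$ and $S_2$, looking for a common element. If a common element $s = \sum_{i=1}^{n/2} f_i m_i^{g_i} = c - \sum_{i=n/2+1}^{n} f_i m_i^{g_i}$ is found, then $c = \sum_{i=1}^{n} f_i m_i^{g_i}$ and (19) is solved. The entire procedure takes on the order of $n 19^{n/2}$ steps [24]. For properly chosen $n$, the attack is computationally infeasible.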
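The two-list search can be sketched on a toy instance. This is a minimal illustration, not the procedure of [24] verbatim: it uses a dictionary lookup in place of sort-and-scan, and it assumes $I = \{0, \dots, 7\}$ and $K = \{1, 2, 3\}$, a choice that reproduces $|M_G| = 19$ and the maximum entry 343. The weights `f` are hypothetical.

```python
from itertools import product

# Value set M_G = {m**g : m in 0..7, g in 1..3} (assumed I and K, see lead-in).
M_G = sorted({m ** g for m in range(8) for g in (1, 2, 3)})

def meet_in_the_middle(f, c, values=M_G):
    """Return every y in values^n with sum(f[i] * y[i]) == c."""
    half = len(f) // 2
    left, right = f[:half], f[half:]
    # S1: partial sums over the first half, indexed by their value.
    table = {}
    for ys in product(values, repeat=len(left)):
        s = sum(a * y for a, y in zip(left, ys))
        table.setdefault(s, []).append(ys)
    # S2: c minus each partial sum over the second half; a hit in the
    # table is a common element of S1 and S2.
    solutions = []
    for ys in product(values, repeat=len(right)):
        t = c - sum(a * y for a, y in zip(right, ys))
        for head in table.get(t, []):
            solutions.append(head + ys)
    return solutions

f = [1013, 2027, 3581, 4099]          # hypothetical toy public weights
y_true = (4, 27, 3, 27)
c = sum(a * y for a, y in zip(f, y_true))
print(len(M_G), y_true in meet_in_the_middle(f, c))
```

Each half enumerates $19^{n/2}$ partial sums, which is the source of the $19^{n/2}$ factor in the running time.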

Low-Density Attack
Low-density subset sum attacks apply only to a linear multivariate equation. Since the encryption function (19) is nonlinear in the message vector $M$, the low-density attacks cannot be used to cryptanalyze the proposed cryptosystem directly. The attacker can, however, re-linearize the encryption function. By setting $y_i = m_i^{g_i} \in M_G$, the attacker obtains from (19) the linear function
$$c = \sum_{i=1}^{n} f_i y_i, \quad y_i \in M_G. \qquad (25)$$
Notice that problem (25) is not a standard compact knapsack problem. As for the standard knapsack problem, the best known method for solving (25) seems to be the brute force attack of Ref. [24] described above. Moreover, if the attacker uses low-density attacks to recover the message from a given cipher-text $c$, he cannot ensure that the solution found belongs to $(M_G)^n$. What the attacker can do is relax (25) to the compact knapsack problem
$$c = \sum_{i=1}^{n} f_i y_i, \quad 0 \le y_i \le 343, \qquad (26)$$
and hope to find a solution $Y = (y_1, \dots, y_n)$ to (26) using the low-density attacks. Now assume that the attacker has found such a solution $Y$ to the compact knapsack problem (26).
If every $y_i \in M_G$, then the attacker can simply solve the $n$ equations $y_i = m_i^{g_i}$ to recover the message $M$. We therefore call such a vector $Y$ a message plaintext, since it contains enough information about the message $M$. On the contrary, if there exists a $y_i \notin M_G$, then $Y$ contains little information about $M$ and is useless for deciphering the cipher-text. Because such a vector $Y$ is still a solution to (26), we call it a plaintext vector. In other words, in the relinearization attack model, we view the plaintext space as $\{0, \dots, 343\}^n$ and the message plaintext space as $(M_G)^n$. The difference $\{0, \dots, 343\}^n \setminus (M_G)^n$ between the two sets is the redundant information added to the messages; equivalently, we pick out some elements of the whole plaintext space as the message plaintexts. This method has been used in the Chor-Rivest [5] and Okamoto-Tanaka-Uchiyama [38] schemes, where only those vectors whose Hamming weight is exactly $h$ are the message plaintexts.
Now we investigate the effect of the powerful low-density attacks on the security of the proposed PKCHD. When applied to a specific knapsack instance, the low-density attacks depend on the density of the knapsack. To estimate the density of the compact knapsack problem (26) using definition (3), we must evaluate all the $e_i = |m_i|_2$ and $C_{\max}$. The estimate of $C_{\max}$ is given in (24), and each $e_i = |m_i|_2 = \lceil \log_2(343 + 1) \rceil = 9$, so the density is $d = 9n / \log_2 C_{\max}$. If we choose $n = 150$, the density is about $1.38 > 0.9408\cdots$. If the public vector $F$ is evaluated via (22), we can also give a lower bound on the density: (22) and (24) bound $\log_2 C_{\max}$ from above, which bounds the density from below.
In the case of $n = 150$, the lower bound is about $1.3 > 0.9408\cdots$. If we adopt the definition of density given in [7], the estimate will be even larger.
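The quoted density can be cross-checked against the information rate. The sketch below assumes the density has the form $\sum_i e_i / \log_2 C_{\max}$ with $e_i = 9$, and it back-derives $\log_2 C_{\max}$ from the rate 0.46 quoted earlier rather than from (24), which is an assumption made only for this check.

```python
import math

THRESHOLD = 0.9408  # low-density bound of Coster et al. [20] quoted in the text

def density(n, log2_cmax, max_entry=343):
    """Density sum(e_i) / log2(C_max) with e_i = ceil(log2(max_entry + 1)) bits."""
    e = math.ceil(math.log2(max_entry + 1))
    return n * e / log2_cmax

n = 150
log2_cmax = 3 * n / 0.46        # assumption: back-derived from the quoted rate
d = density(n, log2_cmax)
print(f"e_i = {math.ceil(math.log2(344))}, density ~ {d:.2f}, above threshold: {d > THRESHOLD}")
```

Because each entry contributes 9 bits to the density but only 3 bits to the message, the density is three times the information rate in this setting, which matches the pair 1.38 and 0.46.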
With an appropriate choice of the parameters, the PKCHD obtains a high density even in the worst case. However, security against low-density subset-sum attacks cannot be claimed by an argument based on density alone. In the history of knapsack-type cryptography, many cryptosystems have been broken by the powerful low-density attacks. Even cryptosystems with high density, such as the Chor-Rivest [5] and Okamoto-Tanaka-Uchiyama [38] schemes, were shown to be vulnerable to low-density attacks [26,27]. Thus, we must be cautious in claiming that the proposed PKCHD is secure against the low-density attacks, and other lattice-based attacks on the system also need to be examined. If the proposed cryptosystem is shown to be invulnerable to the known lattice attacks, its security against lattice-reduction-based attacks becomes more convincing.

On the Number of Plaintext Vectors That a Cipher-Text Has
The low-density subset-sum attacks always assume that practical lattice reduction algorithms can serve as an SVP oracle, at least for low-dimensional lattices. In fact, lattice reduction algorithms perform well in practice, and some current experimental records can be found in [27]. Thus, we assume that lattice reduction algorithms can obtain the shortest vector in a lattice of low dimension. At the same time, the encryption function of the proposed PKCHD is non-injective under the relinearization attack model. Hence, for a given cipher-text $c$, $0 \le c \le 343 \sum_{i=1}^{n} f_i$, there are many preimages $Y$ such that (26) is satisfied. The lengths of the preimages are bounded by the length $r$ of the vector $Y_{\max} = (343, \dots, 343)$. Thus, all the preimages are lattice points in the $n$-dimensional sphere of radius $r$ centered at the origin, and the number $N(n, r)$ of lattice points in the sphere is exactly the number of preimages corresponding to a given cipher-text $c$. Furthermore, all the preimages have almost the same length, and no evidence shows that the message is the shortest vector among all the plaintext vectors. In fact, Refs. [42,43] give a small example in which the message plaintext is not the shortest vector no matter which norm is used. Thus, the lattice reduction algorithms just find a random vector among the $N(n, r)$ preimages. We formalize this discussion as an assumption.
Unif: Given a cipher-text $c$, the vector output by the lattice reduction algorithms is uniformly distributed over the $N(n, r)$ plaintext vectors.
Theorem 7. Under the assumption Unif, the probability $\delta$ that the lattice algorithms find the message vector is negligible.
Proof. Under the assumption Unif, $\delta = 1/N(n, r)$, so $N(n, r)$ needs to be evaluated. Since Ref. [27] presents an estimate of the upper bound of $N(n, r)$, a lower bound is required to complete the proof. Notice that the expected number $N(n, r)$ should be the ratio of the number of all plaintext vectors to that of the possible cipher-texts, i.e., $344^n$ plaintext vectors shared among at most $343 \sum_{i=1}^{n} f_i + 1$ cipher-texts. This evaluation of the number of preimages of a cipher-text is somewhat rough, but it suffices to show the non-injectivity of the encryption function under the relinearization attack model. We therefore present another way of evaluating the number of preimages. Note that any vector $Y$ satisfying (26) must also be a solution to the corresponding modular knapsack problem, which is easily verified to be equivalent to a simultaneous compact knapsack problem. To solve the latter, the method given in Theorem 1 is preferred. According to the CRT, each $y_i$ is determined uniquely modulo 100; since $0 \le y_i \le 343$, we can determine at least three values for each $y_i$. Hence there are at least $3^n$ vectors $Y = (y_1, \dots, y_n)$ that are candidates for a given cipher-text $c$. Of course, not all of these vectors are solutions to (26). However, even if only a small fraction of them satisfy (26), a given cipher-text $c$ has exponentially many plaintext vectors.
A small example (see Table 1) illustrates this discussion. To simplify matters, we set $I = \{0, 1, 2, 3\}$, $K = \{1, 2, 3\}$, and $n = 9$. In this case, the cipher-text $c = 44190990551868$ has ten preimages $Y$ under the relinearization attack model. However, there is only one message plaintext vector, $Y_1 = (4, 27, 3, 27, 2, 27, 0, 1, 4)$, among the ten preimages. The remaining nine preimages $Y_2, \dots, Y_{10}$ are plaintext vectors. Thus, we conclude that the low-density subset sum attack finds the message plaintext vector $Y_1$ with probability $\delta = 1/10$ under the assumption Unif. Additionally, the message plaintext vector $Y_1$ is not the shortest non-zero vector in the lattice involved in the low-density subset sum attack, no matter which norm is used. If we use (20) to encrypt the message, the encryption function even has 237 preimages in all, which are not listed in Table 1 for space limitations. In this example, the parameter $n$ is too small to achieve practical security. However, if a relatively large $n$ (e.g., 150) is chosen, the number of preimages of a given cipher-text will be very large. This is what we have claimed in the proof of Theorem 7.

Table 1 (surviving entries): (10, 5, 12, 19, 19, 7, 10, 1, 4); (5, 12, 9, 13, 9, 27, 10, 1, 4); (18, 6, 4, 25, 13, 4, ...); (3, 2, 5, 23, 0, 12, 12, 11, 24).

On Reducing to the CVP
Nguyen and Stern [27] found that the knapsack problem can also be reduced to the CVP. Note that the solutions of
$$\sum_{i=1}^{n} z_i f_i = 0 \qquad (28)$$
form an $(n-1)$-dimensional linear space over $\mathbb{R}$. Thus, the integral solutions of (28) form an $(n-1)$-dimensional lattice $L$. Given a cipher-text $c$, we can compute, by using an extended Euclidean algorithm, integers $x_1, \dots, x_n$ such that $c = \sum_{i=1}^{n} x_i f_i$. Let $Y = (y_1, \dots, y_n)$ be a plaintext vector (not necessarily the message plaintext vector). Then the vector $u = (x_1 - y_1, \dots, x_n - y_n)$ belongs to $L$. In addition, $u$ is fairly close to the vector $X = (x_1, \dots, x_n)$. Thus, the closest vector $u \in L$ to $X$ is expected to be found by accessing the CVP-oracle, and $X - u$ is then a plaintext vector. However, we should observe that the success probability of the reduction depends on the number $N(n, r)$ of integer points in the $(n-1)$-dimensional spheres. According to Theorem 7, we can conclude that the closest vector output by the CVP-oracle is the exact message plaintext vector only with negligible probability.
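The first step of the CVP reduction, finding an integer vector $X$ with $\sum_i x_i f_i = c$ and checking that $X - Y$ lies in the lattice of integral solutions of (28), can be sketched as follows. The helper names and the toy weights are hypothetical, and the CVP oracle itself is not implemented.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def integer_representation(f, c):
    """Find integers x with sum(x[i] * f[i]) == c via repeated extended Euclid."""
    g, coeffs = f[0], [1]                 # invariant: sum(coeffs[i] * f[i]) == g
    for fi in f[1:]:
        g, s, t = ext_gcd(g, fi)
        coeffs = [s * k for k in coeffs] + [t]
    if c % g != 0:
        raise ValueError("c is not an integer combination of the weights")
    return [k * (c // g) for k in coeffs]

f = [1013, 2027, 3581, 4099]              # hypothetical toy public weights
y = [4, 27, 3, 27]                        # some plaintext vector
c = sum(fi * yi for fi, yi in zip(f, y))

x = integer_representation(f, c)
u = [xi - yi for xi, yi in zip(x, y)]
print(sum(ui * fi for ui, fi in zip(u, f)))   # 0 means u lies in the lattice L
```

Any plaintext vector $Y$ for the same $c$ gives a lattice point $X - Y$, so a CVP oracle has no way to single out the message plaintext among them.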
Furthermore, the cryptanalysis of low-weight knapsacks [26,27] does not compromise the security of a system in which low-weight vectors are not selected as message vectors. We can therefore claim the security of the cryptosystem against the known lattice-based attacks, including low-density subset-sum attacks.
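The counting argument behind Theorem 7 can be tried on a toy compact knapsack. The sketch below enumerates all preimages by brute force; the weights (3, 5, 7) and the bound 27 are hypothetical values chosen only so that the enumeration is instant, and the resulting $\delta$ is the success probability under the assumption Unif.

```python
from itertools import product

def preimages(f, c, bound):
    """List every y in {0..bound}^n with sum(f[i] * y[i]) == c (brute force)."""
    return [ys for ys in product(range(bound + 1), repeat=len(f))
            if sum(a * y for a, y in zip(f, ys)) == c]

f = (3, 5, 7)                 # hypothetical dense toy weights
message = (4, 27, 1)          # entries taken from {m**g : m in 0..3, g in 1..3}
c = sum(a * y for a, y in zip(f, message))

found = preimages(f, c, bound=27)
delta = 1 / len(found)        # probability of hitting the message under Unif
print(len(found), message in found, delta)
```

Even this three-variable instance has several preimages for a single cipher-text, so the message plaintext is only one candidate among many.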

On Solving the Trapdoor Problem
When we discussed the cracking problem, we only considered the infeasibility of solving (19), regardless of the structure of the public vector $F = (f_1, \dots, f_n)$. In other words, the public vector $F$ was considered to be indistinguishable from a randomly generated $n$-dimensional vector. However, (19) is only a seemingly-hard compact knapsack problem. If the public key reveals enough information for the attacker to reverse the basic mathematical construction of the trapdoor in the proposed PKCHD system, then he can act as an authorized receiver and decipher any cipher-text. Thus, key recovery attacks on the scheme also need to be carefully studied.

Simultaneous Diophantine Approximation Attack
Most knapsack-type cryptosystems use size conditions to disguise an easy knapsack problem. The designer randomly generates an easy knapsack problem, $y = \sum_{i=1}^{n} a_i x_i$, $x_i \in [0, 2^b - 1]$, and chooses a modulus $m$ and a multiplier $w$ with $\gcd(m, w) = 1$. He uses the size condition $m > (2^b - 1) \sum_{i=1}^{n} a_i$ to disguise the easy cargo vector $A = (a_1, \dots, a_n)$ as a seemingly-hard knapsack sequence, obtained by multiplying each $a_i$ by $w$ modulo $m$. The size condition can be exploited by the simultaneous Diophantine approximation attack to obtain useful information about $(w, m)$. See [22,28] for more information about the relationship between the simultaneous Diophantine approximation problem and cryptanalysis.
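A toy version of this classical disguise, which is the construction PKCHD avoids, is sketched here. It assumes that the easy sequence is super-increasing in the compact sense, $a_i > (2^b - 1) \sum_{j<i} a_j$, and all numeric values are hypothetical; `pow(w, -1, m)` requires Python 3.8 or later.

```python
def disguise(easy, w, m):
    """Hide the easy sequence by modular multiplication: a_i -> w * a_i mod m."""
    return [(w * a) % m for a in easy]

def decrypt(c, easy, w, m):
    """Undo the multiplier, then decode the easy compact knapsack greedily."""
    y = (c * pow(w, -1, m)) % m       # equals sum(a_i * x_i) by the size condition
    xs = []
    for a in reversed(easy):
        xs.append(y // a)             # exact because the lower terms sum to less than a
        y %= a
    return xs[::-1]

b = 2                                  # each x_i lies in [0, 2**b - 1]
easy = [1, 4, 16, 64]                  # a_i > 3 * (sum of the earlier a_j)
m, w = 257, 100                        # size condition: 257 > 3 * 85
public = disguise(easy, w, m)

x = [3, 0, 2, 1]
c = sum(p * xi for p, xi in zip(public, x))
print(decrypt(c, easy, w, m))
```

The size condition is what makes the modular reduction invertible for the receiver, and it is the same inequality that the simultaneous Diophantine approximation attack exploits.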
The trapdoor of the proposed PKCHD system is disguised using the CRT, which involves no size conditions. Thus, a simultaneous Diophantine approximation attack cannot find valuable information about the trapdoor. Even though a size condition is used in (13), the attacker must peel off the outermost shuffles (14) and (15) before he can launch a simultaneous Diophantine approximation attack, and that is also a difficult task.

Known N Attack
The exact value of $N$ is assumed to be known to the attacker, who wants to learn some information about the secret key. A straightforward way is to search for $e_n$ and factor $N$ to recover the trapdoor information. To evaluate to what extent the attacker can succeed, we must decide whether the public key $F = (f_1, \dots, f_n)$ and $N$ provide the attacker with enough information to compromise the cryptosystem. Suppose that the public vector $F$ is indistinguishable from a randomly chosen $n$-dimensional vector $F^*$ over $\mathbb{Z}_N$. (In fact, only the first $n - 1$ components of $F^*$ are randomly chosen, and the last component of $F^*$ must be 1; otherwise, since $f_n = 1$, it would make no sense to say that $F$ is indistinguishable from a randomly chosen $n$-dimensional vector.) Then we can conclude that the public key $F$ and $N$ provide no useful information for the attacker to recover the secret key. In other words, it is impossible for the attacker to retrieve the integer $e_n \in \mathbb{Z}_N$ from a random $n$-dimensional vector $F$.
According to Algorithm 2, the only distinction between the generated $a_i$, $b_i$ and random integers of the same binary length is that, when $i$ is small enough, the generated $a_i$, $b_i$ are smooth integers (i.e., they contain only small prime factors), whereas a random integer may not be. However, the public vector $F$ is scrambled by (14) and (15), which also disguises the smoothness of the two vectors $A$ and $B$. After the two shuffles (14) and (15), this only distinction disappears, and the generated vector $F$ becomes indistinguishable from random $n$-dimensional vectors over $\mathbb{Z}_N$. Thus, the publication of $N$ does not affect the security of the system. On the contrary, it reduces the length of the cipher-text and improves the transmission efficiency.
The attacker cannot expect to recover the secret key by searching for the integer $e_n$ that makes all the $a_i \equiv f_i e_n \pmod p$ and $b_i \equiv f_i e_n \pmod q$ smooth simultaneously, where $i < n$ is a relatively small integer. In fact, the best way of retrieving the trapdoor seems to be to factor $N$ first and then recover the secret vectors $A$ and $B$. It is easy to verify that $a_n w \equiv 1 \pmod p$ and $b_n w \equiv 1 \pmod q$, where $w = e_n^{-1} \bmod N$. If we write $a_n^{-1}$ and $b_n^{-1}$ for the inverses of $a_n$ modulo $p$ and $b_n$ modulo $q$ respectively, and set $f_{ip} = f_i \bmod p$ and $f_{iq} = f_i \bmod q$, then reducing (15) modulo $p$ and $q$ results in $f_{ip} \equiv a_i a_n^{-1} \pmod p$ and $f_{iq} \equiv b_i b_n^{-1} \pmod q$. Note that the vectors $A$ and $B$ have a special structure. Therefore, if the modulus $N$ is factored, the attacker will get some useful information from the integers $f_{ip}$ and $f_{iq}$. To examine the potential threats against the proposed PKCHD cryptosystem, we consider a stronger assumption, namely that the attacker has factorized the modulus $N$.

Known p and q Attack
Now we consider the scenario in which the attacker has factorized the modulus $N = pq$. It is then easy for the attacker to compute the $f_{ip}$'s and $f_{iq}$'s. The remaining task is just to recover $a_n$ and $b_n$, since the other $a_i$ and $b_i$ can be easily reconstructed via $a_i = a_n f_{ip} \bmod p$ and $b_i = b_n f_{iq} \bmod q$. In addition, the gcd's $c_i$ and $d_i$ are easily determined by using the Euclidean algorithm. Thus, the secret key is recovered.
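Why knowing $a_n$ is enough once $p$ is known can be seen on toy numbers. The sketch assumes the relation $f_{ip} \equiv a_i a_n^{-1} \pmod p$ and uses a hypothetical modulus and secret vector; it covers only the vector $A$, since $B$ is handled identically modulo $q$.

```python
def public_residues(a, p):
    """Residues f_ip = a_i * a_n^{-1} mod p that an attacker sees after factoring N."""
    w = pow(a[-1], -1, p)             # plays the role of w mod p, since a_n * w = 1 (mod p)
    return [(ai * w) % p for ai in a]

def recover_vector(f_mod_p, a_n, p):
    """Rebuild every a_i from a guessed a_n: a_i = a_n * f_ip mod p."""
    return [(a_n * f) % p for f in f_mod_p]

p = 10007                              # hypothetical toy modulus
a = [6, 35, 210, 1155]                 # hypothetical secret vector, all entries below p
f_p = public_residues(a, p)

print(f_p[-1])                         # last residue is 1, matching f_n = 1
print(recover_vector(f_p, a[-1], p))   # correct guess of a_n returns the secret vector
```

A wrong guess for $a_n$ yields residues without the special structure of $A$, which is what an exhaustive search would test for.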
(a) Structural attack: In fact, if the attacker obtains two pairs $(a_i, f_{ip})$ and $(b_j, f_{jq})$, he can determine the exact values of $a_n$ and $b_n$. Note that $a_1$ and $b_1$ have special structures (see Algorithm 2). To launch a structural attack, the attacker performs an exhaustive search over all possible integer pairs $(a_1, b_1)$. Assume $n = 150$; the $n - 1$ integer pairs $(u_i, v_i)$ are randomly chosen, with repetition permitted, such that $(u_i, v_i) \in J = W \cup W^T$. For each $i$, $(u_i, v_i)$ takes 48 possible values. The number of possible choices for the pair $(a_1, b_1)$ is given in the following theorem.
Theorem 8. When $n = 150$, the number $t$ of choices for generating $(a_1, b_1)$ is $t = \binom{197}{47}$.
Proof. Denote the set $J = \{j_i \mid i = 1, \dots, 48\}$ and regard each $j_i$ as an apple of color $i$. We are then confronted with the following "apple" model: choose $n = 150$ apples from 48 colors of apples, with repetition permitted. Consider a line on which 197 dots are placed. We choose 47 of the 197 dots and view them as boards, denoted $b_i$, $i = 1, \dots, 47$, from left to right. The dots to the left of $b_1$ are the apples of color 1, and the dots to the right of $b_{47}$ are the apples of color 48. The dots between board $i$ and board $i + 1$ are the apples of color $i + 1$, for $i = 1, \dots, 46$. Thus, every choice of the 47 boards corresponds to a choice of the integer pair $(a_1, b_1)$, and there are $t = \binom{197}{47}$ choices in total. This completes the proof.
Since $t = \binom{197}{47} \approx 2^{152}$, it is computationally infeasible for the attacker to try all the possibilities.
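The size of the binomial coefficient in Theorem 8 can be checked with exact integer arithmetic. The sketch below also verifies the boards-and-dots count against explicit enumeration on a small case; the function name is illustrative.

```python
import math
from itertools import combinations_with_replacement

def multiset_count(items, colors):
    """Ways to choose `items` apples from `colors` colors with repetition."""
    return math.comb(items + colors - 1, colors - 1)

# Small sanity check of the counting argument by explicit enumeration.
small = sum(1 for _ in combinations_with_replacement(range(3), 4))
print(small == multiset_count(4, 3))

# Theorem 8: 150 apples, 48 colors, i.e. C(197, 47).
t = multiset_count(150, 48)
print(t == math.comb(197, 47), t.bit_length())
```

The value of $t$ is a 153-bit integer, that is, roughly $2^{152}$, which is far beyond exhaustive search.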
(b) Simultaneous Diophantine approximation attack: Without loss of generality, we let
$$a_n f_{ip} - l_i p = a_i \qquad (29)$$
for some integers $l_i$. Dividing both sides of (29) by $p a_n$, we obtain
$$\frac{f_{ip}}{p} - \frac{l_i}{a_n} = \frac{a_i}{p a_n}. \qquad (30)$$
Note that $p \approx 343 \sum_{j=1}^{n} a_j \approx 343 n a_i \approx 343 n \sqrt{76.1^{\,n-1}}$. Thus, by (21), (23) and (30), the difference between $f_{ip}/p$ and $l_i/a_n$ is very small. If we note again that $a_n \approx p/(343n)$, we can claim that $\{l_i/a_n\}$ is a set of fractions with a common and relatively small denominator $a_n$ approximating the set of fractions $\{f_{ip}/p\}$. More formally, we can assume that the fractions $l_i/a_n$ are simultaneous Diophantine approximations of the fractions $f_{ip}/p$. If there were an efficient algorithm to solve this problem, the attacker could retrieve the secret vector $A = (a_1, \dots, a_n)$. Using a similar method, he could also recover the vector $B = (b_1, \dots, b_n)$, and thus the gcd's $c_i$ and $d_i$ as well. However, the simultaneous Diophantine approximation problem is widely believed to be intractable, and no efficient algorithm has been found for it. From the discussion above, it can be deduced that, to reconstruct the secret key, the attacker must search for the modulus $N$ and then solve two hard number-theoretic problems, namely the integer factorization problem and the simultaneous Diophantine approximation problem. This property is shared with the scheme presented in [39].
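The approximation relation can be verified exactly with rational arithmetic. This sketch assumes that relation (29) has the form $a_n f_{ip} - l_i p = a_i$, which is inferred from the surrounding derivation rather than quoted, and it uses hypothetical toy values for $p$, $a_i$ and $a_n$.

```python
from fractions import Fraction

def sda_gap(a_i, a_n, p):
    """Return (f_ip, l_i, f_ip/p - l_i/a_n) for the assumed relation (29)."""
    f_ip = (a_i * pow(a_n, -1, p)) % p
    l_i, rem = divmod(a_n * f_ip - a_i, p)
    assert rem == 0                   # a_n * f_ip is congruent to a_i modulo p
    return f_ip, l_i, Fraction(f_ip, p) - Fraction(l_i, a_n)

p, a_n = 10007, 1155                  # hypothetical toy values
for a_i in (6, 35, 210):
    f_ip, l_i, gap = sda_gap(a_i, a_n, p)
    print(a_i, f_ip, l_i, gap == Fraction(a_i, p * a_n))
```

All the fractions $l_i / a_n$ share the denominator $a_n$ and miss $f_{ip}/p$ by only $a_i / (p a_n)$, which is exactly the structure a simultaneous Diophantine approximation solver would look for.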

Generating the Hardest Knapsack Instances
It is general knowledge that public key cryptography as a whole is based on computational complexity theory. We may hope that PKCs based on proven intractability assumptions, e.g., the knapsack problem, are unbreakable super-codes. However, this is not the case: many PKCs based on NP-complete problems, such as the knapsack problem and multivariate quadratic polynomials [45], have been shown to be insecure. Fortunately, some PKCs based on unproven mathematical assumptions remain unbroken. Following the work of [45], this phenomenon can be explained as follows. The security of integer-factorization-based or discrete-logarithm-based PKCs rests not only on the hardness of factoring an integer or of solving the discrete logarithm problem over some cyclic group, but also on the key generation algorithms. For example, factoring a randomly chosen large integer may not be difficult, because such an integer usually contains some small prime factors. However, the RSA system does not use such easy-to-factor integers, and it can always select the hardest factorization instances as the basis for its security. The knapsack problem is known to be NP-complete, but computational complexity theory only deals with worst-case complexity. If the hardest knapsack instances are excluded from use in public key cryptography, we cannot expect a knapsack cryptosystem to be an unbreakable super-code. In fact, knapsack problems with density $< 0.9408\cdots$ have been shown to be easy to solve [20]. Many cryptographers have pointed out that knapsack instances with density greater than 1 cannot be used in public key cryptography because the cipher-texts are not uniquely decipherable. Thus, the room left for designing a secure knapsack cryptosystem is relatively narrow. For further discussion of the relationship between knapsack cryptography and computational complexity, see [36].
Schnorr and Euchner [29] showed that the hardest knapsack instances are those with density $d \approx 1 + \log_2(n/2)/n$, which is slightly larger than 1. The density of the proposed PKCHD is given in (27). For a knapsack problem to be among the hardest, the cargo vector should be indistinguishable from random vectors. In fact, we have argued that the public vector of the PKCHD system is indistinguishable from a randomly chosen vector. Consequently, if the hardness of a knapsack instance is evaluated by its density, the PKCHD system can always use the hardest knapsack vector as the public key.
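For orientation, the Schnorr-Euchner density can be evaluated for a few dimensions. This is a direct evaluation of the formula quoted above; the function name is illustrative.

```python
import math

def hardest_density(n):
    """Density 1 + log2(n/2)/n at which knapsack instances are hardest [29]."""
    return 1 + math.log2(n / 2) / n

for n in (50, 150, 500):
    print(n, round(hardest_density(n), 4))
```

For $n = 150$ the value is about 1.04, and it approaches 1 from above as $n$ grows.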

Provable Security Remarks
In public key cryptography, two typical methods are employed for security analysis. One is provable security theory [46], whose basic idea is to reduce the security of a PKC under some attack model to a hard mathematical problem. The other, called enumerative security, is to submit the PKC to the cryptological community for attacks. Provable security has been widely accepted as a standard method for the security analysis of PKCs. However, in this study we do not pursue provable security results for the proposed PKCHD cryptosystem, for the following reasons. Firstly, almost all provably secure PKCs are constructed from number-theoretic problems, i.e., the integer factorization and discrete logarithm problems. Secondly, provable security theory is not well suited to analyzing PKCs based on NP-complete problems. These PKCs are always constructed from an easy problem, so the problem of reversing their encryption functions is only seemingly hard rather than truly hard, and it makes no sense to reduce the security of a PKC to a seemingly-hard problem. Thirdly, security analysis for a newly designed trapdoor one-way function should center on estimating the hardness of reversing the encryption function and of retrieving the trapdoor information. If no efficient algorithm that compromises its security has been found for a long time, we can assume its one-wayness and then consider adding padding to achieve provable security objectives.
It will be a significant theoretical result if one can prove that reversing the encryption function is equivalent to solving the mathematical problems used in constructing the PKC.However, this is an extremely tough task [44].

Conclusions
Due to their performance advantages over other cryptosystems, knapsack cryptosystems, as a typical class of PKCs, play an important role among the wide variety of available cryptosystems. In particular, new knapsack-type cryptographic primitives have been developed in recent years, e.g., the non-injective knapsack cryptosystems [47], the knapsack Diffie-Hellman problem [48], and the elliptic curve discrete logarithm based knapsack public-key cryptosystem [49].

- gcd(a, b), the greatest common divisor of a and b; lcm(a, b), the least common multiple of a and b.
- If gcd(a, b) = 1, $a^{-1} \bmod b$ denotes the inverse of a modulo b.
- a | b, a divides b.
- a mod p, the least nonnegative remainder of a divided by p.
- a = b mod N means that a is the least nonnegative remainder of b modulo N; a ≡ b (mod N) means that a and b are congruent modulo N.
- For $(a, b) \in (\mathbb{Z}^+)^2$ and an integer m, m mod (a, b) denotes the 2-tuple (m mod a, m mod b).

Theorem 2. A set I is T-DIST (P-DIST, or IND, respectively) modulo the set J under the indices of K iff I is T-DIST (P-DIST, or IND, respectively) modulo the set $J^T$ under the indices of K, where J

Algorithm 1. Solving the simultaneous Diophantine equations
1. Construct a table T showing that I is T-DIST modulo J under the indices of K and store the table.
2. Compute

Table 1. The non-injectivity of the encryption function under the relinearization attack model.