PKCHD: Towards a Probabilistic Knapsack Public-Key Cryptosystem with High Density

Ping, Yuan; Wang, Baocang; Tian, Shengli; Zhou, Jingxian; Ma, Hui

doi:10.3390/info10020075

by Yuan Ping ^1,2,*, Baocang Wang ^1,3,*, Shengli Tian ^1, Jingxian Zhou ^2 and Hui Ma ^1

^1 School of Information Engineering, Xuchang University, Xuchang 461000, China

^2 Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China

^3 Key Laboratory of Computer Networks and Information Security, Ministry of Education, Xidian University, Xi’an 710071, China

^* Authors to whom correspondence should be addressed.

Information 2019, 10(2), 75; https://doi.org/10.3390/info10020075

Submission received: 22 January 2019 / Revised: 14 February 2019 / Accepted: 19 February 2019 / Published: 21 February 2019


Abstract

By introducing an easy knapsack-type problem, a probabilistic knapsack-type public key cryptosystem (PKCHD) is proposed. It uses the Chinese remainder theorem to disguise the easy knapsack sequence. Hence, to recover the trapdoor information, an attacker has to solve at least two hard number-theoretic problems, namely integer factorization and simultaneous Diophantine approximation. In PKCHD, the encryption function is nonlinear in the message vector. Under the re-linearization attack model, PKCHD achieves a high density and is secure against low-density subset sum attacks, and the probability that an attacker recovers the message vector with a single call to a lattice oracle is negligible. The infeasibility of other attacks on PKCHD is also investigated. Moreover, if density is taken as the measure of the hardness of a knapsack instance, PKCHD can use the hardest knapsack vector as the public key. Furthermore, encryption and decryption require only a quadratic number of bit operations, which confirms the efficiency of the scheme.

Keywords: public key cryptography; knapsack problem; low-density attack; lattice reduction

1. Introduction

A public key cryptosystem (PKC), a concept introduced by Diffie and Hellman in their landmark paper [1], is a critical cryptographic primitive in the area of network and information security. Traditional PKCs such as RSA [2] and ElGamal [3] suffer from the same drawback of relatively low speed, which hampers the further applications of public-key cryptography and also motivates the cryptographers to design faster PKCs. Among the first public-key schemes, knapsack-type cryptosystems were invented as fast PKCs. Due to the high speed of encryption and decryption and their NP-completeness, they were considered to be the most attractive and the most promising for a long time. However, some attacks lowered the initial enthusiasm and even announced the premature death of trapdoor knapsacks.

Following the first knapsack system developed by Merkle and Hellman [4], many knapsack-type cryptosystems have been proposed. However, only a few of them are considered secure, including the most resistant one, the Chor–Rivest knapsack system [5,6]. In the literature, many techniques were developed and many trapdoors were found to hide information, e.g., using the 0–1 knapsack problem [4], compact knapsack problem [7], multiplicative knapsack problem [8,9], modular knapsack problem [10,11], matrix cover problem [12], group factorization problem [13,14], polynomials over $GF(2)$ [15], Diophantine equations [16], complementing sets [17], and so on. However, almost all the additive knapsack-type cryptosystems are vulnerable to low-density subset sum attacks [18,19,20], the GCD attack [21], the simultaneous Diophantine approximation attack [22] or the orthogonal lattice attack [14]. Additionally, Refs. [23,24] survey the rise and fall of knapsack cryptosystems.

Three reasons explain the insecurity of the additive knapsack-type cryptosystems. Firstly, as observed in [21], these systems are basically linear. Secondly, for some of them, the trapdoor information is easy to recover. In particular, some systems use size conditions to disguise an easy knapsack problem, which makes them vulnerable to simultaneous Diophantine approximation attacks [22]. Thirdly, the densities of some systems are not high enough. Coster et al. [20] showed that, if the density is below $0.9408\dots$, a single call to a lattice oracle leads to a polynomial-time solution.

In light of these observations, to design a secure knapsack-type PKC, we must ensure that:

in the system, the encryption function is nonlinear about the message vector;
to disguise the easy knapsack problem, the size conditions should be excluded;
the encryption function must be non-injective. A cipher-text must have so many preimages that it is computationally infeasible for the attacker to list all the preimages.

It is believed in [23] that, if someone invents a knapsack cryptosystem that fully exploits the difficulty of the knapsack problem, with a high density and a difficult-to-discover trapdoor, then it will be a system better than those based on integer factorization and discrete logarithms. Can such a knapsack-type PKC satisfying the requirements above be developed, or, in other words, may any efficient yet straightforward constructions have been overlooked? In this paper, we will try to provide an affirmative answer.

Based on a new easy knapsack-type problem, a probabilistic knapsack public-key cryptosystem with high density (PKCHD) is proposed, which has the following properties:

PKCHD is a probabilistic knapsack-type PKC.
The multivariate polynomial encryption function is nonlinear about the message vector, and its degrees are controlled by the randomly-chosen small integers.
The secret key is disguised via Chinese remainder theorem (CRT) rather than the size conditions. Thus, PKCHD is secure against simultaneous Diophantine approximation attacks.
The density of PKCHD is sufficiently high under the relinearization attack model. A cipher-text has too many plaintexts for the attacker to enumerate all of them in polynomial time.
If its density evaluates the hardness of a knapsack instance, PKCHD can always use the hardest knapsack vector as the public-key.
The attacker has to solve at least two hard number-theoretic problems, namely integer factorization and simultaneous Diophantine approximation problems, to recover the trapdoor information.
PKCHD is more efficient than RSA [2] and ElGamal [3]. The encryption and the decryption of the system only perform $O (n^{2})$ bit operations.

The rest of the paper is organized as follows. In Section 2, we give some preliminaries on concepts and definitions about lattices, low-density subset sum attacks, and simultaneous Diophantine approximation. The easy knapsack-type problems are presented in Section 3, as well as several examples to make the problems more understandable. The detailed description of the proposed PKCHD is given in Section 4. Section 5 discusses the performance related issues and specifies the parameter selection. Section 6 discusses several attacks on our system including key-recovery attacks, low-density attacks, and simultaneous Diophantine approximation attacks. The security of the system is carefully examined in this section. Section 7 gives some concluding remarks.

2. Preliminaries

Throughout this paper, the following notations will be used:

-: $R$, the field of real numbers.
-: $Z$, the ring of integers; $Z^{+}$, the set of all positive integers.
-: $Z_{n} = \{0, \dots, n - 1\}$, the complete system of least nonnegative residues modulo n; $Z_{n}^{*}$, the reduced residue system modulo n.
-: $\gcd(a, b)$, the greatest common divisor of a and b; $\operatorname{lcm}(a, b)$, the least common multiple of a and b.
-: If $\gcd(a, b) = 1$, $a^{-1} \bmod b$ denotes the inverse of a modulo b.
-: $a \mid b$, a divides b.
-: $a \bmod p$, the least nonnegative remainder of a divided by p.
-: $a = b \bmod N$ means that a is the least nonnegative remainder of b modulo N; $a \equiv b \pmod{N}$ means that a and b are congruent modulo N.
-: For $(a, b) \in (Z^{+})^{2}$ and an integer m, $m \bmod (a, b)$ denotes the 2-tuple $(m \bmod a, m \bmod b)$.
-: $u \not\equiv v \pmod{(a, b)}$ means that $u \bmod a \neq v \bmod a$ or $u \bmod b \neq v \bmod b$.
-: $|A|$, the cardinality of a set A.
-: $|a|_{2}$, the binary length of an integer a.
-: $\lceil r \rceil$, the smallest integer greater than or equal to r.

Throughout this paper, we also adopt some customary parlance. For example, when we say a value is negligible, we mean that the value is a negligible function $v(k) : N \to [0, 1]$, i.e., for any polynomial $p(\cdot)$, there exists $k_{0} \geq 1$ such that $v(k) < 1/p(k)$ for any $k > k_{0}$. The length of a vector means its norm ($L_{1}$, $L_{2}$ or $L_{\infty}$ norm).

2.1. Lattice

A lattice is a discrete additive subgroup of $R^{n}$. An equivalent definition is that a lattice consists of all integral linear combinations of a set of linearly independent vectors, i.e.,

$$L = \left\{ \sum_{i=1}^{d} z_{i} b_{i} \;\middle|\; z_{i} \in Z \right\},$$

where $b_{1}, \dots, b_{d}$ are linearly independent over $R$. Such a set of vectors $\{b_{i}\}$ is called a lattice basis.

In lattice theory, three important algorithmic problems are the shortest vector problem (SVP), the closest vector problem (CVP) and the smallest basis problem (SBP). The SVP asks for the shortest non-zero vector in a given lattice L. Given a lattice L and a vector v, the CVP is to find a lattice vector s minimizing the length of the vector $v - s$. The SBP aims at finding a lattice basis minimizing the maximum of the lengths of its elements. These problems are of special significance in complexity theory and cryptology. The SVP can be approximated by solving the SBP. No polynomial-time algorithm is known that solves any of the three problems exactly. The best known polynomial-time algorithms for the SVP achieve only slightly sub-exponential approximation factors and are based on the LLL algorithm [25].

Before 1996, lattice theory was applied only to cryptanalysis [14,18,19,20,21,22,26,27,28,29], especially in breaking some knapsack cryptosystems. However, positive applications of lattice theory in cryptology [30,31,32,33] have been witnessed in the last ten years. Some cryptographers even classify knapsack cryptosystems among the lattice-based cryptosystems because lattice reduction algorithms are used to break the knapsack-type cryptosystems. For example, Sakurai [34] viewed the lattice-based cryptosystems as the revival of the knapsack trapdoors. More negative and positive applications of lattice theory in cryptology can be found in [34,35].

The SVP and CVP are widely believed to be hard problems. Interestingly, however, experimental results show that lattice reduction algorithms behave much better than expected from the proved worst-case bounds, especially in low-dimensional (<300) lattices. When the dimension of a lattice is low, lattice reduction algorithms can serve as a lattice oracle (SVP or CVP oracle). Therefore, to make a PKC invulnerable to lattice attacks, the dimension is generally required to be sufficiently high (>500) without reducing the practicability, e.g., NTRU [32]. In this paper, a new method of constructing a knapsack-type cryptosystem is presented. The dimension of the lattice underlying the cryptosystem is low (about 150), and the system is still secure against lattice attacks under some reasonable assumptions.

2.2. Low-Density Subset Sum Attacks

Given a cargo vector $A = (a_{1}, \dots, a_{n})$ and an integer s, the 0–1 knapsack problem, or more precisely the subset-sum problem, is to determine a binary vector $X = (x_{1}, \dots, x_{n})$ such that the scalar product of A and X is s. More generally, we define the general knapsack problem or compact knapsack problem as finding a vector $X = (x_{1}, \dots, x_{n})$ with $x_{i} \in [0, 2^{b} - 1]$ such that

$$\sum_{i=1}^{n} a_{i} x_{i} = s. \tag{1}$$

Note that Equation (1) is linear in the variable X. However, when the linearity restriction is removed and a new function f quadratic in X is defined such that $f(X) = s$, i.e., $X A X^{T} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_{i} x_{j} = s$, we call it a matrix cover problem. In particular, when the matrix A is diagonal, $A = \operatorname{diag}(a_{1}, \dots, a_{n})$, the matrix cover problem reduces to finding the vector $X = (x_{1}, \dots, x_{n})$ subject to $\sum_{i=1}^{n} a_{i} x_{i}^{2} = s$. This problem is called a quadratic knapsack problem. These problems have been used to construct knapsack-type PKCs [4,7,12].

In a compact knapsack cryptosystem, the public key of the system is a cargo vector $A = (a_{1}, \dots, a_{n})$. A message $M = (m_{1}, \dots, m_{n})$ with $m_{i} \in [0, k]$ is encrypted into

$$s = \sum_{i=1}^{n} a_{i} m_{i}. \tag{2}$$

An important characteristic of a knapsack cryptosystem is the density of the cryptosystem. A cryptosystem’s density has a great effect on its security against lattice-based attacks such as low-density subset-sum attack and on whether it can be used to generate digital signatures for data origin authentication purposes. In a high density cryptosystem, almost all the messages can be signed. Informally, the density of a knapsack cryptosystem is defined as the fraction of the signable messages among all the messages [36], or the density is approximately the information rate, which is the ratio of the number of bits in plaintext message over the average number of bits in cipher-text [23]. Now, we provide the formal definition of density.

Definition 1 (Density [37]). The density d of the compact knapsack problem (2) is defined by

$$d = \frac{\sum_{i=1}^{n} e_{i}}{\log_{2} C_{\max}}, \tag{3}$$

where $C_{\max} = k \sum_{i=1}^{n} a_{i}$ is the maximum value of the cipher-text in the system and $e_{i} = |m_{i}|_{2} = \lceil \log_{2}(k + 1) \rceil$.

We give two remarks about the definition here. Firstly, $\lceil \log_{2}(k + 1) \rceil$ bits are needed to represent the $k + 1$ integers in $[0, k]$. Thus, we set $e_{i} = \lceil \log_{2}(k + 1) \rceil$. Secondly, some different definitions can be found in the literature. For example, Orton [7] defined the density of Equation (2) as

$$d = \frac{n \lceil \log_{2}(k + 1) \rceil}{\log_{2} \max a_{i}}.$$

However, Ref. [37] gave a smaller density definition than that given in [7]. Thus, we adopt the smaller definition.
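As a rough numerical illustration of Definition 1 and of Orton's alternative, the following Python sketch computes both densities for a toy cargo vector. The function names `density` and `orton_density` and the sample vector are assumptions made for this example only; they do not come from the paper.

```python
import math

def bits_per_component(k):
    """e_i = ceil(log2(k + 1)); for integers k >= 0 this equals k.bit_length()."""
    return k.bit_length()

def density(a, k):
    """Density in the sense of Definition 1: n * e_i / log2(k * sum(a))."""
    c_max = k * sum(a)
    return len(a) * bits_per_component(k) / math.log2(c_max)

def orton_density(a, k):
    """Orton's variant: n * e_i / log2(max(a))."""
    return len(a) * bits_per_component(k) / math.log2(max(a))

toy = [1, 2, 4, 8]          # hypothetical cargo vector, far too small for real use
d_def1 = density(toy, 1)    # 4 / log2(15)
d_orton = orton_density(toy, 1)  # 4 / log2(8)
```

On this toy input the value from Definition 1 is the smaller of the two, which matches the remark that [37] gives the smaller density.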

When the density d of a knapsack problem is too low, there exists an efficient reduction from the knapsack problem to the SVP over a lattice. Coster et al. [20] showed that, if $d < 0.9408\dots$, which improves the earlier bound $0.6463\dots$ [19], then the knapsack problem can be solved with non-negligible probability by a single call to a lattice oracle.

Given a knapsack system $A = (a_{1}, \dots, a_{n})$ and a sum $s = \sum_{i=1}^{n} a_{i} x_{i}$, the basic idea of the low-density attack [20] runs as follows. Using the public key, the attacker first constructs the matrix

$$V = \begin{pmatrix} 1 & 0 & \dots & 0 & N a_{1} \\ 0 & 1 & \dots & 0 & N a_{2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & N a_{n} \\ \frac{1}{2} & \frac{1}{2} & \dots & \frac{1}{2} & -N s \end{pmatrix} = \begin{pmatrix} v_{1} \\ v_{2} \\ \vdots \\ v_{n} \\ v_{n+1} \end{pmatrix},$$

where $N > \sqrt{n}/2$. The integral combinations of the row vectors $v_{1}, \dots, v_{n+1}$ of V form an $(n + 1)$-dimensional lattice L. Suppose that $e = (e_{1}, \dots, e_{n})$ is a solution to $s = \sum_{i=1}^{n} a_{i} x_{i}$. Note that the vector

$$f = (f_{1}, \dots, f_{n}, 0) = \left(e_{1} + \tfrac{1}{2}, \dots, e_{n} + \tfrac{1}{2}, 0\right) = e_{1} v_{1} + \dots + e_{n} v_{n} + v_{n+1} \in L$$

contains enough information for the attacker to recover a solution to $s = \sum_{i=1}^{n} a_{i} x_{i}$. The length of f is relatively small, so the short vector f can be found with non-negligible probability by using lattice basis reduction algorithms.
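The lattice membership claim above can be checked on a small instance. The sketch below builds the rows of V with exact rational arithmetic and forms the combination $e_{1} v_{1} + \dots + e_{n} v_{n} + v_{n+1}$. It does not run any lattice reduction; the helper names `build_basis` and `combine` and the toy numbers are assumptions for illustration.

```python
from fractions import Fraction

def build_basis(a, s, N):
    """Rows v_1..v_{n+1} of the matrix V for cargo vector a and target sum s."""
    n = len(a)
    rows = []
    for i, ai in enumerate(a):
        row = [Fraction(0)] * n + [Fraction(N * ai)]
        row[i] = Fraction(1)
        rows.append(row)
    rows.append([Fraction(1, 2)] * n + [Fraction(-N * s)])
    return rows

def combine(rows, coeffs):
    """Integer linear combination sum_i coeffs[i] * rows[i]."""
    width = len(rows[0])
    return [sum(c * r[j] for c, r in zip(coeffs, rows)) for j in range(width)]

a = [3, 5, 9]        # toy cargo vector (hypothetical)
e = [1, 0, 1]        # a solution of 3*x1 + 5*x2 + 9*x3 = 12
basis = build_basis(a, 12, N=1)    # N = 1 > sqrt(3)/2
f = combine(basis, e + [1])        # e_1 v_1 + ... + e_n v_n + v_{n+1}
```

Here `f` equals $(3/2, 1/2, 3/2, 0)$: the last coordinate vanishes exactly because e solves the knapsack equation, and subtracting 1/2 from the first n coordinates gives back e.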

In fact, even if we design a knapsack system with density close to 1 and greater than $0.9408\dots$, we cannot claim that it is secure against low-density subset sum attacks. Let the length of the message vectors be bounded by r, and let $N(n, r)$ be the number of integral lattice points with length at most r in the n-dimensional sphere of radius r centered at the origin. Assume that the lattice points in the sphere have the same length and that the lattice reduction algorithms can find a lattice point in the sphere. Then the lattice point output by the lattice reduction algorithm is exactly the message vector with probability $Pr = 1/N(n, r)$. However, if the density is only slightly greater than $0.9408\dots$, $N(n, r)$ is bounded by a constant $O(1)$ or a polynomial function $O(p(n))$. In such a case, the probability $Pr = 1/N(n, r)$ is non-negligible. This is why Omura et al. [26] showed that the low-density attack can be applied to the Chor–Rivest [5] and Okamoto–Tanaka–Uchiyama [38] cryptosystems.

2.3. Simultaneous Diophantine Approximation

The simultaneous Diophantine approximation problem is a basic problem in Diophantine approximation theory, which has found uses both in cryptanalysis [22,28] and cryptography [39]. The problem is defined as follows.

Definition 2 (Simultaneous Diophantine approximation). The simultaneous Diophantine approximation problem is: given $n + 1$ real numbers $r_{1}, \dots, r_{n}, \epsilon > 0$, and an integer $Q > 0$, find integers $p_{1}, \dots, p_{n}$ and q with $0 < q \leq Q$ such that

$$\left| r_{i} - \frac{p_{i}}{q} \right| \leq \frac{\epsilon}{q}.$$

Informally speaking, this problem asks for a set of fractions with a common and relatively small denominator approximating the given set of real numbers. There is a solution to the simultaneous Diophantine approximation problem if $Q \geq \epsilon^{-n}$, but no efficient algorithm for finding it is known. However, when viewed as a problem involving lattices, the problem can be approximated by lattice basis reduction algorithms. Note that the integral linear combinations of the row vectors of the matrix

$$A = \begin{pmatrix} 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & 0 \\ -r_{1} & -r_{2} & \dots & -r_{n} & \epsilon / Q \end{pmatrix} = \begin{pmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \\ a_{n+1} \end{pmatrix}$$

form a lattice L. Lattice basis reduction algorithms can be applied to the lattice L to output a reduced basis. The shortest vector b in the reduced basis can be used to approximate the simultaneous Diophantine approximation problem. Since $b \in L$, there exist integers $p_{1}, \dots, p_{n}$ and q such that

$$b = \sum_{i=1}^{n} p_{i} a_{i} + q a_{n+1} = \left(p_{1} - q r_{1}, \dots, p_{n} - q r_{n}, \frac{q \epsilon}{Q}\right).$$

Since b is short, each $p_{i} - q r_{i}$ is small, which is equivalent to saying that $|r_{i} - p_{i}/q|$ is also small. Thus, $\{p_{i}/q\}$ is a set of fractions, with a common denominator q, approximating $\{r_{i}\}$. This informal demonstration reveals the relation between lattice reduction algorithms and the simultaneous Diophantine approximation problem.
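To make Definition 2 concrete, the following sketch searches all denominators $q \leq Q$ exhaustively and returns the one minimizing $\max_{i} |q r_{i} - p_{i}|$, which is the quantity that must be at most $\epsilon$. This brute-force search is a stand-in chosen for clarity, not the lattice reduction approach described above, and it is only feasible for tiny Q. The function name `best_denominator` is an assumption.

```python
from fractions import Fraction

def best_denominator(r, Q):
    """Return (q, p, err) minimizing err = max_i |q*r_i - p_i| over 1 <= q <= Q.

    r is a list of Fractions; p_i is taken as the integer nearest to q*r_i.
    Ties are resolved in favor of the smallest q.
    """
    best = None
    for q in range(1, Q + 1):
        p = [round(q * ri) for ri in r]
        err = max(abs(q * ri - pi) for ri, pi in zip(r, p))
        if best is None or err < best[2]:
            best = (q, p, err)
    return best

# Hypothetical inputs: 1/3 and 2/3 share the exact common denominator 3.
q, p, err = best_denominator([Fraction(1, 3), Fraction(2, 3)], 5)
```

A pair $(q, p)$ returned this way solves the problem of Definition 2 whenever `err` is at most $\epsilon$.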

3. Easy Knapsack-Type Problems

Knapsack-type PKCs always follow a common design morphology [9], that is:

Construct an easy instance $P_{\text{easy}}$ from an intractable problem P.
Shuffle $P_{\text{easy}}$ to make the resultant problem $P_{\text{shuffle}}$ seemingly hard and indistinguishable from P.
$P_{\text{shuffle}}$ is published as the encryption key. The information s by means of which $P_{\text{shuffle}}$ is reduced to $P_{\text{easy}}$ is kept as the secret key.
The authorized receiver knowing s solves $P_{\text{easy}}$ to recover a message, whereas the task for the attacker is to solve $P_{\text{shuffle}}$.

In the knapsack public-key cryptography, several kinds of easy knapsack problems have been considered, e.g., super-increasing sequences [4], the cargo vectors used in the Graham–Shamir cryptosystem [40] and the knapsack sequences [41] used for attacking a knapsack-type cryptosystem [16] based on Diophantine equations. In this section, we propose several new easy knapsack problems, which can be viewed as the generalizations of those problems presented in [42,43].

3.1. An Easy Compact Knapsack Problem

The simultaneous compact knapsack problem is considered in this section: given the sums $(s_{1}, s_{2}) \in (Z^{+})^{2}$ and two cargo vectors $A = (a_{1}, \dots, a_{n})$, $B = (b_{1}, \dots, b_{n}) \in (Z^{+})^{n}$, find a vector $X = (x_{1}, \dots, x_{n})$ such that

$$s_{1} = \sum_{i=1}^{n} a_{i} x_{i}$$

and

$$s_{2} = \sum_{i=1}^{n} b_{i} x_{i}.$$

The problem has a solution only if $\gcd(a_{1}, \dots, a_{n}) \mid s_{1}$ and $\gcd(b_{1}, \dots, b_{n}) \mid s_{2}$. Without loss of generality, in this paper, we always assume that $\gcd(a_{1}, \dots, a_{n}) = \gcd(b_{1}, \dots, b_{n}) = 1$. The following theorem gives an easy instance of the simultaneous compact knapsack problem.

Theorem 1. Given two cargo vectors $A = (a_{1}, \dots, a_{n})$ and $B = (b_{1}, \dots, b_{n})$, denote by $c_{i}$ and $d_{i}$ the gcd of the first i components of A and B, respectively, i.e., $c_{i} = \gcd(a_{1}, \dots, a_{i})$, $d_{i} = \gcd(b_{1}, \dots, b_{i})$. If $2 \leq k \leq \lambda_{i} = \operatorname{lcm}(c_{i-1}/c_{i}, d_{i-1}/d_{i})$ for $i = 2, \dots, n$, the following simultaneous compact knapsack problem

$$\sum_{i=1}^{n} a_{i} x_{i} = s_{1}, \tag{4}$$

$$\sum_{i=1}^{n} b_{i} x_{i} = s_{2}, \quad 0 \leq x_{i} \leq k - 1, \tag{5}$$

can be solved in polynomial (in n) time. Furthermore, the problem has at most one solution.

Proof. Note that $c_{n-1} \mid a_{i}$ for $i = 1, \dots, n - 1$, so Equation (4) modulo $c_{n-1}$ gives

$$a_{n} x_{n} \equiv s_{1} \pmod{c_{n-1}}.$$

Since $\gcd(a_{n}, c_{n-1}) = c_{n} = 1$, we can invert $a_{n}$ modulo $c_{n-1}$ and obtain

$$x_{n} \equiv s_{1} a_{n}^{-1} \pmod{c_{n-1}}.$$

Similarly, we get

$$x_{n} \equiv s_{2} b_{n}^{-1} \pmod{d_{n-1}}.$$

Then, we can determine a unique $x_{n} \in Z_{\lambda_{n}}$ according to CRT, where $\lambda_{n} = \operatorname{lcm}(c_{n-1}/c_{n}, d_{n-1}/d_{n}) = \operatorname{lcm}(c_{n-1}, d_{n-1}) \geq k$. If the unique $x_{n}$ obtained is greater than $k - 1$, we can conclude that the simultaneous compact knapsack problem has no solutions. Otherwise, we have determined an $x_{n}$ with $0 \leq x_{n} \leq k - 1$.

Suppose that the values of $x_{i+1}, \dots, x_{n}$, $i = n - 1, \dots, 2$, have been determined. Then

$$\sum_{j=1}^{i} a_{j} x_{j} = s_{1} - \sum_{j=i+1}^{n} a_{j} x_{j}, \tag{6}$$

and

$$\sum_{j=1}^{i} b_{j} x_{j} = s_{2} - \sum_{j=i+1}^{n} b_{j} x_{j}. \tag{7}$$

Note that Equation (6) modulo $c_{i-1}$ gives

$$a_{i} x_{i} \equiv s_{1} - \sum_{j=i+1}^{n} a_{j} x_{j} \pmod{c_{i-1}}.$$

It is easy to verify that $\gcd(a_{i}, c_{i-1}) = c_{i}$ and $\gcd(a_{i}/c_{i}, c_{i-1}/c_{i}) = 1$. If $c_{i} \mid \left(s_{1} - \sum_{j=i+1}^{n} a_{j} x_{j}\right)$, we have

$$\frac{a_{i}}{c_{i}} x_{i} \equiv \frac{s_{1} - \sum_{j=i+1}^{n} a_{j} x_{j}}{c_{i}} \pmod{\frac{c_{i-1}}{c_{i}}}; \tag{8}$$

otherwise, the simultaneous compact knapsack problems (4) and (5) have no solutions. By inverting $a_{i}/c_{i}$, we obtain from Equation (8)

$$x_{i} \equiv \frac{s_{1} - \sum_{j=i+1}^{n} a_{j} x_{j}}{c_{i}} \left(\frac{a_{i}}{c_{i}}\right)^{-1} \pmod{\frac{c_{i-1}}{c_{i}}}. \tag{9}$$

Similarly, we can deduce that problems (4) and (5) either have no solutions or yield the congruence

$$x_{i} \equiv \frac{s_{2} - \sum_{j=i+1}^{n} b_{j} x_{j}}{d_{i}} \left(\frac{b_{i}}{d_{i}}\right)^{-1} \pmod{\frac{d_{i-1}}{d_{i}}}. \tag{10}$$

From (9) and (10), we can determine a unique $x_{i} \in Z_{\lambda_{i}}$ according to the CRT, where $\lambda_{i} = \operatorname{lcm}(c_{i-1}/c_{i}, d_{i-1}/d_{i}) \geq k$. Thus, if (4) and (5) have solutions, we can determine a unique $x_{i}$ with $0 \leq x_{i} \leq k - 1$.

With the determined values of $x_{2}, \dots, x_{n}$, we get

$$a_{1} x_{1} = s_{1} - \sum_{j=2}^{n} a_{j} x_{j} \overset{\text{def}}{=} r_{1},$$

and

$$b_{1} x_{1} = s_{2} - \sum_{j=2}^{n} b_{j} x_{j} \overset{\text{def}}{=} r_{2}.$$

If $a_{1} \mid r_{1}$ and $b_{1} \mid r_{2}$, and the two quotients are identical, i.e.,

$$0 \leq \frac{r_{1}}{a_{1}} = \frac{r_{2}}{b_{1}} \overset{\text{def}}{=} r \leq k - 1,$$

we set $x_{1} = r$; otherwise, we deduce that problems (4) and (5) have no solutions. Even if unique values of $x_{1}, \dots, x_{n}$ have been determined, we cannot yet claim that they are a solution to (4) and (5). We need to verify whether $x_{1}, \dots, x_{n}$ satisfy (4) and (5). If yes, then $X = (x_{1}, \dots, x_{n})$ is a solution to (4) and (5); otherwise, (4) and (5) have no solutions.

To determine each $x_{i}$, we need to solve two modular equations and combine them by using CRT. Hence, the whole problem is solved by solving at most $2n$ modular equations. Thus, the simultaneous compact knapsack problems (4) and (5) can be solved in polynomial (in n) time. If the problem has solutions, each $x_{i}$ is uniquely determined according to CRT. Thus, the simultaneous compact knapsack problem has at most one solution. □
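The back-substitution used in the proof can be sketched in a few lines of Python. The code below is an illustrative reading of the proof, not the authors' implementation; the helper names (`prefix_gcds`, `inv_mod`, `crt_pair`, `solve_simultaneous`) and the toy vectors are assumptions. Lists are 0-indexed, so `c[i]` holds $c_{i+1}$.

```python
from math import gcd

def prefix_gcds(v):
    """[gcd(v_1), gcd(v_1, v_2), ...]."""
    out, g = [], 0
    for x in v:
        g = gcd(g, x)
        out.append(g)
    return out

def inv_mod(a, m):
    """Inverse of a modulo m; modulo 1 every residue is 0."""
    return 0 if m == 1 else pow(a, -1, m)

def crt_pair(r1, m1, r2, m2):
    """Unique x in [0, lcm(m1, m2)) with x = r1 (mod m1), x = r2 (mod m2), or None."""
    g = gcd(m1, m2)
    if (r2 - r1) % g:
        return None
    step = m2 // g
    t = ((r2 - r1) // g * inv_mod(m1 // g, step)) % step
    return r1 + m1 * t

def solve_simultaneous(a, b, s1, s2, k):
    """Solve sum a_i x_i = s1, sum b_i x_i = s2 with 0 <= x_i <= k-1, or return None."""
    n = len(a)
    c, d = prefix_gcds(a), prefix_gcds(b)
    x, r1, r2 = [0] * n, s1, s2
    for i in range(n - 1, 0, -1):            # components x_n down to x_2
        if r1 % c[i] or r2 % d[i]:
            return None
        m1, m2 = c[i - 1] // c[i], d[i - 1] // d[i]
        l1 = (r1 // c[i]) * inv_mod(a[i] // c[i], m1) % m1   # congruence (9)
        l2 = (r2 // d[i]) * inv_mod(b[i] // d[i], m2) % m2   # congruence (10)
        xi = crt_pair(l1, m1, l2, m2)
        if xi is None or xi > k - 1:
            return None
        x[i] = xi
        r1, r2 = r1 - a[i] * xi, r2 - b[i] * xi
    if r1 % a[0] or r2 % b[0] or r1 // a[0] != r2 // b[0]:
        return None
    if not 0 <= r1 // a[0] <= k - 1:
        return None
    x[0] = r1 // a[0]
    return x

# Toy instance: lambda_2 = lcm(2, 3) = 6 and lambda_3 = lcm(3, 5) = 15, so k = 5 is allowed.
solution = solve_simultaneous([6, 3, 1], [15, 5, 1], 27, 53, 5)
```

For this instance the sketch returns `[2, 4, 3]`, and changing $s_{2}$ to 54 makes it return `None` because the CRT value for $x_{3}$ falls outside $[0, k - 1]$.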

However, a high-density knapsack-type cryptosystem cannot be designed based on this easy knapsack problem. It must be generalized in some other way.

3.2. Generalization of the Simultaneous Compact Knapsack Problem

Before generalizing the simultaneous compact knapsack problem, we first introduce some notation to make the discussion more convenient. Given $I \subset Z$, $K \subset Z^{+}$ and $J = \{j = (j_{1}, j_{2}) \mid j_{1}, j_{2} \in Z^{+}\}$, we use $I^{K}$ to denote the set $\{i^{k} \mid i \in I, k \in K\}$. For every $j = (j_{1}, j_{2}) \in J$, $I^{K} \bmod j$ denotes the set $\{i^{k} \bmod j = (i^{k} \bmod j_{1}, i^{k} \bmod j_{2}) \mid i \in I, k \in K\}$. In general, we have the following inequalities:

$$\forall j \in J, \quad |I^{K} \bmod j| \leq |I^{K}| \leq |I| \times |K|.$$

The second "≤" can be strict because different $i_{1}, i_{2}$ and $k_{1}, k_{2}$ may give an identical $i_{1}^{k_{1}} = i_{2}^{k_{2}}$, for example, $2^{2} = 4^{1}$. Likewise, the first "≤" can be strict because two different values $i_{1}^{k_{1}}$ and $i_{2}^{k_{2}}$ may coincide modulo j.

Definition 3. If $\forall j \in J$, $|I^{K} \bmod j| = |I^{K}| = |I| \times |K|$, we call the set I truly-distinguishable (T-DIST) modulo the set J under the indices of K; if $\forall j \in J$, $|I^{K} \bmod j| = |I^{K}| < |I| \times |K|$, we call the set I pseudo-distinguishable (P-DIST) modulo the set J under the indices of K; if $\exists j \in J$, $|I^{K} \bmod j| < |I^{K}|$, we call the set I indistinguishable (IND) modulo the set J under the indices of K. If different $(i_{1}, k_{1})$ and $(i_{2}, k_{2})$ result in the same $i_{1}^{k_{1}} \equiv i_{2}^{k_{2}} \pmod{j}$, we call the 3-tuple $((i_{1}, k_{1}), (i_{2}, k_{2}), j)$ a collision. In particular, the collisions in the case of P-DIST are called trivial collisions; the collisions in the case of IND are called non-trivial collisions.
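Definition 3 translates directly into set-cardinality checks. The following sketch classifies small sets by enumeration; the function name `classify` and the sample sets are assumptions made for illustration.

```python
def classify(I, K, J):
    """Return 'T-DIST', 'P-DIST' or 'IND' for I modulo J under the indices of K."""
    powers = {i ** k for i in I for k in K}            # the set I^K
    full = len(set(I)) * len(set(K))                   # |I| * |K|
    for j1, j2 in J:
        residues = {(v % j1, v % j2) for v in powers}  # the set I^K mod j
        if len(residues) < len(powers):
            return "IND"                               # a non-trivial collision exists
    return "T-DIST" if len(powers) == full else "P-DIST"

# Hypothetical examples:
kind_a = classify([2, 3], [1, 2], [(5, 7), (7, 11)])   # powers 2, 3, 4, 9 stay distinct
kind_b = classify([2, 4], [1, 2], [(5, 7)])            # 2^2 = 4^1 is a trivial collision
kind_c = classify([2, 3], [1, 2], [(2, 3)])            # 3 and 9 collide modulo (2, 3)
```

The three calls give T-DIST, P-DIST and IND, respectively, one example for each case of the definition.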

Theorem 2. A set I is T-DIST (P-DIST, or IND respectively) modulo the set J under the indices of K iff I is T-DIST (P-DIST, or IND respectively) modulo the set $J^{T}$ under the indices of K, where $J^{T} = \{(j_{2}, j_{1}) \mid (j_{1}, j_{2}) \in J\}$.

Proof. It suffices to note that, $\forall j = (j_{1}, j_{2}) \in J$, $i_{1}^{k_{1}} \bmod (j_{1}, j_{2}) = i_{2}^{k_{2}} \bmod (j_{1}, j_{2})$ iff $i_{1}^{k_{1}} \bmod (j_{2}, j_{1}) = i_{2}^{k_{2}} \bmod (j_{2}, j_{1})$. □

By the definitions, no collisions occur in the case of T-DIST. Thus, given $i^{k} \bmod j$, we can uniquely determine the corresponding $(i, k)$. In the case of P-DIST, when a collision occurs, we can only determine a unique value r from $i^{k} \bmod j$; there exist at least two integer pairs $(i_{1}, k_{1})$ and $(i_{2}, k_{2})$ such that $i_{1}^{k_{1}} = i_{2}^{k_{2}} = r$. A collision occurs in the case of IND iff $(i_{1}, k_{1}) \neq (i_{2}, k_{2})$, $i_{1}^{k_{1}} \neq i_{2}^{k_{2}}$ and $i_{1}^{k_{1}} \bmod j = i_{2}^{k_{2}} \bmod j$.

Theorem 3. Given two cargo vectors $A = (a_{1}, \dots, a_{n})$, $B = (b_{1}, \dots, b_{n})$ and two sets $I, K \subset Z^{+}$ with $|I|, |K| = O(1)$, let $c_{i}$ and $d_{i}$ respectively denote the gcd of the first i components of A and B, and let $J = \{(c_{i-1}/c_{i}, d_{i-1}/d_{i}) \mid i = 2, \dots, n\}$. If I is T-DIST modulo the set J under the indices of K, the simultaneous Diophantine equations

$$\sum_{i=1}^{n} a_{i} x_{i}^{k_{i}} = s_{1}, \quad \sum_{i=1}^{n} b_{i} x_{i}^{k_{i}} = s_{2}, \tag{11}$$

with $x_{i} \in I$ and $k_{i} \in K$, can be solved in polynomial (in n) time. Furthermore, the problem has at most one solution in $X = (x_{1}, \dots, x_{n})$.

Proof. Since $|I|, |K| = O(1)$, we can construct a table of I modulo J under the indices of K in polynomial time, and queries on the table can also be carried out in polynomial time.

The proof of the theorem is analogous to that of Theorem 1. The only distinction is the following: in Theorem 1, we use CRT to determine a unique $x_{i} \in Z_{\lambda_{i}}$; whereas, in Theorem 3, once we obtain a unique $x_{i}^{k_{i}} \bmod (c_{i-1}/c_{i}, d_{i-1}/d_{i})$, we look up the constructed table to determine the unique $x_{i}$ and $x_{i}^{k_{i}}$.

It can be concluded that, if the simultaneous Diophantine equations have solutions, there exists only one solution. The problem can be solved in polynomial (in n) time.

Algorithm 1 formalizes the computational method of solving the simultaneous Diophantine Equation (11).

Algorithm 1. Solving the simultaneous Diophantine equations

1. Construct a table $T$ showing that I is T-DIST modulo J under the indices of K and store the table.

2. Compute $l_{1n} = s_{1} a_{n}^{-1} \bmod c_{n-1}$, $l_{2n} = s_{2} b_{n}^{-1} \bmod d_{n-1}$.

1): Look up $T$ and decide whether an entry matches $(l_{1n}, l_{2n})$.
2): If no, output “No Solutions” and exit;
3): Otherwise, determine and store the values of $x_{n}$ and $x_{n}^{k_{n}}$.

3. For $i = n - 1, \dots, 2$:

1): Decide whether $c_{i}$ and $d_{i}$ divide $r_{1i} = s_{1} - \sum_{j=i+1}^{n} a_{j} x_{j}^{k_{j}}$ and $r_{2i} = s_{2} - \sum_{j=i+1}^{n} b_{j} x_{j}^{k_{j}}$, respectively.
2): If no, output “No Solutions” and exit;
3): Otherwise, calculate $l_{1i} = \frac{r_{1i}}{c_{i}} \left(\frac{a_{i}}{c_{i}}\right)^{-1} \bmod \frac{c_{i-1}}{c_{i}}$, $l_{2i} = \frac{r_{2i}}{d_{i}} \left(\frac{b_{i}}{d_{i}}\right)^{-1} \bmod \frac{d_{i-1}}{d_{i}}$.
If no entries in $T$ match $(l_{1i}, l_{2i})$, exit with “No Solutions”;
Otherwise, determine and store the unique $x_{i}$ and $x_{i}^{k_{i}}$.

4. Check whether $c_{1} = a_{1}$ divides $r_{11} = s_{1} - \sum_{j=2}^{n} a_{j} x_{j}^{k_{j}}$, $d_{1} = b_{1}$ divides $r_{21} = s_{2} - \sum_{j=2}^{n} b_{j} x_{j}^{k_{j}}$, and $r_{11}/a_{1} = r_{21}/b_{1}$.

1): If yes, set $x_{1}^{k_{1}} = \frac{r_{11}}{a_{1}} = \frac{r_{21}}{b_{1}}$;
2): Otherwise, output “No Solutions” and exit.
3): Solve $x_{1}$ from $x_{1}^{k_{1}}$, and store $x_{1}$ and $x_{1}^{k_{1}}$.

5. Decide whether $\sum_{i=1}^{n} a_{i} x_{i}^{k_{i}} = s_{1}$ and $\sum_{i=1}^{n} b_{i} x_{i}^{k_{i}} = s_{2}$.

1): If yes, output $X = (x_{1}, \dots, x_{n})$ and exit;
2): Otherwise, output “No Solutions” and exit.
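A minimal Python sketch of Algorithm 1 is given below, assuming that I is T-DIST modulo J under the indices of K. The names `solve_power_knapsack`, `prefix_gcds` and `inv_mod`, the per-position lookup dictionary, and the toy parameters are assumptions for illustration; the paper does not prescribe a data structure for the table $T$. Lists are 0-indexed, so `c[i]` holds $c_{i+1}$.

```python
from math import gcd

def prefix_gcds(v):
    out, g = [], 0
    for x in v:
        g = gcd(g, x)
        out.append(g)
    return out

def inv_mod(a, m):
    """Inverse of a modulo m; modulo 1 every residue is 0."""
    return 0 if m == 1 else pow(a, -1, m)

def solve_power_knapsack(a, b, I, K, s1, s2):
    """Recover X with sum a_i x_i^{k_i} = s1 and sum b_i x_i^{k_i} = s2, or None."""
    n = len(a)
    c, d = prefix_gcds(a), prefix_gcds(b)
    powers = {i ** k: i for i in I for k in K}       # value i^k -> base i
    x, r1, r2 = [0] * n, s1, s2
    for i in range(n - 1, 0, -1):                    # steps 2 and 3: x_n down to x_2
        if r1 % c[i] or r2 % d[i]:
            return None
        m1, m2 = c[i - 1] // c[i], d[i - 1] // d[i]
        l1 = (r1 // c[i]) * inv_mod(a[i] // c[i], m1) % m1
        l2 = (r2 // d[i]) * inv_mod(b[i] // d[i], m2) % m2
        table = {(v % m1, v % m2): v for v in powers}  # slice of T for this modulus pair
        if (l1, l2) not in table:
            return None
        v = table[(l1, l2)]
        x[i] = powers[v]
        r1, r2 = r1 - a[i] * v, r2 - b[i] * v
    # Step 4: both quotients must agree and be a known power.
    if r1 % a[0] or r2 % b[0] or r1 // a[0] != r2 // b[0]:
        return None
    v = r1 // a[0]
    if v not in powers:
        return None
    x[0] = powers[v]
    # Step 5 holds by construction: r1 == a[0] * v and r2 == b[0] * v at this point.
    return x

# Toy instance with J = {(5, 7), (7, 11)}, I = {2, 3}, K = {1, 2}.
A, B = [35, 14, 3], [77, 22, 5]
X = solve_power_knapsack(A, B, [2, 3], [1, 2], 352, 752)   # built from 3^2, 2^1, 3^1
```

In this example the sums 352 and 752 come from the powers $(9, 2, 3)$, and the sketch returns the bases `[3, 2, 3]`.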

The requirement “T-DIST” is not necessary. In fact, if I is P-DIST modulo the set J under the indices of K, Theorem 3 and hence Algorithm 1 also work. In such a case, each $x_{i}^{k_{i}}$ is uniquely determined, whereas some values of $x_{i}$ are not uniquely determined. Now, we give the following theorem.

Theorem 4. Given two cargo vectors $A = (a_{1}, \dots, a_{n})$, $B = (b_{1}, \dots, b_{n})$ and two sets $I, K \subset Z^{+}$ with $|I|, |K| = O(1)$, denote by $c_{i}$ and $d_{i}$ the gcd of the first i components of A and B, respectively. Let $J = \{(c_{i-1}/c_{i}, d_{i-1}/d_{i}) \mid i = 2, \dots, n\}$. If I is P-DIST modulo the set J under the indices of K, the simultaneous Diophantine equations

$$\sum_{i=1}^{n} a_{i} x_{i}^{k_{i}} = s_{1}, \quad \sum_{i=1}^{n} b_{i} x_{i}^{k_{i}} = s_{2},$$

with $x_{i} \in I$ and $k_{i} \in K$, can be solved in polynomial (in n) time. Furthermore, it has at most one solution in $x_{1}^{k_{1}}, \dots, x_{n}^{k_{n}}$.

4. The Proposed PKCHD Cryptosystem

This section derives the proposed PKCHD, a probabilistic knapsack-type cryptosystem. The public information consists of two sets

I, K \subset Z^{+}

,

| I |, | K | = O (1)

, and

n \in Z^{+}

, the dimension of a message vector. Let

μ = \max \{ i^{k} : i \in I, k \in K \} .

(12)

The cryptographic algorithm consists of three sub-algorithms: key generation, encryption and decryption.

4.1. Key Generation

Randomly choose two cargo vectors

A = (a_{1}, \dots, a_{n})

and

B = (b_{1}, \dots, b_{n}) \in {(Z^{+})}^{n}

, and denote by

c_{i}

and

d_{i}

the gcd of the first i components of A and B, respectively. Let

J = {(c_{i - 1} / c_{i}, d_{i - 1} / d_{i}) | i = 2, \dots, n}

. The randomly-chosen A and B must satisfy the following condition:

Con:: I is T-DIST modulo the set J under the indices of K.

Randomly choose two prime numbers

p \neq q

such that

p \geq μ \sum_{i = 1}^{n} a_{i}, q \geq μ \sum_{i = 1}^{n} b_{i} .

(13)

Let

N = p q

. Compute the vector

E = (e_{1}, \dots, e_{n})

according to CRT,

e_{i} \equiv a_{i} (\mod p), e_{i} \equiv b_{i} (\mod q) .

(14)

Compute

w = e_{n}^{- 1} (\mod N)

. The public encrypting vector is

F = (f_{1}, \dots, f_{n}) = (f_{1}, \dots, f_{n - 1}, 1)

with each

f_{i} \equiv w e_{i} (\mod N) .

(15)

The secret key consists of p, q and

e_{n}

. For decryption, the receiver also stores the values of

c_{i}

,

d_{i}

.

4.2. Encryption

Let

M = (m_{1}, \dots, m_{n})

,

m_{i} \in I

be the message to be encrypted, and

G = (g_{1}, \dots, g_{n})

,

g_{i} \in K

be a randomly chosen index vector. Using the public key F, cipher-text c is computed by

c = \sum_{i = 1}^{n} f_{i} m_{i}^{g_{i}} .

(16)

4.3. Decryption

To decipher a cipher-text c, the receiver firstly computes

s_{p}

and

s_{q}

by

\begin{cases} s_{p} \equiv e_{n} c \equiv \sum_{i = 1}^{n} e_{i} m_{i}^{g_{i}} \equiv \sum_{i = 1}^{n} a_{i} m_{i}^{g_{i}} (\mod p), \\ s_{q} \equiv e_{n} c \equiv \sum_{i = 1}^{n} e_{i} m_{i}^{g_{i}} \equiv \sum_{i = 1}^{n} b_{i} m_{i}^{g_{i}} (\mod q) . \end{cases}

(17)

From Equations (12) and (13), the two sums are bounded by p and q, respectively, so no modular reduction takes place and we know that

s_{p} = \sum_{i = 1}^{n} a_{i} m_{i}^{g_{i}}, s_{q} = \sum_{i = 1}^{n} b_{i} m_{i}^{g_{i}} .

(18)

According to the key generation algorithm and Theorem 3, we know that Equations (18) are easy simultaneous Diophantine equations. The receiver can recover the message M by solving Equations (18) according to Algorithm 1.
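The CRT disguise, the encryption (16) and the reduction (17) can be traced on a toy instance. The sketch below is only illustrative: the vectors A and B are small hand-picked values, `next_prime` is a hypothetical trial-division helper, and the index vector G is fixed rather than random so that the run is reproducible.

```python
def is_prime(m):
    # Trial division is enough for the small toy moduli used here.
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def next_prime(m):
    while not is_prime(m):
        m += 1
    return m

MU = 343                                   # mu = 7^3 for I = {0..7}, K = {1,2,3}
A = [15, 35, 11]                           # toy secret cargo vectors (assumed)
B = [221, 39, 4]

# Size condition (13): p >= mu * sum(A), q >= mu * sum(B).
p = next_prime(MU * sum(A))
q = next_prime(MU * sum(B))
N = p * q

# CRT (14): e_i = a_i (mod p) and e_i = b_i (mod q).
p_inv_q = pow(p, -1, q)
E = [a + p * ((b - a) * p_inv_q % q) for a, b in zip(A, B)]

# Public vector (15): f_i = w * e_i (mod N) with w = e_n^{-1}, so f_n = 1.
w = pow(E[-1], -1, N)
F = [w * e % N for e in E]

def encrypt(F, M, G):
    # Equation (16): c = sum f_i * m_i^{g_i}, computed over the integers.
    return sum(f * m ** g for f, m, g in zip(F, M, G))

M, G = [5, 2, 7], [2, 3, 1]                # message blocks and index vector
c = encrypt(F, M, G)

# Equation (17): multiplying by e_n strips the disguise modulo p and q.
s_p = E[-1] * c % p
s_q = E[-1] * c % q
```

The pair (s_p, s_q) is then the input to Algorithm 1, which recovers the powers $m_{i}^{g_{i}}$.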

4.4. Remarks

The system works even if the parameter N is not an RSA integer. The “T-DIST” requirement for the cargo vectors A and B in Con is not necessary. In fact, A and B may instead meet the following weaker requirement:

Con $^{*}$ :: I is P-DIST modulo the set J under the indices of K.

In that case, however, the cipher-text may not be uniquely deciphered. The sender can add some redundant information to the message vector so that the receiver can pick out the exact message from all the plaintexts he deciphers. Alternatively, both of them can agree on an encoding method by means of which the messages are encoded as plaintext vectors so that no collision occurs in all the encoded plaintext vectors.

4.5. A Practical Implementation

To implement the PKCHD in real-life practice, we choose

I = {0, 1, \dots, 7}

,

K = {1, 2, 3}

and

n = 150

. Thus,

μ = \max i^{k} = 7^{3} = 343

. Let W be a set consisting of the following pairs

(w_{1}, w_{2})

\in {(Z^{+})}^{2}

: (1,51), (1,65), (1,66), (2,33), (2,37), (2,39), (2,41), (2,43), (2,47), (3,17), (3,22), (3,25), (3,26), (3,29), (3,32), (4,23), (5,13), (5,16), (5,19), (6,11), (6,13), (7,11), (8,11), (9,11). We have the following theorem.

Theorem 5.

I is P-DIST modulo the set

J = W \cup W^{T}

under the indices of K.

Proof.

According to Theorem 2, we only need to show that I is P-DIST modulo the set W under the indices of K, which can be proved by verifying that for every

(w_{1}, w_{2}) \in W

,

| I^{K} \mod (w_{1}, w_{2}) | = | I^{K} | < | I | \times | K | .

Take (1,51) as an example,

I^{K} \mod (1, 51) = \{(0, i) | i = 0, \dots, 9, 16, 25, 27, 36, 49, 13, 23, 12, 37\} .

Thus,

| I^{K} \mod (1, 51) | = | I^{K} | = 19 < | I | \times | K | = 24

. □
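The counting condition used in this proof is easy to check mechanically. The sketch below assumes only what the proof states, namely that a pair passes when reducing the distinct powers in $I^{K}$ modulo $(w_{1}, w_{2})$ produces no collisions; the function name `is_pdist` is hypothetical.

```python
I = range(8)                 # {0, ..., 7}
K = (1, 2, 3)
W = [(1, 51), (1, 65), (1, 66), (2, 33), (2, 37), (2, 39), (2, 41), (2, 43),
     (2, 47), (3, 17), (3, 22), (3, 25), (3, 26), (3, 29), (3, 32), (4, 23),
     (5, 13), (5, 16), (5, 19), (6, 11), (6, 13), (7, 11), (8, 11), (9, 11)]

# Distinct values i^k; 0, 1 and 4 arise more than once, leaving 19 values.
POWERS = {i ** k for i in I for k in K}

def is_pdist(u, v):
    # No two distinct powers may share the residue pair (y mod u, y mod v).
    residues = {(y % u, y % v) for y in POWERS}
    return len(residues) == len(POWERS)

# Check every pair of W together with its transpose, i.e. all of J.
all_ok = all(is_pdist(u, v) and is_pdist(v, u) for u, v in W)
residues_1_51 = sorted(y % 51 for y in POWERS)
```

Running it confirms the example for (1, 51) and shows that a nearby pair such as (1, 50) fails, because 25 and 125 collide modulo 50.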

In fact, J gives all the 48 integer pairs

j = (u, v)

with

u v < 100

such that I is P-DIST modulo the set

{(u, v)}

under the indices of

K = {1, 2, 3}

.

We randomly choose two cargo vectors

A = (a_{1}, \dots, a_{n})

and

B = (b_{1}, \dots, b_{n})

such that

(c_{i - 1} / c_{i}, d_{i - 1} / d_{i}) \in J = W \cup W^{T}, i = 2, \dots, n,

where

c_{i} = \gcd (a_{1}, \dots, a_{i})

and

d_{i} = \gcd (b_{1}, \dots, b_{i})

. According to Theorem 5, the generated vectors A and B meet the requirement of Con

^{*}

. We also generate RSA integers

N = p q

with p, q primes and

p \geq 343 \sum_{i = 1}^{n} a_{i}, q \geq 343 \sum_{i = 1}^{n} b_{i}

. We compute the public vector F according to Equations (14) and (15).

The message M is split into

n = 150

blocks with each block

m_{i} \in I

. When generating

G = (g_{1}, \dots, g_{n})

, we should note that, if

m_{i} = 2

, the corresponding

g_{i} \neq 2

. The cipher-text is computed as

c = \sum_{i = 1}^{n} f_{i} m_{i}^{g_{i}}, m_{i} \in I and g_{i} \in K .

(19)

The decryption is the same as Equations (17) and (18). However, if we compute

m_{i}^{g_{i}} = 4

, we should decipher

m_{i}

into 4 rather than 2. When confronted with some

m_{i}^{g_{i}} = 0

or 1, we can uniquely determine

m_{i} = 0

or 1 (Of course,

g_{i}

is not uniquely determined). Thus, the message can be uniquely recovered.
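The decoding rule for the powers can be written as a small lookup table. This is a sketch under the convention stated above (the combination $m_{i} = 2$, $g_{i} = 2$ is never used); the names `allowed` and `decode` are illustrative.

```python
I = range(8)                 # message alphabet {0, ..., 7}
K = (1, 2, 3)                # exponent alphabet

def allowed(m, g):
    # The sender never uses 2^2, so the value 4 can only come from 4^1.
    return not (m == 2 and g == 2)

# Map each reachable power y = m^g to the set of message symbols producing it.
decode = {}
for m in I:
    for g in K:
        if allowed(m, g):
            decode.setdefault(m ** g, set()).add(m)

# True when every power determines its message symbol uniquely.
unique = all(len(symbols) == 1 for symbols in decode.values())
```

With this restriction each of the 19 powers maps to exactly one symbol, even though the exponent behind 0 and 1 stays ambiguous.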

One observation that we also want to point out here is that the proposed implementation can be modified as a deterministic encryption algorithm. We can develop an encoding algorithm which encodes messages into an n-dimensional vector

Y = (y_{1}, \dots, y_{n})

with every

y_{i} \in M^{G} = {m_{i}^{g_{i}} | 0 \leq m_{i} \leq 7, 1 \leq g_{i} \leq 3}

. In such a case, the decryption also works. After deciphering a cipher-text into a

Y \in {(M^{G})}^{n}

, the receiver can decode Y to recover the message. Of course, this modification offers no particular advantage in either efficiency or security. However, it will be very useful when we discuss the low-density attacks on our system.

5. Performance and Parameter Specifications

This section specifies the parameter selection, analyzes the performance related issues, i.e., the key generation, the computational complexity of the encryption and decryption algorithms, the public key size and the information rate.

5.1. Parameter Specifications

p and q should be slightly greater than

μ \sum_{i = 1}^{n} a_{i}

and

μ \sum_{i = 1}^{n} b_{i}

, respectively. When generating the public and secret keys,

| I |, | K | = O (1)

is not necessarily required. However, this requirement does improve the efficiency of decryption. To decrypt a cipher-text, n table-query operations are needed by the receiver. If

| I |, | K | = O (1)

, the table only includes

| I | \times | K | = O (1)

rows, which makes the table-query operations more efficient. In order to make the data sizes of the public and secret keys acceptable, we should require that

\forall i \in I, k \in K

,

{| i |}_{2}, {| k |}_{2} = O (1)

. From Equations (12) and (13), we know that, if the lengths of i and k are relatively large, then the length of N and hence the lengths of the public and secret keys will be very large. It makes the proposed PKCHD system impractical.

If factoring the generated modulus N is hard, N can be published without compromising the security. However, if the sender knows N, he can encrypt a message vector M by

c = \sum_{i = 1}^{n} f_{i} m_{i}^{g_{i}} (\mod N),

(20)

which results in the reduction of the bit-length of the cipher-text. The public vector F can be permuted and re-indexed for increased security.

Remark. The public key size of the proposed system is about

{(n - 1) | N |}_{2}

. Thus, the considerable public data size may be a burden for realizing the PKC. In fact, the public key of a PKC is stored in a certificate issued by the trusted third party. However, if the public key is too large, at the certificate, we can save a hashed value instead of the public key. To encrypt a message, the sender asks the intended receiver for the public key F. If the public key

F^{'}

sent by the receiver matches the hashed value stored at the receiver’s certificate, the sender conceives that the vector

F^{'}

is exactly the public key F of the receiver and then uses it to encrypt the message. This method is suggested in [4] to compress the public key data size.

5.2. On Generating the Keys

Algorithm 2 generates the secret cargo vectors

A = (a_{1}, \dots, a_{n})

and

B = (b_{1}, \dots, b_{n})

subject to Con

^{*}

.

Algorithm 2. Generating the secret cargo vectors

A, B

1

Given I and K, compute a set

J \subset {(Z^{+})}^{2}

such that I is P-DIST modulo the set J under the indices of K.

2

Randomly choose n−1 integer pairs

(u_{i}, v_{i}) \in J

,

i = 1, \dots, n - 1

, with repetition permitted.

3

1): Randomly choose 2(n−1) numbers $s_{2}, \dots, s_{n}$ and $t_{2}, \dots, t_{n}$
with $\begin{cases} \gcd (s_{i}, u_{j}) = \gcd (t_{i}, v_{j}) = 1 \\ \gcd (s_{i}, s_{i + 1}) = \gcd (t_{i}, t_{i + 1}) = 1 \end{cases}$ for $i = 2, \dots, n - 1$ ;
2): Set $s_{1} = t_{1} = u_{n} = v_{n} = 1$ and, for $i = 1, \dots, n$ , calculate $a_{i} = s_{i} \prod_{j = i}^{n} u_{j}$ , $b_{i} = t_{i} \prod_{j = i}^{n} v_{j}$

4

Output

A = (a_{1}, \dots, a_{n})

,

B = (b_{1}, \dots, b_{n})

.

Given I and K, the set J consisting of integer pairs can be generated by doing exhaustive computation for all the integer pairs

(u, v)

with the product

u v

bounded by a small constant (for example, 100). On the basis of Theorem 6, the generated vectors A and B really satisfy the requirement of Con

^{*}

.

Theorem 6.

Generated by Algorithm 2, the secret cargo vectors A and B are subject to Con

^{*}

.

Proof.

Let

c_{i}

and

d_{i}

denote the gcd of the first i components of A and B, respectively. To prove that A and B are subject to Con

^{*}

, we only need to show that, for each

i = 2, \dots, n

, the

(c_{i - 1} / c_{i}, d_{i - 1} / d_{i})

belong to the generated set J.

It is easy to verify that

\begin{matrix} c_{i} & = \gcd (a_{1}, \dots, a_{i}) \\ & = \gcd (s_{1} \prod_{j = 1}^{n} u_{j}, \dots, s_{i} \prod_{j = i}^{n} u_{j}) \\ & = \gcd (\prod_{j = 1}^{n} u_{j}, \dots, \prod_{j = i}^{n} u_{j}) = \prod_{j = i}^{n} u_{j} . \end{matrix}

Similarly,

d_{i} = \gcd (b_{1}, \dots, b_{i}) = \prod_{j = i}^{n} v_{j} .

Therefore,

(\frac{c_{i - 1}}{c_{i}}, \frac{d_{i - 1}}{d_{i}}) = (u_{i - 1}, v_{i - 1}) \in J,

as desired. □
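Algorithm 2 and the gcd identity just proved can be exercised with a short Python sketch. The names `generate_cargo_vectors` and `gcd_ratios`, the 8-bit bound on the multipliers, and the five sample pairs taken from J are all assumptions made for illustration; the sketch also requires each multiplier to be coprime to every $u_{j}$ (or $v_{j}$), which is one reading of the condition in Step 3.

```python
import random
from math import gcd, prod

def generate_cargo_vectors(pairs, rng, s_bits=8):
    """pairs = [(u_1, v_1), ..., (u_{n-1}, v_{n-1})]; returns (A, B)."""
    n = len(pairs) + 1
    us = [u for u, _ in pairs] + [1]      # u_n = 1
    vs = [v for _, v in pairs] + [1]      # v_n = 1

    def multipliers(moduli):
        forbidden = prod(moduli)
        out = [1]                          # s_1 = 1 (resp. t_1 = 1)
        while len(out) < n:
            cand = rng.randrange(2, 1 << s_bits)
            # Coprime to every modulus and to the previous multiplier.
            if gcd(cand, forbidden) == 1 and gcd(cand, out[-1]) == 1:
                out.append(cand)
        return out

    s, t = multipliers(us), multipliers(vs)
    A = [s[i] * prod(us[i:]) for i in range(n)]
    B = [t[i] * prod(vs[i:]) for i in range(n)]
    return A, B

def gcd_ratios(vec):
    # Returns [c_1/c_2, ..., c_{n-1}/c_n] for the prefix gcds c_i of vec.
    ratios, running = [], vec[0]
    for x in vec[1:]:
        nxt = gcd(running, x)
        ratios.append(running // nxt)
        running = nxt
    return ratios

rng = random.Random(2019)                  # fixed seed for reproducibility
J_SAMPLE = [(1, 51), (3, 17), (5, 13), (11, 6), (47, 2)]
pairs = [rng.choice(J_SAMPLE) for _ in range(9)]
A, B = generate_cargo_vectors(pairs, rng)
```

The check `gcd_ratios(A) == [u for u, _ in pairs]` mirrors the statement of Theorem 6. The sketch does not implement the length balancing of $s_{i}$ and $t_{i}$ that the paper recommends.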

In Algorithm 2,

s_{i}

and

t_{i}

should be carefully chosen to guarantee that the generated

a_{i}

and

b_{i}

are not too large and always have the same binary length. For example, we can choose those

s_{i}

and

t_{i}

with lengths

{|s_{i}|}_{2} = {|\prod_{j = 1}^{n} u_{j}|}_{2} - {|\prod_{j = i}^{n} u_{j}|}_{2},

and

{|t_{i}|}_{2} = {|\prod_{j = 1}^{n} v_{j}|}_{2} - {|\prod_{j = i}^{n} v_{j}|}_{2} .

Thus,

| a_{i} |_{2} \approx | b_{j} |_{2} \approx | b_{1} |_{2} \approx | a_{1} |_{2} \approx {|\prod_{i = 1}^{n - 1} u_{i}|}_{2} .

(21)

Note that p and q are slightly greater than

μ \sum_{i = 1}^{n} a_{i} = 343 \sum_{i = 1}^{n} a_{i}

and

μ \sum_{i = 1}^{n} b_{i} = 343 \sum_{i = 1}^{n} b_{i}

, and that

u_{i} v_{i} < 100

. Then, for each

f_{i}

, the length is

\begin{matrix} {| f_{i} |}_{2} & \approx {| N |}_{2} \approx {| p |}_{2} + {| q |}_{2} \approx {| 343 n a_{1} |}_{2} + {| 343 n b_{1} |}_{2} \\ & \approx {| 343^{2} n^{2} a_{1} \cdot b_{1} |}_{2} \approx 2 {| 343 n |}_{2} + {|\prod_{i = 1}^{n - 1} u_{i} v_{i}|}_{2} \\ & < 2 {| 343 n |}_{2} + {| 100^{n - 1} |}_{2} \\ & \approx 2 {| 343 n |}_{2} + (n - 1) {| 100 |}_{2}, \end{matrix}

(22)

which is bounded by

O (n)

. If the selected

(u_{i}, v_{i})

is uniformly distributed over the set

J = W \cup W^{T}

, the typical value (geometric mean) of

u_{i} \cdot v_{i}

is

u_{i} \cdot v_{i} \approx \sqrt[48]{\prod_{(u, v) \in J} u v} = \sqrt[24]{\prod_{(w_{1}, w_{2}) \in W} w_{1} w_{2}} \approx 76.1 .

Thus,

f_{i} \approx N \approx 343^{2} \cdot n^{2} \cdot {76.1}^{n - 1} .

(23)

The two estimations from Equations (22) and (23) are critical for examining the effects of the low-density subset sum attacks on the implementation of the proposed cryptosystem.

To defend against multiple transmission attacks, one option is to change the secret/public keys frequently. However, since the proposed PKCHD cryptosystem requires an RSA modulus, we prefer a slight modification to it in practical use. Here, we can randomly choose two coprime numbers p and q, calculate the modulus

N = p q

and keep it secret. Notice that p and q are not necessarily primes.

5.3. Computational Complexity

In this section, we evaluate the computational complexity of the proposed PKCHD cryptosystem by analyzing the costs for encrypting a message and decrypting a cipher-text. Since the length of

f_{i}

is bounded by

O (n)

(see Equation (22)), encrypting a message (Equation (16)) needs

n - 1

multiplications and additions, and n exponentiations. (1) Generally, the computation for the

n - 1

additions is inexpensive; (2) as pointed out earlier, the lengths of

m_{i} \in I

and

g_{i} \in K

are bounded by

O (1)

. It takes

O (n)

bit operations to perform the n exponentiations. Naturally, the binary length of

m_{i}^{g_{i}}

is also

O (1)

. (3) Meanwhile,

O (| f_{i} |_{2} \times | m_{i}^{g_{i}} |_{2}) = O (n)

bit operations are required to do the multiplication

f_{i} \times m_{i}^{g_{i}}

. Thus, the computational complexity for carrying out the

n - 1

multiplications is given by

O (n^{2})

. Consequently, the computational complexity for message encryption is

O (n^{2})

.

To decrypt a cipher-text, the receiver should do a modular multiplication in (17) and solve the easy simultaneous Diophantine equations in (18). For the modular multiplication,

O ({| N |}_{2}^{2}) = O (n^{2})

bit operations are required. To solve the Diophantine Equations (18) for M, the receiver only needs

O (n)

division, subtraction, multiplication and table-query operations. Generally, the

O (n)

divisions and multiplications are the most costly. The bit lengths of the two integers involved in a division (or a multiplication) are respectively bounded by

O (n)

and

O (1)

. Thus, the computational complexity for doing the

O (n)

division, subtraction, multiplication and table-query operations is

O (n^{2})

. Thence, the computational complexity of the decryption algorithm is also

O (n^{2})

.

Compared with the traditional asymmetric encryption primitives RSA [2] and El Gamal [3], the proposed PKCHD cryptosystem is more efficient. For instance, both the encryption and decryption of the proposed PKCHD cryptosystem are only of quadratic bit complexity, whereas RSA [2] and El Gamal [3] are cubic in the security parameter (If the length of the encryption exponent e of RSA is bounded by

O (1)

, for example,

e = 3

or

2^{17} + 1

, the encryption only performs

O (\log_{2}^{2} N)

bit operations). To make the comparison more concrete, we take the encryption of the proposed implementation, for example. If

n = 150

, from (23), we have

{|f_{i}|}_{2} \approx {|343^{2} \cdot n^{2} \cdot {76.1}^{n - 1}|}_{2} = 963 .

Thus, about

(n - 1) {|f_{i}|}_{2} {|m_{i}^{g_{i}}|}_{2} = 149 \cdot 963 \cdot 9 \approx 1.3 \times 10^{6}

bit operations are required to finish the encryption. The computational cost is only about

1.3 \times 10^{6} / 1024^{2} \approx 1.24

times that of a standard RSA-1024 modular multiplication.
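The figures in this estimate can be reproduced with a few lines of Python. The sketch below takes the set W from Section 4.5 and follows Equation (23); the variable names are illustrative, and the bit length uses the rounded base 76.1 as in the text.

```python
from math import ceil, log2, prod

W = [(1, 51), (1, 65), (1, 66), (2, 33), (2, 37), (2, 39), (2, 41), (2, 43),
     (2, 47), (3, 17), (3, 22), (3, 25), (3, 26), (3, 29), (3, 32), (4, 23),
     (5, 13), (5, 16), (5, 19), (6, 11), (6, 13), (7, 11), (8, 11), (9, 11)]
n = 150

# Geometric mean of u*v over W; J = W and its transposes gives the same mean.
geo_mean = prod(u * v for u, v in W) ** (1 / len(W))

# Equation (23): |f_i|_2 is about log2(343^2 * n^2 * 76.1^(n-1)).
f_bits = ceil(2 * log2(343) + 2 * log2(n) + (n - 1) * log2(76.1))

# n-1 multiplications of an |f_i|_2-bit number by a 9-bit power m_i^{g_i}.
bit_ops = (n - 1) * f_bits * 9

# Comparison with one schoolbook 1024-bit by 1024-bit multiplication.
ratio = bit_ops / 1024 ** 2
```

The exact product is 1,291,383 bit operations, which the text rounds to 1.3 × 10^6 before dividing by 1024^2.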

5.4. Information Rate

The information rate

ρ

of a cryptosystem is defined as the ratio of the binary length of the message to that of the cipher-text. In the proposed PKCHD cryptosystem, the information rate turns out to be

ρ = \frac{3 n}{\log_{2} C_{\max}} .

We need to evaluate the binary length of

C_{\max}

. Note that

\begin{matrix} C_{\max} & = 343 \sum_{i = 1}^{n} f_{i} \approx 343 [(n - 1) f_{1} + 1] \\ & \approx 343 (n - 1) f_{1} \approx 343^{3} \cdot (n - 1) n^{2} \cdot {76.1}^{n - 1} . \end{matrix}

(24)

Thus, the information rate is evaluated by

ρ \approx \frac{3 n}{\log_{2} [343^{3} \cdot (n - 1) n^{2} \cdot {76.1}^{n - 1}]} .

When

n = 150

, the information rate

ρ

is about 0.46.
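
As a rough numerical check of this figure (a sketch that simply substitutes n = 150 into Equation (24), with base-2 logarithms rounded to three decimals):

\log_{2} C_{\max} \approx 3 (8.422) + 7.219 + 2 (7.229) + 149 (6.250) \approx 978.2, \quad ρ \approx \frac{450}{978.2} \approx 0.46 .

Here 8.422, 7.219, 7.229 and 6.250 approximate the base-2 logarithms of 343, 149, 150 and 76.1, respectively.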

6. Security Analysis

Suppose that the attacker is trying to cryptanalyze the proposed PKCHD cryptosystem. Given a ciphertext c, the attacker has two methods to attack the proposed cryptosystem. One is to solve the cracking problem [44], that is, determine the unique message vector

M = (m_{1}, \dots, m_{n})

according to his knowledge about the public information and the enciphering function (16) such that (16) is satisfied for some small integers

g_{1}, \dots, g_{n}

. The other method is to solve the trapdoor problem, that is, reverse the basic mathematical construction of the trapdoor in a PKC. If the attacker finds an efficient algorithm for the trapdoor problem, he will also have an algorithm for the cracking problem. This section investigates the hardness for the attacker to solve the cracking problem and the trapdoor problem. To make our discussion more concrete, we only consider the attacks on the implementation described in Section 4.

6.1. On Solving the Cracking Problem

6.1.1. Brute Force Attacks

One straightforward way to attack the system is to solve (19) for

M = (m_{1}, \dots, m_{n})

directly. Let

M^{G} = {m_{i}^{g_{i}} | 0 \leq m_{i} \leq 7, 1 \leq g_{i} \leq 3}

. To determine whether (19) has a solution, and if so, to find it, the attacker can compute all the

\sum_{i = 1}^{n} f_{i} m_{i}^{g_{i}}

with

m_{i}^{g_{i}} \in M^{G}

. However, note that

|M^{G}| = 19

, so the brute force attack will take on the order of

19^{n}

steps. A better method is to compute and sort each of the sets

S_{1} = \{\sum_{i = 1}^{n / 2} f_{i} m_{i}^{g_{i}} | m_{i}^{g_{i}} \in M^{G}\}

and

S_{2} = \{c - \sum_{i = n / 2 + 1}^{n} f_{i} m_{i}^{g_{i}} | m_{i}^{g_{i}} \in M^{G}\},

and then scan

S_{1}

and

S_{2}

, looking for a common element. If a common element

s = \sum_{i = 1}^{n / 2} f_{i} m_{i}^{g_{i}} = c - \sum_{i = n / 2 + 1}^{n} f_{i} m_{i}^{g_{i}}

is found, then

c = \sum_{i = 1}^{n} f_{i} m_{i}^{g_{i}}

. The entire procedure takes on the order of

n 19^{n / 2}

steps [24]. For properly chosen n, the attack is computationally infeasible.
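The two-list attack can be illustrated on a tiny instance. The sketch below is a generic meet-in-the-middle search, not code from [24]: it uses a hash table in place of the sort-and-scan step, and the public vector F is an arbitrary made-up example with n = 4.

```python
from itertools import product

I, K = range(8), (1, 2, 3)
MG = sorted({m ** g for m in I for g in K})       # the 19 values of M^G

def meet_in_the_middle(F, c, alphabet=MG):
    """Return all Y over `alphabet` with sum(f_i * y_i) == c."""
    half = len(F) // 2
    # S_1: every partial sum over the first half, with the tuples producing it.
    left = {}
    for ys in product(alphabet, repeat=half):
        total = sum(f * y for f, y in zip(F[:half], ys))
        left.setdefault(total, []).append(ys)
    # S_2: for each second-half tuple, look up c minus its partial sum.
    solutions = []
    for ys in product(alphabet, repeat=len(F) - half):
        target = c - sum(f * y for f, y in zip(F[half:], ys))
        for xs in left.get(target, []):
            solutions.append(xs + ys)
    return solutions

F = [90210, 31337, 4242, 1]                        # hypothetical public vector
planted = (25, 8, 343, 7)
c = sum(f * y for f, y in zip(F, planted))
found = meet_in_the_middle(F, c)
```

Each half enumerates 19^{n/2} tuples, so the work and the memory both grow exponentially in n, which is why the attack is out of reach for n = 150.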

6.1.2. Low-Density Attack

Low-density subset sum attacks only apply to a linear multivariate equation. Note that the encryption function (19) is nonlinear about the message vector M, so the low-density attacks cannot be used to cryptanalyze the proposed cryptosystem directly. The attacker can re-linearize the encryption function. By setting

y_{i} = m_{i}^{g_{i}} \in M^{G}

, the attacker obtains a linear function from the encryption function (19),

c = \sum_{i = 1}^{n} f_{i} y_{i}, y_{i} \in M^{G} .

(25)

Notice that the problem (25) is not a standard compact knapsack problem. Analogous to the case of the standard knapsack problem, the known best method for solving the problem (25) seems to be the “Brute Force Attacks” given by Ref. [24]. However, if the attacker wants to use low-density attacks to recover the corresponding message from a given cipher-text c, he cannot ensure that the solution to (25) belongs to

M^{G}

. The attacker can solve the problem (25) by solving the compact knapsack problem defined below,

c = \sum_{i = 1}^{n} f_{i} y_{i}, 0 \leq y_{i} \leq 343 .

(26)

The attacker looks forward to finding a solution

Y = (y_{1}, \dots, y_{n})

to (26) using the low-density attacks. Now we assume that the attacker has found such a solution Y to the compact knapsack problem (26). If every

y_{i} \in M^{G}

, then the attacker can simply solve n equations

y_{i} = m_{i}^{g_{i}}

to recover the message M. Thus, we call the vector Y a message plaintext since it contains enough information about the message M. On the contrary, if there exists a

y_{i} \notin M^{G}

, then Y contains little information about M and hence is useless for the attacker to decipher the cipher-text. Because the vector Y is also a solution to (26), we call the vector Y a plaintext vector. In other words, in the relinearization attack model, we view the plaintext space as

{0, \dots, 343}^{n}

and the message plaintext space as

{(M^{G})}^{n}

. The difference between the two sets

{0, \dots, 343}^{n} - {(M^{G})}^{n}

is the redundant information added to the messages, or, equivalently, we pick out some elements as the message plaintexts from the whole plaintext space. This method has been used in the Chor–Rivest [5] and Okamoto–Tanaka–Uchiyama [38] schemes. In their schemes, only those vectors whose Hamming weight is exactly h are the message plaintexts.

Now, we begin to investigate the effects of the powerful low-density attacks on the security of the proposed PKCHD. When applied to a specific knapsack instance, the low-density attacks depend on the density of the knapsack. To estimate the density of the compact knapsack problem (26) using the definition of (3), we must evaluate all the

e_{i} = {|m_{i}|}_{2}

and

C_{\max}

. The estimation of

C_{\max}

is given in (24) and each

e_{i} = {|m_{i}|}_{2} = ⌈ \log_{2} (343 + 1) ⌉ = 9

, so the density is

d = \frac{9 n}{\log_{2} C_{\max}} \approx \frac{9 n}{\log_{2} [343^{3} \cdot (n - 1) n^{2} \cdot {76.1}^{n - 1}]} .

(27)

If we choose

n = 150

, the density is about

1.38 > 0.9408 \dots

.

If the public vector F is evaluated via (22), we can give the lower bound of the density. According to (22) and (24), we can evaluate

C_{\max} \approx 343 (n - 1) f_{1} < 343^{3} (n - 1) n^{2} 100^{n - 1} .

Thus, the density is lower-bounded by

d > \frac{9 n}{\log_{2} [343^{3} (n - 1) n^{2} 100^{n - 1}]} .

In the case of

n = 150

, the lower bound is about

1.3 > 0.9408 \dots

. If we adopt the definition of density given in [7], the estimation will be even larger.
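Both density figures follow from Equation (27) by changing the growth factor from 76.1 to its upper bound 100. A small sketch, with the helper name `density` and its parameters chosen here for illustration:

```python
from math import ceil, log2

def density(n, growth, mu=343):
    # Each coordinate of the compact knapsack (26) needs ceil(log2(mu+1)) bits.
    bits_per_entry = ceil(log2(mu + 1))
    # log2 of C_max from Equation (24): mu^3 * (n-1) * n^2 * growth^(n-1).
    log2_cmax = 3 * log2(mu) + log2(n - 1) + 2 * log2(n) + (n - 1) * log2(growth)
    return bits_per_entry * n / log2_cmax

d_typical = density(150, 76.1)    # typical value of u_i * v_i, as in (27)
d_lower = density(150, 100)       # worst case u_i * v_i < 100
```

Both values stay above the 0.9408 threshold quoted above for low-density attacks.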

With an appropriate choice of the parameters, the PKCHD can obtain a high density even under the worst case scenario. However, we cannot claim its security against low-density subset-sum attacks by an argument based on density alone. In the history of knapsack-type cryptography, many cryptosystems have been broken by the powerful low-density attacks. Even cryptosystems with high density, such as the Chor–Rivest [5] and Okamoto–Tanaka–Uchiyama [38] schemes, were shown to be vulnerable to low-density attacks [26,27]. Thus, we must be cautious in claiming the proposed PKCHD’s security against the low-density attacks. Other lattice-based attacks on the system also need to be well examined. If we can show that the proposed cryptosystem is invulnerable to the known lattice attacks, we think that the security of the cryptosystem against the lattice-reduction-based attacks should be convincing.

6.1.3. On the Number of Plaintext Vectors That a Cipher-Text Has

The low-density subset-sum attacks always assume that the practical lattice reduction algorithms can serve as an SVP oracle at least in the cases of low-dimensional lattices. In fact, lattice reduction algorithms perform well in practice, and some current experimental records can be found in [27]. Thus, we assume that lattice reduction algorithms can obtain the shortest vector in a lattice with low dimension. Meanwhile, another fact is that the encryption function of the proposed PKCHD is non-injective under the relinearization attack model. Thence, for a given cipher-text c,

0 \leq c \leq 343 \sum_{i = 1}^{n} f_{i}

, there are many preimages Y such that (26) is satisfied. The lengths of the preimages are bounded by the length r of the vector

Y_{\max} = (343, \dots, 343)

. Thus, all the preimages are the lattice points in the n-dimensional sphere of radius r centered at the origin. The number

N (n, r)

of the lattice points in the sphere is exactly the number of the preimages corresponding to a given cipher-text c. Furthermore, all the preimages almost have the same length. No evidence shows that the message is the shortest vector among all the plaintext vectors. In fact, Refs. [42,43] have given a small example in which the message plaintext is not the shortest vector no matter what norms are used. Thus, the lattice reduction algorithms just find a random vector in the

N (n, r)

preimages. We use an assumption to formalize what we have discussed.

Unif:: Given a cipher-text c, the vector output by the lattice reduction algorithms is uniformly distributed over the $N (n, r)$ plaintext vectors.

Theorem 7.

Under the assumption Unif, the probability δ of the lattice algorithms finding out the message vector is negligible.

Proof.

Based on the assumption Unif, we can conclude that

δ = 1 / N (n, r) .

Therefore,

N (n, r)

needs to be evaluated. Since Ref. [27] presented the estimation of the upper bound of

N (n, r)

, to complete this proof, the lower bound is required. Notice that the expected number

N (n, r)

should be the ratio of the number of all the plaintext vectors to that of the possible cipher-texts, i.e.,

\begin{matrix} N (n, r) & \approx \frac{344^{n}}{343 \sum_{i = 1}^{n} f_{i} + 1} \approx \frac{344^{n}}{C_{\max}} \\ & \approx \frac{344^{n}}{343^{3} \cdot (n - 1) n^{2} \cdot {76.1}^{n - 1}} > 2^{n}, \end{matrix}

for sufficiently large n. Obviously,

δ = \frac{1}{N (n, r)} < \frac{1}{2^{n}}

is negligible. □
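
For the suggested n = 150, the estimate can be made concrete as follows. This is a back-of-the-envelope evaluation, not a figure stated by the authors; it uses 8.426 as the base-2 logarithm of 344 and 978.2 as the base-2 logarithm of C_max obtained from Equation (24):

\log_{2} N (n, r) \approx 150 \cdot 8.426 - 978.2 \approx 285.7 > 150, \quad δ < 2^{- 150} .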

The evaluation of the number of the preimages that a cipher-text has is somewhat rough. However, it suffices to show the non-injectivity of the encryption function under the relinearization attack model. Thence, another way of evaluating the number of the preimages is presented. Note that any vector

Y \in {\{0, 1, \dots, 343\}}^{n}

satisfying (26) must be a solution to the modular knapsack problem defined below,

c = \sum_{i = 1}^{n} f_{i} y_{i} (\mod N), 0 \leq y_{i} \leq 343 .

It is easy to verify that this problem is equivalent to the following simultaneous compact knapsack problem,

c e_{n} (\mod p) = \sum_{i = 1}^{n} a_{i} y_{i}, c e_{n} (\mod q) = \sum_{i = 1}^{n} b_{i} y_{i} .

To solve the problem, the method given in Theorem 1 is preferred. According to CRT, a unique

y_{i}

modulo

λ_{i} = lcm (c_{i - 1} / c_{i}, d_{i - 1} / d_{i})

can be determined. However, since

λ_{i} = lcm (c_{i - 1} / c_{i}, d_{i - 1} / d_{i}) = lcm (u_{i - 1}, v_{i - 1}) \leq u_{i - 1} v_{i - 1} < 100

and

0 \leq y_{i} \leq 343

, we can determine at least three values for each

y_{i}

. Finally, there are at least

3^{n}

vectors

Y = (y_{1}, \dots, y_{n})

that are consistent with these congruences for a given cipher-text c. Of course, not all of these vectors are solutions to (26). However, even if only a small fraction of them satisfy (26), it suffices to show that a given cipher-text c has exponentially many plaintext vectors.

Now, a small example (see Table 1) is used to illustrate what we have discussed. To simplify the discussion, we set

I = \{0, 1, 2, 3\}

,

K = \{1, 2, 3\}

, and

n = 9

. In this case, the cipher-text

c = 44190990551868

has ten preimages Y under the relinearization attack model. However, there exists only one message plaintext vector

Y_{1} = (4, 27, 3, 27, 2, 27, 0, 1, 4)

amongst all the ten preimages. The remaining nine preimages

Y_{2}, \dots, Y_{10}

are plaintext vectors that are not message plaintexts. Thus, we conclude that the low-density subset sum attack will find the message plaintext vector

Y_{1}

with a probability

δ = \frac{1}{10}

under the assumption Unif. Additionally, the message plaintext vector

Y_{1}

is not the shortest non-zero vector in the lattice involved in the low-density subset sum attack no matter what norms are used. If we use (20) to encrypt the message, the encryption function

c = \sum_{i = 1}^{9} f_{i} y_{i} (\mod N) = 192662536160, 0 \leq y_{i} \leq 27

even has 237 preimages in all, which are not listed in Table 1 for space limitations. In this case, the parameter n is too small to achieve practical security. However, if a relatively large n (e.g., 150) is chosen, the number of the preimages of a given cipher-text will be very large. This is what we have claimed in the proof of Theorem 7.

6.1.4. On Reducing to the CVP

Nguyen and Stern [27] found that the knapsack problem can also be reduced to the CVP. Note that the solutions of

\sum_{i = 1}^{n} z_{i} f_{i} = 0

(28)

form an

(n - 1)

-dimensional linear space over

R

. Thus, the integral solutions of (28) form an

(n - 1)

-dimensional lattice L. Given a cipher-text c, we can use the extended Euclidean algorithm to compute integers

x_{1}, \dots, x_{n}

such that

c = \sum_{i = 1}^{n} x_{i} f_{i} .

Let

Y = (y_{1}, \dots, y_{n})

be a plaintext vector (not necessarily the message plaintext vector). Then the vector

u = (x_{1} - y_{1}, \dots, x_{n} - y_{n})

belongs to L because

\sum_{i = 1}^{n} (x_{i} - y_{i}) f_{i} = \sum_{i = 1}^{n} x_{i} f_{i} - \sum_{i = 1}^{n} y_{i} f_{i} = c - c = 0 .

In addition, u is fairly close to the vector

X = (x_{1}, \dots, x_{n})

. Thus, the closest vector

u \in L

to X is expected to be found by accessing the CVP-oracle. Thus,

X - u

is a plaintext vector. However, we should observe that the success probability of the reduction depends on the number

N (n, r)

of integer points in the

(n - 1)

-dimensional spheres. According to Theorem 7, we can conclude that the closest vector output by the CVP-oracle is the exact message plaintext vector with a negligible probability.

Furthermore, the cryptanalysis of low-weight knapsacks [26,27] does not compromise the security of the system, in which low-weight vectors are not singled out as message vectors. On this basis, we consider the cryptosystem secure against the known lattice-based attacks, including low-density subset-sum attacks.

6.2. On Solving the Trapdoor Problem

When we discuss the cracking problem, we only consider the infeasibility of the attacker’s solving (19) regardless of the structure of the public vector

F = (f_{1}, \dots, f_{n})

. In other words, the public vector

F = (f_{1}, \dots, f_{n})

is considered to be indistinguishable from a randomly generated n-dimensional vector. However, (19) is only a seemingly-hard compact knapsack problem. If the public key reveals enough information for the attacker to reverse the basic mathematical construction of the trapdoor in the proposed PKCHD system, then he also can serve as an authorized receiver to decipher any cipher-text. Thus, the key recovery attacks on the cryptographic scheme also need to be carefully studied.

6.2.1. Simultaneous Diophantine Approximation Attack

Most of the knapsack-type cryptosystems use size conditions to disguise an easy knapsack problem. The designer randomly generates an easy knapsack problem,

y = \sum_{i = 1}^{n} a_{i} x_{i}, x_{i} \in [0, 2^{b} - 1]

, and chooses a modulus m and a multiplier w,

\gcd (m, w) = 1

. He uses the size condition

m > (2^{b} - 1) \sum_{i = 1}^{n} a_{i}

to disguise the easy cargo vector

A = (a_{1}, \dots, a_{n})

as a seemingly-hard knapsack sequence

B = (b_{1}, \dots, b_{n})

,

b_{i} = w a_{i} (\mod m)

. The size condition can be utilized by the simultaneous Diophantine approximation attack to obtain some useful information about

(w, m)

. See [22,28] for more information about the relationship between the simultaneous Diophantine approximation problem and cryptanalysis.

The trapdoor of the proposed PKCHD system is disguised using CRT, which involves no size conditions. Thus, launching a simultaneous Diophantine approximation attack cannot find valuable information about the trapdoor. Even though the size condition has been used in (13), the attacker must peel off the outermost shuffle in (14) and (15) if he wants to launch a simultaneous Diophantine approximation attack. This is also a difficult task.

6.2.2. Known N Attack

The exact value of N is assumed to be known by the attacker, and he wants to learn some information about the secret key. A straightforward way is to search for

e_{n}

and factor N to recover the trapdoor information. To evaluate to what extent the attacker can succeed, we must decide whether the public key

F = (f_{1}, \dots, f_{n})

and N provide the attacker with enough information to compromise the cryptosystem. If the public vector F is indistinguishable from a randomly chosen n-dimensional vector

F^{*}

over

Z_{N}

(more precisely, only the first

n - 1

components of

F^{*}

are randomly chosen and the last component of

F^{*}

must be 1, since

f_{n} = 1

), then we can conclude that the public key F and N provide no useful information for the attacker to recover the secret key. In other words, it is infeasible for the attacker to retrieve the integer

e_{n} \in Z_{N}

from a random n-dimensional vector F.

According to Algorithm 2, the only distinction between the generated

a_{i}

,

b_{i}

and a random integer of the same binary length is that, when i is small enough, the generated

a_{i}

,

b_{i}

are smooth integers (i.e., they contain only small prime factors), whereas a random integer may not be. However, the public vector F is scrambled by (14) and (15), which also disguises the smoothness of the two vectors A and B. After the two shuffles (14) and (15), this only distinction disappears, and the generated vector F should be indistinguishable from random n-dimensional vectors over

Z_{N}

. Thus, publishing N does not affect the security of the system. Moreover, it reduces the length of the cipher-text and improves transmission efficiency.
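To make the notion of smoothness used above concrete, here is a minimal trial-division sketch. The function name `is_smooth` and the bound 7 are illustrative assumptions; the paper does not specify a smoothness bound in this section.

```python
def is_smooth(n, bound):
    """Return True if every prime factor of n is at most `bound` (trial division)."""
    if n < 1:
        raise ValueError("n must be positive")
    d = 2
    while d <= bound and n > 1:
        while n % d == 0:      # strip every factor d; composite d never divides here
            n //= d
        d += 1
    return n == 1

# 10000 = 2^4 * 5^4 (the value of a_1 in Table 1) has only small prime factors,
# whereas 999983 (the value of q in Table 1) has no prime factor up to 7.
print(is_smooth(10000, 7), is_smooth(999983, 7))
```

An attacker who could observe such smoothness in the public vector would have a distinguisher; the argument above is that the shuffles (14) and (15) hide it.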

The attacker cannot expect to recover the secret key by searching for the integer

e_{n}

that makes all the

a_{i} = f_{i} e_{n} (\mod p)

and

b_{i} = f_{i} e_{n} (\mod q)

smooth simultaneously, where

i < n

is a relatively small integer. In fact, the best way of retrieving the trapdoor seems to be to factor N first and then recover the secret vectors A and B. It is easy to verify that

a_{n} w \equiv 1 (\mod p)

and

b_{n} w \equiv 1 (\mod q)

, where

w = e_{n}^{- 1} (\mod N)

. If we write

a_{n}^{- 1}

and

b_{n}^{- 1}

for the inverse of

a_{n} (\mod p)

and

b_{n} (\mod q)

respectively, and set

f_{i p} = f_{i} (\mod p)

,

f_{i q} = f_{i} (\mod q)

,

i = 1, \dots, n - 1

, (15) modulo p and q result in

f_{i p} \equiv a_{n}^{- 1} a_{i} (\mod p), f_{i q} \equiv b_{n}^{- 1} b_{i} (\mod q) .

Note that the vectors A and B have a special structure. Therefore, if the modulus N is factored, the attacker obtains some useful information from the integers

f_{i p}

and

f_{i q}

. To examine the potential threats against the proposed PKCHD cryptosystem, we consider a stronger assumption, namely that the attacker has factored the modulus N.

6.2.3. Known p and q Attack

Now, we consider the scenario in which the attacker has factored the modulus

N = p q

. It is then easy for the attacker to compute the

f_{i p}

’s and

f_{i q}

’s. The remaining task for the attacker is just to recover

a_{n}

and

b_{n}

, because the other

a_{i}

and

b_{i}

can be easily reconstructed via

a_{i} \equiv a_{n} f_{i p} (\mod p), b_{i} \equiv b_{n} f_{i q} (\mod q) .

In addition, the gcd’s

c_{i}

and

d_{i}

are easily determined by using the Euclidean algorithm. Thus, the secret key is recovered.
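The reconstruction step can be checked on the toy parameters of Table 1. The sketch below assumes that (14) combines each pair (a_i, b_i) into e_i by CRT and that (15) sets f_i = e_i e_n^{-1} mod N, which is how the surrounding text reads; the helper names `crt` and `recover` are hypothetical.

```python
# Values taken from Table 1 of the paper (toy parameters, n = 9).
A = [10000, 6000, 7000, 5800, 5300, 5840, 8210, 6662, 5113]
B = [10000, 5000, 8000, 5500, 5100, 6150, 5830, 5335, 6007]
p, q = 999979, 999983
N = p * q

def crt(a, b):
    """Return e in [0, N) with e = a (mod p) and e = b (mod q)."""
    return (a + p * (((b - a) * pow(p, -1, q)) % q)) % N

# Assumed reading of (14) and (15): e_i from CRT, then f_i = e_i * e_n^{-1} mod N.
E = [crt(a, b) for a, b in zip(A, B)]
w = pow(E[-1], -1, N)
F = [(e * w) % N for e in E]

def recover(F, p, q, a_n, b_n):
    """Rebuild A and B from the public F once p, q, a_n and b_n are known."""
    rec_a = [(a_n * (f % p)) % p for f in F]   # a_i = a_n * f_ip mod p
    rec_b = [(b_n * (f % q)) % q for f in F]   # b_i = b_n * f_iq mod q
    return rec_a, rec_b

rec_a, rec_b = recover(F, p, q, A[-1], B[-1])
print(rec_a == A, rec_b == B)
```

The sketch hands a_n and b_n to the attacker directly; finding them is exactly the part that the two attacks discussed next try to accomplish.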

(a) Structural attack: In fact, if the attacker obtains two pairs

(a_{i}, f_{i p})

and

(b_{j}, f_{j q})

, he can determine the exact values of

a_{n}

and

b_{n}

. Note that

a_{1}

and

b_{1}

have special structures (see Algorithm 2). To launch a structural attack, the attacker performs an exhaustive search over all possible integer pairs

(a_{1}, b_{1})

. Assume

n = 150

. The

n - 1

integer pairs

(u_{i}, v_{i})

are randomly chosen with repetition permitted such that

(u_{i}, v_{i}) \in J = W \cup W^{T}

. For each i,

(u_{i}, v_{i})

takes 48 possible values. Then, the number of possible choices for the pair

(a_{1}, b_{1})

is given in the following theorem.

Theorem 8.

When

n = 150

, the number t of choices for generating

(a_{1}, b_{1})

is

t = \binom{197}{47} .

Proof.

If we denote the set

J = \{ j_{i} \mid i = 1, \dots, 48 \}

and regard each

j_{i}

as an apple with color i, then we are faced with the following “apple” counting model: choose

n = 150

apples from 48 colors of apples, with repetition permitted.

Now, we consider a line on which 197 dots are scattered. We choose 47 dots among the 197 dots and view them as boards. We denote the 47 boards as

b_{i}

,

i = 1, \dots, 47

from left to right. The dots on the left of

b_{1}

are the apples with color 1, and the dots on the right of

b_{47}

are the apples with color 48. The dots between board i and board

i + 1

are the apples with color

i + 1

, for

i = 1, \dots, 46

. Thus, every choice of the 47 boards corresponds to a choice of the integer pair

(a_{1}, b_{1})

. We have

t = \binom{197}{47}

choices in total. This completes the proof. □

Since

t = \binom{197}{47} \approx 2^{152}

, it is computationally infeasible for the attacker to try all the possibilities.
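The counting argument in the proof of Theorem 8 is the standard stars-and-bars formula, and the size of t can be checked directly. The sketch below uses a hypothetical helper `multiset_count`, verifies the formula by brute force on a small case, and prints the base-2 logarithm of t so that its magnitude can be inspected.

```python
from itertools import combinations_with_replacement
from math import comb, log2

def multiset_count(colors, picks):
    """Stars and bars: number of multisets of size `picks` drawn from `colors` kinds."""
    return comb(picks + colors - 1, colors - 1)

# Brute-force check of the formula on a small case (4 colors, 5 picks).
brute = sum(1 for _ in combinations_with_replacement(range(4), 5))
assert brute == multiset_count(4, 5)

# The case in Theorem 8: 48 colors, 150 picks, i.e. 47 boards among 197 dots.
t = multiset_count(48, 150)
print(t == comb(197, 47), round(log2(t), 1))
```

Since a binomial coefficient with upper index 197 is bounded by 2^197, the printed logarithm is necessarily below 197.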

(b) Simultaneous Diophantine approximation attack: Without loss of generality, we let

a_{n} f_{i p} - l_{i} p = a_{i}, i = 1, \dots, n - 1 .

(29)

Dividing both sides of (29) by

p a_{n}

, we obtain

\frac{f_{i p}}{p} - \frac{l_{i}}{a_{n}} = \frac{a_{i}}{p a_{n}} .

(30)

Note that

p \approx 343 \sum_{j = 1}^{n} a_{j} \approx 343 n a_{i} \approx 343 n \sqrt{{76.1}^{n - 1}}

. Thus, we have

|\frac{f_{i p}}{p} - \frac{l_{i}}{a_{n}}| = \frac{a_{i}}{p a_{n}} \approx \frac{1}{p} \approx \frac{1}{343 n \sqrt{{76.1}^{n - 1}}}

from (21), (23) and (30). If we note again that

a_{n} \approx p / (343 n)

, we can claim that

\{ l_{i} / a_{n} \}

is a set of fractions with a common and relatively small denominator

a_{n}

approximating the set of fractions

\{ f_{i p} / p \}

. More formally, we can assume that these fractions

l_{i} / a_{n}

are the simultaneous Diophantine approximations of the fractions

f_{i p} / p

. If there is an efficient algorithm to solve the problem, the attacker can retrieve the secret vector

A = (a_{1}, \dots, a_{n})

. Using a similar method, he also can recover the vector

B = (b_{1}, \dots, b_{n})

. Thus, the gcd’s

c_{i}

and

d_{i}

are also obtained.
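Identities (29) and (30) can be verified with exact rational arithmetic. The sketch below uses the vector A and the prime p from Table 1 as toy values, and assumes f_ip ≡ a_n^{-1} a_i (mod p) as derived in Section 6.2.2. These toy values do not satisfy the parameter sizes assumed in the estimate above, so only the algebraic identity is checked, not the stated magnitudes.

```python
from fractions import Fraction

# Toy values from Table 1 (n = 9); a_n is the last entry of A.
A = [10000, 6000, 7000, 5800, 5300, 5840, 8210, 6662, 5113]
p = 999979
a_n = A[-1]
inv = pow(a_n, -1, p)

gaps = []
for a_i in A[:-1]:
    f_ip = (a_i * inv) % p                   # f_ip = a_n^{-1} a_i mod p
    l_i, rem = divmod(a_n * f_ip - a_i, p)   # (29): a_n f_ip - l_i p = a_i
    assert rem == 0
    gap = Fraction(f_ip, p) - Fraction(l_i, a_n)
    assert gap == Fraction(a_i, p * a_n)     # identity (30)
    gaps.append(gap)

# Every l_i / a_n lies within 1 / a_n of the corresponding f_ip / p.
print(all(0 < g < Fraction(1, a_n) for g in gaps))
```

An attacker sees only the fractions f_ip / p and has to find the common denominator a_n, which is the simultaneous Diophantine approximation problem referred to above.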

No efficient algorithm is known for the simultaneous Diophantine approximation problem, which is widely believed to be intractable. From the discussion above, it follows that, to reconstruct the secret key, the attacker must search for the modulus N and then solve two hard number-theoretic problems, namely the integer factorization problem and the simultaneous Diophantine approximation problem. This property is shared with the scheme presented in [39].

6.3. Generating the Hardest Knapsack Instances

It is well known that public key cryptography as a whole rests on computational complexity theory. One might hope that PKCs based on proven intractability results, e.g., the NP-completeness of the knapsack problem, are unbreakable. However, this is not the case: many PKCs based on NP-complete problems, such as the knapsack problem and multivariate quadratic polynomials [45], have been shown to be insecure. In contrast, some PKCs based on unproven mathematical assumptions remain unbroken. Following the work of [45], this phenomenon can be explained as follows. The security of integer-factorization-based or discrete-logarithm-based PKCs depends not only on the hardness of factoring an integer or of solving the discrete logarithm problem over some cyclic group, but also on the key generation algorithm. For example, factoring a randomly chosen large integer is often not difficult, because such an integer usually has some small prime factors. However, the RSA system does not use such easy-to-factor integers; it can always select the hardest factorization instances as the basis for its security. The knapsack problem is NP-complete, but NP-completeness concerns only worst-case complexity. If the hardest knapsack instances cannot be used in public key cryptography, we cannot expect a knapsack cryptosystem to be unbreakable. In fact, knapsack problems with density <

0.9408 \dots

have been shown to be easy to solve [20]. Many cryptographers have pointed out that knapsack instances with density greater than 1 cannot be used in public key cryptography because the cipher-texts are not uniquely decipherable. Consequently, the room left for designing a secure knapsack cryptosystem is narrow. For further discussion of the relationship between knapsack cryptography and computational complexity, see [36].

Schnorr and Euchner [29] showed that the hardest knapsack instances are those with density

d \approx 1 + \log_{2} (n / 2) / n

, which is slightly larger than 1. The density of the proposed PKCHD is given in (27). When n approaches infinity,

\lim_{n \to \infty} \frac{9 n}{\log_{2} [343^{3} \cdot (n - 1) n^{2} \cdot {76.1}^{n - 1}]} = \frac{9}{\log_{2} 76.1} \approx 1.44,

and

\lim_{n \to \infty} (1 + \frac{\log_{2} (n / 2)}{n}) = 1 .

Thus, for a sufficiently large n, we always have

\frac{9 n}{\log_{2} [343^{3} \cdot (n - 1) n^{2} \cdot {76.1}^{n - 1}]} > 1 + \frac{\log_{2} (n / 2)}{n} .

In other words, the proposed PKCHD cryptosystem can always use a knapsack problem with density

d > 1 + \log_{2} (n / 2) / n

as the encryption function. To generate the hardest knapsack problem, the cryptosystem can generate two larger primes p and q so that the density becomes

d \approx 1 + \log_{2} (n / 2) / n

.
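The two density expressions can be compared numerically. The sketch below assumes that the fraction appearing in the limit above is the density formula (27); the function names are hypothetical, and the logarithm of the product is expanded into a sum to avoid forming huge numbers.

```python
from math import log2

def pkchd_density(n):
    """Density expression from the limit above (assumed to be formula (27))."""
    bits = 3 * log2(343) + log2(n - 1) + 2 * log2(n) + (n - 1) * log2(76.1)
    return 9 * n / bits

def hardest_density(n):
    """Density of the hardest knapsack instances according to [29]."""
    return 1 + log2(n / 2) / n

limit = 9 / log2(76.1)   # value of the first limit, about 1.44
for n in (10, 50, 150, 1000):
    print(n, round(pkchd_density(n), 3), round(hardest_density(n), 3))
```

For n = 150 the first expression exceeds the second, whereas for a small dimension such as n = 10 it does not, which illustrates why the inequality is stated only for sufficiently large n.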

For a knapsack problem to be among the hardest, the cargo vector should be indistinguishable from random vectors. We have argued above that the public vector of the PKCHD system is indistinguishable from a randomly chosen vector. Consequently, if the hardness of a knapsack instance is measured by its density, the PKCHD system can always use the hardest knapsack vector as the public key.

6.4. Provable Security Remarks

In public key cryptography, two typical methods are employed for security analysis. One is provable security [46], whose basic idea is to reduce the security of a PKC under some attack model to a hard mathematical problem. The other is to submit the PKC to the cryptologic community for attack, which is called enumerative security. Provable security has been widely accepted as a standard method for the security analysis of PKCs. However, for the following reasons, we do not pursue provable security results for the proposed PKCHD cryptosystem in this study. Firstly, almost all provably secure PKCs are constructed from number-theoretic problems, i.e., the integer factorization and discrete logarithm problems. Secondly, provable security theory is not well suited to analyzing PKCs based on NP-complete problems. Such PKCs are always constructed from an easy problem, so the problem of reversing the encryption function is only seemingly hard rather than truly hard, and it makes no sense to reduce the security of a PKC to a seemingly-hard problem. Thirdly, the security analysis of a newly designed trapdoor one-way function should focus on estimating the hardness of reversing the encryption function and of retrieving the trapdoor information. If no efficient algorithm compromising its security is found over a long period, we can assume its one-wayness and then consider adding padding to achieve provable security objectives.

It would be a significant theoretical result to prove that reversing the encryption function is equivalent to solving the mathematical problems used in constructing the PKC. However, this is an extremely difficult task [44].

7. Conclusions

Owing to their performance advantages over other cryptosystems, knapsack cryptosystems, as a typical class of PKCs, play an important role among the wide variety of available cryptosystems. In particular, new knapsack-type cryptographic primitives have been developed in recent years, e.g., non-injective knapsack cryptosystems [47], the knapsack Diffie–Hellman problem [48], and an elliptic curve discrete logarithm based knapsack public-key cryptosystem [49].

In this paper, a probabilistic knapsack-type PKC, namely PKCHD, which uses CRT to disguise the easy knapsack sequence, has been constructed and its security carefully analyzed. So far, no practical attacks have been found to compromise PKCHD's security. However, the history of additive knapsack-type cryptosystems, almost all of which were shown to be vulnerable to some attack, calls for caution. Thus, novel attacks remain to be investigated in order to make the scheme more secure.

Author Contributions

Conceptualization, Y.P. and B.W.; methodology, Y.P. and B.W.; validation, Y.P., B.W., S.T. and J.Z.; formal analysis, Y.P., B.W. and J.Z.; investigation, S.T. and H.M.; resources, S.T. and H.M.; writing–original draft preparation, Y.P.; writing–review and editing, Y.P. and B.W.; supervision, B.W.; project administration, Y.P. and B.W.; funding acquisition, Y.P. and B.W.

Funding

This work is supported by the National Key R&D Program of China under Grant No. 2017YFB0802000, the National Natural Science Foundation of China under Grant No. U1736111, the Plan For Scientific Innovation Talent of Henan Province under Grant No. 184100510012, the Program for Science and Technology Innovation Talents in the Universities of Henan Province under Grant No. 18HASTIT022, the Key Technologies R&D Program of Henan Province under Grant No. 182102210123 and 192102210295, the Foundation of Henan Educational Committee under Grant No. 16A520025 and 18A520047, the Foundation for University Key Teacher of Henan Province under Grant No. 2016GGJS-141, the Open Project Foundation of Information Technology Research Base of Civil Aviation Administration of China under Grant No. CAAC-ITRB-201702, and the Innovation Scientists and Technicians Troop Construction Projects of Henan Province.

Acknowledgments

The authors would like to thank the anonymous reviewers for their carefulness and patience, and thank Sheng Tong for the proof of Theorem 8 and Fagen Li for paper preparation.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript or in the decision to publish the results.

References

1. Diffie, W.; Hellman, M.E. New Directions in Cryptography. IEEE Trans. Inf. Theory 1976, IT-22, 644–654.
2. Rivest, R.L.; Shamir, A.; Adleman, L.M. A Method for Obtaining Digital Signature and Public Key Cryptosystems. Commun. ACM 1978, 21, 120–126.
3. ElGamal, T. A Public Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. IEEE Trans. Inf. Theory 1985, IT-31, 469–472.
4. Merkle, R.C.; Hellman, M.E. Hiding Information and Signatures in Trapdoor Knapsacks. IEEE Trans. Inf. Theory 1978, IT-24, 525–530.
5. Chor, B.; Rivest, R.L. A Knapsack-Type Public Key Cryptosystem Based on Arithmetic in Finite Fields. IEEE Trans. Inf. Theory 1988, IT-34, 901–909.
6. Vaudenay, S. Cryptanalysis of The Chor–Rivest Cryptosystem. J. Cryptol. 2001, 14, 87–100.
7. Orton, G. A Multiple-Iterated Trapdoor for Dense Compact Knapsacks. In Advances in Cryptology–Eurocrypt 1994 (LNCS); Springer-Verlag: Perugia, Italy, 1995; Volume 950, pp. 112–130.
8. Morii, M.; Kasahara, M. New Public Key Cryptosystem Using Discrete Logarithm Over GF(p). IEICE Trans. Fund. 1988, J71-D, 448–453.
9. Naccache, D.; Stern, J. A New Public-Key Cryptosystem. In Advances in Cryptology–Eurocrypt 1997 (LNCS); Springer-Verlag: Konstanz, Germany, 1997; Volume 1233, pp. 27–36.
10. Goodman, R.M.F.; McAuley, A.J. New Trapdoor-Knapsack Public-Key Cryptosystem. IEE Proc. 1985, 132 Pt E, 282–292.
11. Niemi, V. A New Trapdoor in Knapsacks. In Advances in Cryptology–Eurocrypt 1990 (LNCS); Springer-Verlag: Aarhus, Denmark, 1990; Volume 473, pp. 405–411.
12. Janardan, R.; Lakshmanan, K.B. A Public-Key Cryptosystem based on The Matrix Cover NP-Complete Problem. In Advances in Cryptology–Crypto 1982; Plenum: New York, NY, USA, 1983; pp. 21–37.
13. Blackburn, S.R.; Murphy, S.; Stern, J. Weaknesses of A Public Key Cryptosystem based on Factorization of Finite Groups. In Advances in Cryptology–Eurocrypt 1993 (LNCS); Springer-Verlag: Lofthus, Norway, 1994; Volume 765, pp. 50–54.
14. Nguyen, P.; Stern, J. Merkle-Hellman Revisited: A cryptanalysis of The Qu-Vanstone Cryptosystem based on Group Factorizations. In Advances in Cryptology–Crypto 1997 (LNCS); Springer-Verlag: Santa Barbara, CA, USA, 1997; Volume 1294, pp. 198–212.
15. Pieprzyk, J.P. On Public-Key Cryptosystems Built Using Polynomial Rings. In Advances in Cryptology–Eurocrypt 1985 (LNCS); Springer-Verlag: Linz, Austria, 1985; Volume 219, pp. 73–80.
16. Lin, C.H.; Chang, C.C.; Lee, R.C.T. A New Public-Key Cipher System based upon The Diophantine Equations. IEEE Trans. Comput. 1995, 44, 13–19.
17. Webb, W.A. A Public Key Cryptosystem based on Complementing Sets. Cryptologia 1992, XVI, 177–181.
18. Brickell, E.F. Solving Low Density Knapsacks. In Advances in Cryptology–Crypto 1983; Plenum: New York, NY, USA, 1984; pp. 24–37.
19. Lagarias, J.C.; Odlyzko, A.M. Solving Low-Density Subset Sum Problems. J. ACM 1985, 32, 229–246.
20. Coster, M.J.; LaMacchia, B.A.; Odlyzko, A.M.; Schnorr, C.P. An Improved Low-Density Subset Sum Algorithm. In Advances in Cryptology–Eurocrypt 1991 (LNCS); Springer-Verlag: Brighton, UK, 1991; Volume 547, pp. 54–67.
21. Brickell, E.F.; Odlyzko, A.M. Cryptanalysis: A Survey of Recent Results. In Contemporary Cryptology, The Science of Information Integrity; IEEE Press: New York, NY, USA, 1992; pp. 501–540.
22. Lagarias, J.C. Knapsack Public Key Cryptosystems and Diophantine Approximation. In Advances in Cryptology–Crypto 1983; Plenum: New York, NY, USA, 1984; pp. 3–23.
23. Lai, M.K. Knapsack Cryptosystems: The Past and The Future. Available online: http://www.ics.uci.edu/~mingl/knapsack.html (accessed on 20 December 2003).
24. Odlyzko, A.M. The Rise and Fall of Knapsack Cryptosystems. Am. Math. Soc. Proc. Symp. Appl. Math 1990, 42, 75–88.
25. Lenstra, A.K.; Lenstra, H.W., Jr.; Lovász, L. Factoring Polynomials with Rational Coefficients. Math. Ann. 1982, 261, 513–534.
26. Omura, K.; Tanaka, K. Density Attack to The Knapsack Cryptosystems with Enumerative Source Encoding. IEICE Trans. Fund. 2001, E84-A, 1564–1569.
27. Nguyen, P.; Stern, J. Adapting Density Attacks to Low-Weight Knapsacks. In Advances in Cryptology–Asiacrypt 2005 (LNCS); Springer-Verlag: Chennai, India, 2005; Volume 3788, pp. 41–58.
28. Wang, B.; Hu, Y. Diophantine Approximation Attack on A Fast Public Key Cryptosystem. In The 2nd Information Security Practice and Experience Conference–ISPEC 2006 (LNCS); Springer: Hangzhou, China, 2006; Volume 3903, pp. 25–32.
29. Schnorr, C.P.; Euchner, M. Lattice Basis Reduction: Improved Practical Algorithms and Solving Subset Sum Problems. Math. Progr. 1994, 66, 181–191.
30. Ajtai, M.; Dwork, C. A Public-Key Cryptosystem with Worst-Case/Average-Case Equivalence. In Proceedings of the 29th ACM STOC, El Paso, TX, USA, 4–6 May 1997; pp. 284–293.
31. Goldreich, O.; Goldwasser, S.; Halevi, S. Public-Key Cryptosystems from Lattice Reduction Problems. In Advances in Cryptology–Crypto 1997 (LNCS); Springer-Verlag: Santa Barbara, CA, USA, 1997; Volume 1294, pp. 112–131.
32. Hoffstein, J.; Pipher, J.; Silverman, J.H. NTRU: A New High Speed Public Key Cryptosystem. In Proceedings of the Algorithm Number Theory–ANTS III (LNCS); Springer-Verlag: Portland, OR, USA, 1998; Volume 1423, pp. 267–288.
33. Cai, J.Y.; Cusick, T.W. A lattice-based Public-Key Cryptosystem. Inf. Comput. 1999, 151, 17–31.
34. Sakurai, K. A Progress Report on Lattice-based Public-Key Cryptosystems—Theoretical Security Versus Practical Cryptanalysis. IEICE Trans. Inf. Syst. 2000, E83-D, 570–579.
35. Nguyen, P.; Stern, J. The Two Faces of Lattices in Cryptology. In Proceedings of the Cryptography and Lattices–CaLC (LNCS); Springer-Verlag: Providence, RI, USA, 2001; Volume 2146, pp. 146–180.
36. Shamir, A. On The Cryptocomplexity of Knapsack Systems. In Proceedings of the Eleventh Annual ACM Symposium on Theory of Computing, Atlanta, GA, USA, 30 April–2 May 1979; pp. 118–129.
37. Katayangi, K.; Murakami, Y. A New Product-Sum Public-Key Cryptosystem Using Message Extension. IEICE Trans. Fund. 2001, E84-A, 2482–2487.
38. Okamoto, T.; Tanaka, K.; Uchiyama, S. Quantum Public-Key Cryptosystems. In Advances in Cryptology–Crypto 2000 (LNCS); Springer-Verlag: Santa Barbara, CA, USA, 2000; Volume 1880, pp. 147–165.
39. Wang, B.; Hu, Y. Public Key Cryptosystem based on Two Cryptographic Assumptions. IEE Proc. Commun. 2005, 152, 861–865.
40. Shamir, A.; Zippel, R.E. On The Security of The Merkle-Hellman Cryptographic Scheme. IEEE Trans. Inf. Theory 1980, 26, 339–340.
41. Laih, C.S.; Gau, M.J. Cryptanalysis of A Diophantine Equation Oriented Public Key Cryptosystem. IEEE Trans. Comput. 1997, 46, 511–512.
42. Eier, R.; Lagger, H. Trapdoors in Knapsack Cryptosystems. In Cryptography–EUROCRYPT 1982 (LNCS); Springer: Berlin/Heidelberg, Germany, 1982; Volume 149, pp. 316–322.
43. Wang, B.; Wu, Q.; Hu, Y. A Knapsack-based Probabilistic Encryption Scheme. Inf. Sci. 2007, 177, 3981–3994.
44. Koblitz, N. Algebraic Aspects of Cryptography; Springer-Verlag: Berlin, Germany, 1998.
45. Wolf, C. Multivariate Quadratic Polynomials in Public Key Cryptography. Ph.D. Thesis, Katholieke Universiteit Leuven, Leuven, Belgium, 2005. Available online: http://eprint.iacr.org/2005/393 (accessed on 1 November 2005).
46. Koblitz, N.; Menezes, A.J. Another Look at “Provable Security”. J. Cryptol. 2007, 20, 3–37.
47. Koskinen, J.A. Non-Injective Knapsack Public-Key Cryptosystems. Theor. Comput. Sci. 2001, 255, 401–422.
48. Han, S.; Chang, E.; Dillon, T. Knapsack Diffie-Hellman: A New Family of Diffie-Hellman. Cryptology ePrint Archive: Report 2005/347. Available online: http://eprint.iacr.org/2005/347 (accessed on 22 August 2006).
49. Su, P.C.; Lu, E.; Chang, H. A Knapsack Public-Key Cryptosystem based on Elliptic Curve Discrete Logarithm. Appl. Math. Comput. 2005, 168, 40–46.

Table 1. The non-injectivity of the encryption function under the relinearization attack model.

I	$\{0, 1, 2, 3\}$
K	$\{1, 2, 3\}$
$μ$	27
n	9
A	$10000, 6000, 7000, 5800, 5300, 5840, 8210, 6662, 5113$
B	$10000, 5000, 8000, 5500, 5100, 6150, 5830, 5335, 6007$
p	999979
q	999983
N	999962000357
E	$10000, 250000750, 999712012607, 75004225, 50004250,$
	$499903507646, 594995715, 750303249963, 499757509985$
$e_{9}^{- 1}$	759237254392
F	$661037209656, 7824090728, 451539481682,$
	$866739311295, 192593114076, 586570143338,$
	$753328582077, 356431315295, 1$
M	$(2, 3, 3, 3, 2, 3, 0, 1, 2)$
G	$(2, 3, 1, 3, 1, 3, 2, 3, 2)$
c	44190990551868
Y	$(4, 27, 3, 27, 2, 27, 0, 1, 4), (10, 5, 12, 19, 19, 7, 10, 1, 4)$
	$(5, 12, 9, 13, 9, 27, 10, 1, 4), (18, 6, 4, 25, 13, 4, 0, 11, 4)$
	$(13, 13, 1, 19, 3, 24, 0, 11, 4), (5, 8, 19, 27, 4, 1, 0, 21, 4)$
	$(2, 0, 15, 27, 24, 1, 0, 21, 4), (1, 0, 22, 7, 1, 21, 10, 21, 4)$
	$(2, 3, 16, 8, 1, 21, 21, 1, 14), (3, 2, 5, 23, 0, 12, 12, 11, 24)$

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
