1. Introduction
Digital signatures are among the most important cryptographic primitives in the modern world. The most widely deployed schemes include those based on the RSA assumption, on the discrete logarithm problem (DSA) and on its elliptic-curve variant (ECDSA), all included in the FIPS 186-3 standard [1]. Many schemes based on coding theory have been proposed over the years; these either follow a "direct" hash-and-sign approach, like the Courtois-Finiasz-Sendrier (CFS) scheme [2] and the Kabatianskii-Krouk-Smeets (KKS) scheme [3], or rely on the Fiat–Shamir transform [4] to convert an identification scheme into a signature scheme. The latter schemes are usually built via a 3-pass protocol [5] or, more recently, a 5-pass protocol [6], in turn relying on the work of Stern [7,8]. Unfortunately, many of the various proposals have been broken, and all those that are still considered secure suffer from one or more flaws, be it a huge public key, a large signature or a slow signing algorithm, which makes them highly inefficient in practice. This is particularly evident for the identification schemes, where it is usually necessary to repeat the protocol many times in order to guarantee correctness or security.
In [9], we introduced a code-based signature scheme following a different approach, inspired by the work of Lyubashevsky [10,11]. Such a proposal had been attempted before (see [12]) without success, the main issue being the choice of setting (random binary codes), which proved to be too restrictive. Choosing quasi-cyclic codes allows us to take advantage of the innate ring structure and makes the scheme viable in practice.
  Our Contribution
This full version features a detailed security analysis, including a proof of security that guarantees one-time existential unforgeability against chosen-message attacks (1-EUF-CMA). While one-time signatures are rarely used directly in applications, they are still relevant, since they can be embedded in a Merkle tree structure to obtain a full-fledged signature scheme that allows for signing up to a predetermined number of times. Our scheme compares very well to other one-time code-based proposals, obtaining what are, to date, the smallest sizes for both the signature and the public data in the code-based setting.
The paper is organized as follows: in the next section, we give some preliminary notions about codes and code-based cryptography, as well as identification schemes. In Section 3, we describe the framework on which our scheme is based, including the previous code-based proposal by Persichetti. Our scheme is presented in Section 4, together with a detailed security analysis (Section 5), and its performance and comparison with other code-based schemes are discussed in Section 6. We conclude in Section 7.
  2. Preliminaries
  2.1. Coding Theory
Let $\mathbb{F}_q$ be the finite field with $q$ elements. An $[n, k]$ linear code $\mathcal{C}$ is a subspace of dimension $k$ of the vector space $\mathbb{F}_q^n$. Codewords are usually measured in the Hamming metric: the Hamming weight of a word $x \in \mathbb{F}_q^n$ is the number of its non-zero positions, and the Hamming distance between two words $x, y \in \mathbb{F}_q^n$ is the number of positions in which they differ, that is, the weight of their difference. We denote these respectively by $\mathrm{wt}(x)$ and $d(x, y)$.
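For example, over $\mathbb{F}_2$ we have $\mathrm{wt}(1011000) = 3$ and $d(1011000, 1001100) = \mathrm{wt}(0010100) = 2$.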
Linear codes can be efficiently described by matrices. The first way of doing this is essentially choosing a basis for the vector subspace. A generator matrix is a $k \times n$ matrix $G$ that generates the code as a linear map: for each message $u \in \mathbb{F}_q^k$, we obtain the corresponding codeword $uG$. Of course, since the choice of basis is not unique, neither is the choice of generator matrix. It is possible to choose the basis in a particular way, so that $G = (I_k \mid M)$. This is called the systematic form of the generator matrix. Alternatively, a code can be described by its parity-check matrix: this is nothing but a generator matrix for the dual code $\mathcal{C}^\perp$, i.e., the code comprised of all the words that are "orthogonal" to the codewords of $\mathcal{C}$. The parity-check matrix $H$ describes the code as follows:
$$\mathcal{C} = \{x \in \mathbb{F}_q^n : Hx^T = 0\}.$$
The product $Hx^T$ is known as the syndrome of the vector $x$. Note that, if $G = (I_k \mid M)$ is a generator matrix in systematic form for $\mathcal{C}$, then $H = (-M^T \mid I_{n-k})$ is a systematic parity-check matrix for $\mathcal{C}$.
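As a quick illustration of these objects, the following minimal sketch builds a toy systematic pair $(G, H)$ over $\mathbb{F}_2$ and checks the syndrome relation; the $[7, 4]$ parameters are purely illustrative and not taken from the paper.

```python
import numpy as np

# Toy [7,4] binary code in systematic form: G = (I_k | M), H = (M^T | I_{n-k}).
# Over F_2 the minus sign in -M^T disappears.
k, n = 4, 7
rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(k, n - k))
G = np.hstack([np.eye(k, dtype=int), M])            # generator matrix
H = np.hstack([M.T, np.eye(n - k, dtype=int)])      # systematic parity-check matrix

assert np.all((H @ G.T) % 2 == 0)                   # codewords have zero syndrome

u = rng.integers(0, 2, size=k)                      # message
c = (u @ G) % 2                                     # codeword c = uG
e = np.zeros(n, dtype=int); e[2] = 1                # weight-1 error
s = (H @ ((c + e) % 2)) % 2                         # syndrome of the noisy word
assert np.array_equal(s, (H @ e) % 2)               # depends only on the error
```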
Code-based cryptography usually relies more or less directly on the following problem, connected to the parity-check matrix of a code.
Problem 1 (Syndrome Decoding Problem (SDP)).
Given: $H \in \mathbb{F}_q^{(n-k) \times n}$, $s \in \mathbb{F}_q^{n-k}$ and $w \in \mathbb{N}$.
Goal: find $e \in \mathbb{F}_q^n$ with $\mathrm{wt}(e) \leq w$ such that $He^T = s$.
This problem is well known and was proved to be NP-complete by Berlekamp, McEliece and van Tilborg in [13]. Moreover, the solution to SDP is guaranteed to be unique when the weight w is below the so-called Gilbert–Varshamov (GV) bound.
Definition 1. Let $\mathcal{C}$ be an $[n, k]$ linear code over $\mathbb{F}_q$. The Gilbert–Varshamov (GV) distance is the largest integer $d$ such that
$$\sum_{i=0}^{d-1} \binom{n}{i} (q-1)^i \leq q^{n-k}.$$
If the weight is not below this bound, multiple solutions may exist (see, for example, Overbeck and Sendrier [14]). It follows that SDP is of particular interest when the weight $w$ is "small".
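For concreteness, the GV distance can be computed directly from Definition 1; below is a minimal sketch (the function name and the sample parameters are ours, for illustration only).

```python
from math import comb

def gv_distance(n: int, k: int, q: int = 2) -> int:
    """Largest d such that sum_{i=0}^{d-1} C(n,i)*(q-1)^i <= q^(n-k)."""
    bound, total, d = q ** (n - k), 0, 0
    while d <= n and total + comb(n, d) * (q - 1) ** d <= bound:
        total += comb(n, d) * (q - 1) ** d
        d += 1
    return d

print(gv_distance(1024, 512))   # GV distance of a toy [1024, 512] binary code
```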
  Quasi-Cyclic Codes
A special subfamily of linear codes is that of cyclic codes.
Definition 2. Let $\mathcal{C}$ be an $[n, k]$ linear code over $\mathbb{F}_q$. We call $\mathcal{C}$ cyclic if
$$(a_0, a_1, \dots, a_{n-1}) \in \mathcal{C} \implies (a_{n-1}, a_0, \dots, a_{n-2}) \in \mathcal{C}.$$
Clearly, if the code is cyclic, then all the right shifts of any codeword also belong to $\mathcal{C}$. An algebraic characterization can be given in terms of polynomial rings. In fact, it is natural to build a bijection between cyclic codes and ideals of the polynomial ring $\mathbb{F}_q[X]/(X^n - 1)$. We identify the vector $(a_0, a_1, \dots, a_{n-1})$ with the polynomial $a_0 + a_1 X + \dots + a_{n-1} X^{n-1}$; the right shift operation then corresponds to multiplication by $X$ in the ring.
Because of this correspondence, it is possible to see that both the generator matrix and the parity-check matrix of a cyclic code have a special form, namely circulant form, where the i-th row corresponds to the cyclic right shift by i positions of the first row.
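The correspondence between circulant matrices and ring multiplication can be checked mechanically; a small sketch with illustrative sizes:

```python
import numpy as np

def circulant(first_row):
    """Circulant matrix whose i-th row is the right shift by i of the first row."""
    return np.array([np.roll(first_row, i) for i in range(len(first_row))])

def polymul_mod(a, b):
    """Product of a and b in F_2[X]/(X^n - 1), as length-n coefficient vectors."""
    c = np.zeros(len(a), dtype=int)
    for i in np.flatnonzero(a):
        c ^= np.roll(b, i)            # add X^i * b(X) mod (X^n - 1)
    return c

rng = np.random.default_rng(1)
a, b = rng.integers(0, 2, size=8), rng.integers(0, 2, size=8)
# Vector-matrix product by a circulant equals polynomial multiplication.
assert np.array_equal((a @ circulant(b)) % 2, polymul_mod(a, b))
```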
Cyclic codes have been shown to be insecure in the context of cryptography, as they introduce too much recognizable structure. A subfamily, known as quasi-cyclic codes, has then been proposed with some success, mostly in the context of encryption.
Definition 3. Let $\mathcal{C}$ be an $[n, k]$ linear code over $\mathbb{F}_q$. We call $\mathcal{C}$ $\ell$-quasi-cyclic if there exists $\ell \in \mathbb{N}$ such that, for any codeword $a$, all the right shifts of $a$ by $\ell$ positions are also codewords.
When $n = \ell p$, it is again possible to have both matrices in a special form, composed of $p \times p$ circulant blocks. The algebra of quasi-cyclic codes can be connected to that of the polynomial ring $R = \mathbb{F}_q[X]/(X^p - 1)$, where each codeword is a length-$\ell$ vector of elements of the ring.
For the remainder of the paper, we consider only binary codes; thus, we set $q = 2$, and we restrict our attention to the case $\ell = 2$. We have the following ring-based formulation of SDP.
Problem 2 (Quasi-Cyclic Syndrome Decoding Problem (QC-SDP)).
Given: $h, s \in R$ and $w \in \mathbb{N}$.
Goal: find $e = (e_0, e_1) \in R^2$ with $\mathrm{wt}(e) \leq w$ such that $e_0 + e_1 h = s$.
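In this rate-1/2 binary setting, the systematic parity-check matrix is determined by the single ring element $h$, and the syndrome map of Problem 2 reduces to one ring multiplication and one addition. A minimal sketch with toy sizes (our naming, not the paper's):

```python
import numpy as np

p = 13                                   # toy block length, illustration only

def polymul(a, b):
    """Product in R = F_2[X]/(X^p - 1), via cyclic shifts."""
    c = np.zeros(p, dtype=int)
    for i in np.flatnonzero(a):
        c ^= np.roll(b, i)
    return c

def syndrome(h, e0, e1):
    """f_h(e) = e_0 + e_1*h for e = (e_0, e_1) in R^2."""
    return e0 ^ polymul(e1, h)

rng = np.random.default_rng(3)
h = rng.integers(0, 2, size=p)
e0 = np.zeros(p, dtype=int); e0[[1, 4]] = 1     # wt(e0) = 2
e1 = np.zeros(p, dtype=int); e1[[0, 7]] = 1     # wt(e1) = 2
s = syndrome(h, e0, e1)   # (h, s, w=4) is a QC-SDP instance with a known solution
```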
This was shown to be NP-complete in [15]. When $\ell = 2$, it has been proved in [16] that random quasi-cyclic codes lie on the GV bound with overwhelming probability. Moreover, the impact of cyclicity on SDP has been studied, for example in [17], revealing no substantial gain for the attacker.
  2.2. Identification Schemes and Signatures
An identification scheme is a protocol that allows a party $\mathcal{P}$, the Prover, to prove to another party $\mathcal{V}$, the Verifier, that he possesses some secret information $x$, usually called a witness, without revealing to the Verifier what that secret information is. The paradigm works as follows: $\mathcal{P}$ is equipped with a public key $\mathsf{pk}$ and some public data $D$. To start, $\mathcal{P}$ chooses some random data $y$ and commits to it by sending $Y = f(y)$ to $\mathcal{V}$, where $f$ is usually a trapdoor one-way function or a hash function. $\mathcal{V}$ then chooses a random challenge $c$ and sends it to $\mathcal{P}$. After receiving $c$, $\mathcal{P}$ computes a response $z$ as a function of $(x, c)$ and $y$ and transmits it. Finally, $\mathcal{V}$ checks that $z$ is correctly formed using $\mathsf{pk}$ and $D$.
A signature scheme is defined by a triple $(\mathsf{KeyGen}, \mathsf{Sign}, \mathsf{Ver})$, respectively the key generation algorithm, the signing algorithm and the verification algorithm. The key generation algorithm $\mathsf{KeyGen}$ takes as input a security parameter $\lambda$ and outputs a signing key $\mathsf{sgk}$ and a verification key $\mathsf{vk}$. The private signing algorithm $\mathsf{Sign}$ receives as input a signing key $\mathsf{sgk}$ and a message $m$ and returns a signature $\sigma$. Finally, the public verification algorithm $\mathsf{Ver}$ uses a verification key $\mathsf{vk}$ to verify a signature $\sigma$ that is transmitted together with the message $m$: it outputs 1 if the signature is recognized as valid, and 0 otherwise.
The standard notion of security for digital signature schemes is Existential Unforgeability under Chosen-Message Attacks (EUF-CMA), as described, for example, in [18]. In this scenario, the goal of an attacker is to produce a valid message/signature pair, and the attack model allows the attacker to obtain a certain, predetermined number of signatures on arbitrarily chosen messages (signing queries). In particular, if the attacker is only allowed to obtain a single signature, we talk about 1-EUF-CMA security. Since this is the security target of this work, we give a precise definition below.
Definition 4. An adversary $\mathcal{A}$ is a polynomial-time algorithm that acts as follows:
1. Query a key generation oracle to obtain a verification key $\mathsf{vk}$.
2. Choose a message $m$ and submit it to a signing oracle. The oracle replies with $\sigma = \mathsf{Sign}(\mathsf{sgk}, m)$.
3. Output a pair $(m^*, \sigma^*)$.
The adversary succeeds if $\mathsf{Ver}(\mathsf{vk}, m^*, \sigma^*) = 1$ and $m^* \neq m$. We say that a signature scheme is 1-EUF-CMA secure if the probability of success of any adversary $\mathcal{A}$ is negligible in the security parameter, i.e., $\Pr[\mathcal{A} \text{ succeeds}] \in \mathrm{negl}(\lambda)$.
Fiat and Shamir in [4] showed how to obtain a full-fledged signature scheme from an identification scheme. With this paradigm, the signer simply runs the identification protocol, where, for the purpose of generating the challenge, the verifier is replaced by a random oracle $\mathcal{H}$ (usually instantiated with a cryptographic hash function). The signature is then accepted according to the validity of the response in the identification scheme. We report this in Table 1.
Note that several signature schemes, including [11] and this work, use a slightly modified version of the above paradigm, where the signature is $(c, z)$ instead of $(Y, z)$. The verifier can then calculate $Y$ from $z$ and the public key, and check the equality between $c$ and the hash digest obtained using this newly-generated $Y$ and $m$.
  4. The New Scheme
The core of our idea is to use quasi-cyclic codes in the framework that we have described above. The use of quasi-cyclic codes in cryptography is not a novelty: they have been proposed before in the context of encryption (e.g., [15]). Their originally suggested use (i.e., with GRS codes) was cryptanalyzed in [21] and is thus not recommended, but other variants based on Low-Density Parity-Check (LDPC) and Moderate-Density Parity-Check (MDPC) codes are still considered safe. In both cases, the issue is that introducing the extra algebraic structure can compromise the secrecy of the private matrix used for decoding.
A big advantage of our proposal is that this issue does not apply. In fact, since there is no decoding involved, an entirely random code can be used, and the code itself is public, so there is no private matrix to hide. In this sense, our scheme is closer, to an extent, to the work of [22], which is centered on random quasi-cyclic codes.
As far as signature schemes go, Gaborit and Girault in [23] propose a variant of Stern's ID scheme that uses quasi-cyclic codes (called "double-circulant" by the authors). While this proves to be more efficient than the classical Stern scheme, the protocol still features the same flaw, i.e., a non-trivial cheating probability. This leads to the necessity of repeating the protocol several times, with an obvious impact on the efficiency of the scheme.
In our setting, we use 2-quasi-cyclic codes, where words are vectors in $R^2$. For a word $x = (x_0, x_1) \in R^2$, the syndrome function associated to $h \in R$ is defined as $f_h(x) = x_0 + x_1 h$, following the notation that takes a parity-check matrix in systematic form (and hence defined by $h$) as in Problem 2. For a more general formulation, we also adapt the notation from the previous section, indicating with $\mathcal{D}_1$ and $\mathcal{D}_2$ the distributions that sample uniformly at random vectors of $R^2$ having weight less than or equal to $w_1$ and $w_2$, respectively. Our signature scheme is presented below. The scheme uses a hash function $\mathcal{H}$ that outputs bit strings of fixed weight $\delta$, which is one of the system parameters.
- KeyGen
  Input: parameters $(p, \delta, w_1, w_2)$ and a vector $h \in R$.
  1. Sample $x \xleftarrow{\$} \mathcal{D}_1$.
  2. The signing key is $x$.
  3. The verification key is $s = f_h(x)$.
- Sign
  Input: a message $m$ and the signing key $x$.
  1. Sample $y \xleftarrow{\$} \mathcal{D}_2$.
  2. Compute $Y = f_h(y)$.
  3. Compute $c = \mathcal{H}(Y, m)$.
  4. Compute $z = y + cx$.
  5. The signature is $(c, z)$.
- Ver
  Input: a message $m$, a signature $(c, z)$ and the verification key $s$.
  1. Compute $\mathrm{wt}(z)$.
  2. Use the verification key to compute $Y' = f_h(z) + cs$.
  3. Compute $c' = \mathcal{H}(Y', m)$.
  4. Accept if $\mathrm{wt}(z) \leq w$ and $c' = c$.
  5. Else, reject.
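To make the algorithms above concrete, here is a minimal, unoptimized Python sketch of the whole scheme. The parameters are toy values (far too small for security), the hash-to-fixed-weight construction is one simple possibility rather than the one used in our implementation, and the even split of the weights $w_1, w_2$ across the two ring components is an arbitrary simplification.

```python
import hashlib
import numpy as np

p, w1, w2, delta = 101, 8, 24, 3       # toy parameters, illustration only

def polymul(a, b):
    """Product in R = F_2[X]/(X^p - 1)."""
    c = np.zeros(p, dtype=int)
    for i in np.flatnonzero(a):
        c ^= np.roll(b, i)
    return c

def sample_weight(w, rng):
    """Uniform length-p binary vector of weight exactly w."""
    v = np.zeros(p, dtype=int)
    v[rng.choice(p, size=w, replace=False)] = 1
    return v

def f(h, v0, v1):
    """Syndrome function f_h(v) = v_0 + v_1*h."""
    return v0 ^ polymul(v1, h)

def H_delta(Y, m):
    """Hash (Y, m) to a length-p vector of fixed weight delta (one possible construction)."""
    seed = hashlib.sha256(Y.tobytes() + m).digest()
    return sample_weight(delta, np.random.default_rng(list(seed)))

rng = np.random.default_rng(42)
h = rng.integers(0, 2, size=p)                       # public vector defining the code

# KeyGen: x <- D_1 (weight split evenly across components, for simplicity)
x0, x1 = sample_weight(w1 // 2, rng), sample_weight(w1 // 2, rng)
s = f(h, x0, x1)                                     # verification key

# Sign
m = b"message"
y0, y1 = sample_weight(w2 // 2, rng), sample_weight(w2 // 2, rng)  # y <- D_2
c = H_delta(f(h, y0, y1), m)                         # c = H(Y, m)
z0, z1 = y0 ^ polymul(c, x0), y1 ^ polymul(c, x1)    # z = y + c*x

# Ver: recover Y' = f_h(z) + c*s, then check the hash and the weight bound
Yp = f(h, z0, z1) ^ polymul(c, s)
assert np.array_equal(H_delta(Yp, m), c)
assert z0.sum() + z1.sum() <= w2 + delta * w1        # wt(z) <= w
```

Note how verification reconstructs the commitment from $(c, z)$ alone, exactly as in the modified Fiat–Shamir paradigm described in Section 2.2.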
Like before, we have a constraint on the weight of the response vector $z$: in this case, $w = \delta w_1 + w_2$, since $c$ is no longer a constant. Then, $w$ is required to be below the GV bound to ensure that the response $z$ is the unique solution to the corresponding QC-SDP instance. This is a consequence of the security requirements, as we will see next.
To conclude, note that it is easy to check that an honestly-generated signature is always accepted. In fact, in an honest run of the protocol, $z = y + cx$. Due to the linearity of the syndrome computation, this gives $f_h(z) = f_h(y) + c f_h(x) = Y + cs$. Therefore, $Y' = f_h(z) + cs = Y$ and the verification is passed.
  5. Security
The change of metric in our proposal means that our scheme is substantially different from the "naïve" SDP-based proposal of Section 3.2, and in fact resembles the lattice setting much more closely. As in the lattice case, our objects are "vectors of vectors", namely in this case a length-2 vector of length-$p$ binary vectors. Due to the inherent arithmetic of the ring $R$, this allows us to choose $c$ in the same realm, and to perform an operation (ring multiplication) that is still compatible with the verification operation, but does affect the weight of the response vector. Polynomial multiplication simultaneously increases and scrambles the error positions, and in so doing prevents the simple attack based on statistical analysis that affected the previous proposal. Unfortunately, this is still not enough to hide the private information. The following procedure [24] shows that it is still possible to recover the private key with a polynomial number of signatures.
Procedure 1. Start by obtaining a polynomial number ℓ of signatures, i.e., pairs $(c^{(i)}, z^{(i)})$ for $i = 1, \dots, \ell$. For each pair, $c^{(i)}$ is chosen uniformly at random among the vectors of weight δ, and $z^{(i)} = y^{(i)} + c^{(i)} x$, where $y^{(i)}$ is also chosen uniformly at random (sampled from $\mathcal{D}_2$). For each $i$, write $c^{(i)} = X^{a_i} + \hat{c}^{(i)}$, that is, as a polynomial of weight δ in $R$. Then, calculate
$$w^{(i)} = X^{-a_i} z^{(i)} = x + \tilde{y}^{(i)} + \tilde{c}^{(i)} x,$$
where $\tilde{y}^{(i)} = X^{-a_i} y^{(i)}$ and $\tilde{c}^{(i)} = X^{-a_i} \hat{c}^{(i)}$. Since $\tilde{c}^{(i)} x$ is just a sum of shifts of $x$ and $\tilde{y}^{(i)}$ is just a shift of $y^{(i)}$, and their support will likely have little to no intersection with the support of $x$ (due to the low weight of the vectors), it is possible to reveal the support of $x$ simply by looking at the bits that belong to the support of a large enough number of the $w^{(i)}$.
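The statistics behind Procedure 1 are easy to reproduce in simulation. The sketch below (toy parameters, a single ring component, our naming) signs ℓ times with the same key and recovers the support of $x$ by majority voting on the shifted responses:

```python
import numpy as np

p, w1, w2, delta, ell = 211, 5, 15, 3, 200   # toy parameters, illustration only

def polymul(a, b):
    c = np.zeros(p, dtype=int)
    for i in np.flatnonzero(a):
        c ^= np.roll(b, i)
    return c

def sample_weight(w, rng):
    v = np.zeros(p, dtype=int)
    v[rng.choice(p, size=w, replace=False)] = 1
    return v

rng = np.random.default_rng(7)
x = sample_weight(w1, rng)                    # one ring component of the private key

counts = np.zeros(p, dtype=int)
for _ in range(ell):
    y, c = sample_weight(w2, rng), sample_weight(delta, rng)
    z = y ^ polymul(c, x)                     # the corresponding component of z
    a = np.flatnonzero(c)[0]                  # pick the term X^a of c
    counts += np.roll(z, -a)                  # X^{-a} z = x + shifted low-weight noise

# Positions in supp(x) are set in almost every shifted response; the others rarely.
recovered = np.zeros(p, dtype=int)
recovered[np.argsort(counts)[-w1:]] = 1
print("support recovered:", np.array_equal(recovered, x))
```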
Note that the above procedure is in fact a refinement of the simple statistical-analysis attack encountered before: in both cases, the problem is that the weight of the vectors is simply too low to properly mask the private vector. It is then clear that it is impossible to sign multiple times and preserve security; it follows that our scheme only achieves one-time security. To prove the one-time security of our scheme, we follow the paradigm for a generic one-time signature scheme of Pointcheval and Stern, which was already employed in the code-based setting in [25]. In this paradigm, signature schemes are treated in a unified way, as a protocol that outputs triples of the form $(\sigma_1, h, \sigma_2)$, where $\sigma_1$ represents the commitment (or a sequence of commitments, if the protocol needs to be repeated multiple times), $\sigma_2$ is the response (or a sequence of responses), and $h$ is the hash value, as in the Fiat–Shamir scheme. To obtain security, it is necessary that $\sigma_1$ is sampled uniformly at random from a large set and that $\sigma_2$ only depends on $\sigma_1$, the message $m$ and the hash value $h$.
In our scheme, the first element $Y$ is in one-to-one correspondence with the vector $y$, which is sampled uniformly at random from $\mathcal{D}_2$, a set of size $\sum_{i=0}^{w_2} \binom{2p}{i}$. Note that, even though this value is not explicitly output as part of the signature, it is immediate to recover it from the signature, as shown in Step 2 of the verification algorithm. The vector $c$ is exactly the hash value obtained from the message $m$ and $Y$, i.e., the element $h$ in the Pointcheval–Stern notation. We use $c$ from now on, to avoid confusion, since $h$ denotes the vector defining the parity-check matrix of a QC code. Finally, we show that $z$ indeed only depends on the message $m$, $Y$ and $c$. The dependence is obvious, given that $z$ is computed using only the private key, $c$ itself and $y$, which is in a one-to-one correspondence with $Y$ (due to $w_2$ being below the GV bound). Furthermore, $z$ is uniquely determined by those values. In fact, suppose there existed a distinct valid triple $(Y, c, z')$ with $z' \neq z$. Since the triple is valid, it needs to satisfy the verification equation, thus $f_h(z') = Y + cs = f_h(z)$. This is clearly not possible, because both $z$ and $z'$ have weight below the GV bound, which implies that there exists only one vector having syndrome $Y + cs$, i.e., $z' = z$.
The next step is to show that, in our signature scheme, it is possible to simulate the target triples without knowing the private key, unbeknownst to the adversary.
Lemma 1. It is possible to obtain artificially-generated triples of the form $(Y, c, z)$ which are indistinguishable from honestly-generated triples, unless the adversary is able to solve an instance of QC-SDP.
Proof. To begin, notice that any valid triple is required to satisfy two constraints. First, the weight of $z$ has to be below the GV bound; in fact, $\mathrm{wt}(z)$ is expected to be statistically close to the bound $w = \delta w_1 + w_2$. Second, the triple needs to pass the verification equation, and so $f_h(z) = Y + cs$. Then, to simulate a valid triple, it is enough to sample two elements at random and set the third to match. More precisely, one would sample $c$ of weight $\delta$ and $z$, the second one chosen such that $\mathrm{wt}(z) \leq w$. Then, one would proceed by setting $Y$ to be exactly $f_h(z) + cs$, which is possible since the public key $s$ is known.
Now, it is easy to see that all honestly-generated triples correspond to syndromes $Y = f_h(y)$, where $y$ has weight at most $w_2$, below the GV bound, while, for simulated triples, the syndrome $Y = f_h(z) + cs = f_h(z + cx)$ is obtained from the vector $z + cx$, which has expected weight above the GV bound with overwhelming probability. This is because both $c$ and $z$ are generated independently and at random, so the weight of $z + cx$ behaves like that of the sum of two independent random low-weight vectors, which is bigger than the bound with overwhelming probability.
In conclusion, distinguishing a simulated triple from an honest one corresponds to solving a QC-SDP instance as claimed.  ☐
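In code, the simulator from the proof of Lemma 1 takes just a few lines (reusing the toy helpers and parameters of the scheme sketch above; illustration only):

```python
import numpy as np

p, w1, w2, delta = 101, 8, 24, 3
w = delta * w1 + w2                           # weight bound on z

def polymul(a, b):
    c = np.zeros(p, dtype=int)
    for i in np.flatnonzero(a):
        c ^= np.roll(b, i)
    return c

def sample_weight(wt, rng):
    v = np.zeros(p, dtype=int)
    v[rng.choice(p, size=wt, replace=False)] = 1
    return v

def simulate_triple(h, s, rng):
    """Sample c and z at random, then force Y = f_h(z) + c*s so that the
    verification equation holds without ever touching the private key."""
    c = sample_weight(delta, rng)
    z0, z1 = sample_weight(w // 2, rng), sample_weight(w // 2, rng)
    Y = z0 ^ polymul(z1, h) ^ polymul(c, s)
    return Y, c, (z0, z1)
```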
The last piece necessary for our proof is the well-known forking lemma. We report it below, as formulated in [26].
Theorem 1 (General Forking Lemma). Let $\mathcal{S}$ be a signature scheme with security parameter λ. Let $\mathcal{A}$ be an adversary, running in time $T$ and performing at most $q$ random oracle queries and ℓ signing queries. Suppose that $\mathcal{A}$ is able to produce a valid signature $(m, \sigma_1, h, \sigma_2)$ with non-negligible probability $\varepsilon$. If the triples $(\sigma_1, h, \sigma_2)$ can be simulated without knowing the private key with only a negligible advantage for $\mathcal{A}$, then there exists a polynomial-time algorithm $\mathcal{B}$ that can simulate the interaction with $\mathcal{A}$ and is able to produce two valid signatures $(m, \sigma_1, h, \sigma_2)$ and $(m, \sigma_1, h', \sigma_2')$, with $h' \neq h$, in time $T' = O(qT/\varepsilon)$.
 We are now ready for our security result. 
Theorem 2. Let $\mathcal{A}$ be a polynomial-time 1-EUF-CMA adversary for the signature scheme with parameters $(p, \delta, w_1, w_2)$, running in time $T$ and performing at most $q$ random oracle queries. Let the probability of success of $\mathcal{A}$ be $\varepsilon$. Then, the QC-SDP problem with parameters $(p, w)$ can be solved in time $O(\ell q T/\varepsilon)$, where ℓ is the (polynomial) number of signatures required by Procedure 1.
Proof. We have seen in Procedure 1 that it is possible to recover the private key using a polynomial number ℓ of signatures. The forking lemma can be iterated so that it is guaranteed to produce ℓ distinct, valid signatures in time at most $O(\ell q T/\varepsilon)$. The thesis follows naturally from the combination of these two facts. ☐
   6. Performance and Comparison
To properly evaluate the performance, we start by recalling the main components of our scheme. First of all, the public data consist of the vector $h$ (of length $p$) and the syndrome $s$ (also of length $p$), for a total of $2p$ bits. The signature, on the other hand, is given by the challenge string $c$ and the response $z$. In our scheme, this corresponds respectively to a vector of length $p$ and a vector of length $2p$. It is possible to greatly reduce this size thanks to a storing technique [27] which allows for representing low-weight vectors in a compact manner. Namely, a binary vector of length $n$ and weight $w$ is represented as an index, plus an indication of the actual vector weight, for a total of $\lceil \log_2 \binom{n}{w} \rceil + \lceil \log_2 w \rceil$ bits. Note that in our case this applies to both $c$ and $z$.
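The size of such a compact representation can be computed directly; a small sketch of the idea follows (the exact encoding of [27] may differ in minor details, and the example values are ours):

```python
from math import ceil, comb, log2

def compact_size_bits(n: int, w: int) -> int:
    """Index among the weight-w binary vectors of length n, plus the weight itself."""
    return ceil(log2(comb(n, w))) + ceil(log2(w + 1))

# A low-weight length-2p response costs far fewer bits than 2p raw bits.
print(compact_size_bits(9602, 120))   # illustrative values, not the paper's
```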
We now provide (Table 2) some parameters for the codes in our scheme. These are normally evaluated with respect to general decoding algorithms such as Information-Set Decoding [28,29,30,31,32]: the number of security bits is indicated in the column "Security".
The first two rows report well-known parameters suggested in the literature for QC-MDPC codes; however, since our codes do not need to be decodable, we are able to slightly increase the number of errors introduced. The last two rows, instead, are parameters chosen ad hoc, in order to optimize performance.
  6.1. Existing Code-Based Solutions
We are now going to briefly discuss the three main approaches to obtain code-based signatures, and related variants. This will give an insight into why designing an efficient code-based signature scheme is still an open problem.
  6.1.1. CFS
The CFS scheme [2] follows the "hash-and-sign" paradigm, which is a very natural approach for code-based cryptography, and thus it retains most of its traits, both good and bad. For instance, the verification consists of a single matrix-vector multiplication, and so it is usually very fast. On the other hand, the scheme features a very large public key (the whole parity-check matrix). Structured instances, as proposed for example in [33], reduce this size drastically and are therefore able to deal with this issue, although with a few potential security concerns. However, the main downfall of CFS is the extremely slow signing time. This is a consequence of the well-known fact that a random word is in general not decodable, thus finding a decodable syndrome requires an incredibly high number of attempts (at least $t!$ in the simplest instances, where $t$ is the number of errors the code can correct). To lower this number, the common solution is to use codes with very high rate, which in itself could lead to potential insecurities (e.g., the well-known distinguisher for high-rate Goppa codes). Thus, it seems unrealistic to obtain an efficient signature scheme in this way.
  6.1.2. KKS
The KKS approach [3] still creates signatures in a "direct" way, but without decoding. Instead, the scheme relies on certain properties of the codes, such as a carefully chosen distance between the codewords, and uses a secret support. Unfortunately, the main drawback of KKS-like schemes is their security. In fact, it has been shown in [34] that most of the original proposals can be broken after recovering just a few signatures. Furthermore, not even a one-time version of the scheme (e.g., [25]) is secure, as shown by Otmani and Tillich [35], who are able to break all the proposals in the literature without needing to know any message/signature pair. It is therefore unlikely that the KKS approach could be suitable for a credible code-based signature scheme.
  6.1.3. Identification Schemes
All of the code-based identification schemes proposed so far are 3-pass (or 5-pass) schemes with multiple challenges: the prover sends one of two or three entirely different responses depending on the value of the challenge (usually a bit, or an element of {0,1,2}). In this sense, our proposal represents a big novelty. In fact, multiple challenges allow a malicious user to cheat in some instances. For example, in the original proposal by Stern [7], it is possible to choose any two out of the three possible responses and pass verification for those even without knowing the private key, thus leading to a cheating probability of 2/3. This cheating probability has been subsequently lowered in more recent proposals, approaching 1/2. Nevertheless, this causes a huge issue, since the protocol needs to be repeated several times in order to bring the overall cheating probability down to an acceptable level. The 35 repetitions of the original scheme can be lowered to approximately 16 in recent variants but, even so, communication costs prove to be very high, leading to a very large signature size. In Table 3 below, we report a comparison of parameters for different variants of the scheme, where the column Véron refers to [5], CVE to [6] and AGS to [36]. Note that all of these parameters refer to a cheating probability of $2^{-16}$, a weak authentication level.
In the latest proposal (column AGS), the size of the public matrix is considerably smaller thanks to the use of double-circulant codes. However, the signature size is still very large (about 93 Kb). Moreover, for a signature to be considered secure, one would expect the computational cost of producing a forgery to be no less than $2^{80}$; this would require, as claimed by the authors in [36], multiplying all the above data by 5, producing even larger sizes.
  6.2. Comparison
A comparison of our scheme with the full-fledged schemes described above would not be entirely accurate. We can, however, compare (Table 4) our scheme to other code-based proposals that are one-time secure, such as [25,37]. Both of these schemes follow the KKS approach, and therefore come with some potential security concerns, as mentioned in the previous section. For simplicity, we will refer to [25] as BMS and to [37] as GS. Note that the latter comes in two variants, which use respectively quasi-cyclic codes and a newly-introduced class of codes called "quadratic double-circulant" by the authors. All the parameters and sizes (in bits) are reported in the following table, and correspond to a security level of $2^{80}$.
It is immediate to notice that our scheme presents the smallest amount of public data (which groups together the public key and any additional public information) and the smallest signature size. To be fair, the BMS scheme employs the same indexing trick used in this work, while this is not the case for the GS scheme. Since the signature of the GS scheme (in both variants) also includes a low-weight vector, we expect that it would be possible to apply the same technique there as well, with the obvious reduction in size. We did not compute this explicitly, but it is plausible to assume that the resulting signature size would be very close to that of our scheme. Nevertheless, the size of the public data remains much larger even in the more aggressive of the two variants (GS 2).
  6.3. Implementation
To confirm the practicality of our scheme, we have developed a simple implementation in C. The implementation is a straightforward translation to C with the addition of the steps for generating public and private keys. The hash function used was SHA-256. We ran the protocol on a small microprocessor, namely a 580 MHz single-core MIPS 24KEc. The choice of this microprocessor was made based on the usage of it, since this type of microprocessor is commonly used in the Internet of Things (IoT) applications. The measurements are reported in 
Table 5 below.
Note that key generation is dominated by the syndrome computation necessary to obtain the verification key, while sampling the signing key has a negligible cost. The signing operation is the most expensive, as expected, while verification is of the same order of magnitude as key generation. Both the signing and verification algorithms are relatively fast, but they could be sped up even further, since the hash function used was, at the time the measurements were taken, not optimized to run on such a small device.