A Lightweight Cipher Based on Salsa20 for Resource-Constrained IoT Devices

The Internet of Things (IoT) paradigm envisions a world where everyday objects exchange information with each other in a way that allows users to make smarter decisions in a given context. Even though IoT has many advantages, its characteristics make it very vulnerable to security attacks. Ciphers are a security primitive that can prevent some of these attacks; however, the constrained computing and energy resources of IoT devices prevent them from implementing current ciphers. This article presents the stream cipher Generador de Bits Pseudo Aleatorios (GBPA, Spanish for "Pseudo-Random Bit Generator"), which is based on the Salsa20 cipher from the eSTREAM portfolio but designed for resource-constrained IoT devices of Class 0. GBPA has lower program and data memory requirements than Salsa20 and existing lightweight ciphers. These properties allow low-cost resource-constrained IoT devices, which represent 29.5% of the embedded systems on the market, to implement a security service that they are currently incapable of, preserving the user's data privacy and protecting the system from attacks that could damage it. Its output was evaluated with three statistical test suites, NIST Statistical Test Suite (STS), DIEHARD and EACirc, with good results. The GBPA cipher provides security without a negative impact on the computing resources of IoT devices.


Introduction
Society is moving towards a more connected world. The Internet of Things (IoT) is a technology whose goal is for everyday objects to interact and exchange information with each other to accomplish a particular objective. This simple idea enables a wide range of applications, including smart cities, smart homes, smart farming, industrial automation, security, medical services and entertainment. Physically, IoT devices are composed of sensors, actuators, microcontrollers and transceivers, and they can identify and communicate with each other through the Internet. Ideally, they are everywhere and always on; therefore, by reporting the data perceived by their sensors and the state of their actuators, they enable a better knowledge of the current context [1,2]. They use cloud services to provide contextual data to users and to receive requests to modify their state. Connectivity to the Internet makes them accessible from everywhere and at all times; however, this also makes them very vulnerable, because they can be reached and attacked from anywhere in the world. If an attack succeeds, the attacker could access private information such as personal, medical, financial or location data, and could also use the actuators to cause severe damage to the system and even endanger the user's welfare [3]. Analyses of vulnerabilities and possible attacks are presented in [4][5][6]. Another concern is that, to be ubiquitous, they are implemented as low-power devices with constrained resources, which creates a challenge in the […] performed backward. The original message cannot be recovered from a ciphertext without knowing the key; therefore, the key must be kept secret from unauthorized parties [17].
Symmetric ciphers can be classified into two types: block ciphers and stream ciphers. A block cipher divides the message or ciphertext into blocks of a fixed size and encrypts or decrypts one block at a time. A stream cipher encrypts or decrypts one bit at a time; it can be seen as a block cipher whose block size is one [17]. Stream ciphers usually require fewer computing resources than block ciphers; therefore, they are better suited to providing security on resource-constrained devices. The algorithm proposed in this article is a stream cipher.
In a stream cipher, the encryption transformation is $c_i = E_{k_i}(m_i) = m_i + k_i$ and the decryption is $m_i = D_{k_i}(c_i) = c_i + k_i$, where $m_i$ is the $i$-th bit of the message, $c_i$ is the $i$-th bit of the ciphertext and $k_i$ is the $i$-th bit of the keystream. The symbol $+$ represents addition modulo two, which is equivalent to the Boolean xor operation, and the keystream is a random or pseudo-random sequence generated by the algorithm from the key. Even though xor is a simple operation, its use in encryption has been proven secure provided the keystream is random and unpredictable [18].
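The transformation above can be sketched in C. This is a minimal illustration that assumes the keystream has already been generated and is at least as long as the message; the function name `xor_keystream` is hypothetical. It xors whole bytes, which applies the per-bit equation to eight bits in parallel, and the same function both encrypts and decrypts because $(m_i + k_i) + k_i = m_i$.

```c
#include <stddef.h>
#include <stdint.h>

/* Stream-cipher transformation c_i = m_i XOR k_i, applied byte by byte.
   Because XOR is its own inverse, the same call decrypts: m_i = c_i XOR k_i.
   Assumes ks holds at least len bytes of keystream. */
void xor_keystream(const uint8_t *in, const uint8_t *ks, uint8_t *out, size_t len) {
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ ks[i];
}
```

Calling it once with the keystream produces the ciphertext; calling it again on the ciphertext with the same keystream returns the original message.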
As seen in Table 1, the output of the xor is balanced; that is, there is a 50% probability of obtaining a zero or a one as output. When encrypting, no assumption can be made about the content of a message, so the value of $c_i$ depends on the value of $k_i$: if $k_i$ is random and unpredictable (50% probability of being zero or one), then $c_i$ is also random and unpredictable. Accordingly, the security of a stream cipher relies on the randomness and unpredictability of the keystream [18]. Randomness can be characterized as a probabilistic property, and statistical tests can be used to evaluate whether a sequence has the properties that a truly random sequence has. The statistical tests evaluate the uniformity of the output and look for patterns that would reveal non-randomness. Each test examines the sequence for a different type of pattern or property, so a single test cannot be considered sufficient [19]. For the results obtained on one sequence to be reproducible for any other sequence produced by the same generator, the evaluated sequence has to be very long, so that any pattern or non-random property of the generator is revealed in it.
For cryptography, random or pseudo-random sequences [20,21] have to be unpredictable to be considered secure. This means that, given an output $k_i, k_{i+1}, \ldots, k_{n-1}$, there is no polynomial-time algorithm that can predict the next bit $k_n$ or the preceding bit $k_{i-1}$ with a probability non-negligibly greater than 50% [18].

Salsa20 Cipher
The GBPA algorithm is based on the Salsa20 cipher designed by Daniel J. Bernstein. Salsa20 is built on a hash function that maps a 64-byte input to a 64-byte output and is applied in counter mode to produce the keystream. The hash function receives as input a key of 16 or 32 bytes, a nonce of 8 bytes, a counter of 8 bytes and 16 bytes of constant values (with a 16-byte key, the key is used twice to fill the 64-byte input). Salsa20 comprises four functions: quarterround, rowround, columnround and doubleround. The core function is quarterround; doubleround applies columnround and then rowround, each of which applies quarterround four times, and doubleround is executed ten times, for a total of 20 rounds [22][23][24].
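The 64-byte input described above can be arranged as sixteen 32-bit words. The sketch below assumes a 32-byte key and follows the layout of Bernstein's Salsa20 specification: the four constant words spell "expand 32-byte k" and sit on the diagonal (words 0, 5, 10 and 15), the key fills words 1 to 4 and 11 to 14, the nonce fills words 6 and 7, and the 64-bit block counter fills words 8 and 9, all loaded little-endian. The names `load32_le` and `salsa20_init_state` are illustrative.

```c
#include <stdint.h>

/* Read four bytes as a little-endian 32-bit word. */
static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Build the 16-word Salsa20 input for a 32-byte key, 8-byte nonce and block counter. */
void salsa20_init_state(uint32_t s[16], const uint8_t key[32],
                        const uint8_t nonce[8], uint64_t counter) {
    /* "expand 32-byte k" as four little-endian words */
    s[0]  = 0x61707865;  /* "expa" */
    s[5]  = 0x3320646e;  /* "nd 3" */
    s[10] = 0x79622d32;  /* "2-by" */
    s[15] = 0x6b206574;  /* "te k" */
    for (int i = 0; i < 4; i++) {
        s[1 + i]  = load32_le(key + 4 * i);        /* first half of the key */
        s[11 + i] = load32_le(key + 16 + 4 * i);   /* second half of the key */
    }
    s[6] = load32_le(nonce);
    s[7] = load32_le(nonce + 4);
    s[8] = (uint32_t)counter;          /* low 32 bits of the counter */
    s[9] = (uint32_t)(counter >> 32);  /* high 32 bits of the counter */
}
```

Incrementing `counter` and rebuilding the state yields the next 64-byte block, which is what running the hash in counter mode means.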
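Given $y = (y_0, y_1, y_2, y_3)$, where each element of $y$ is 32 bits long, $\mathrm{quarterround}(y) = (z_0, z_1, z_2, z_3)$, where:
$$z_1 = y_1 \oplus ((y_0 + y_3) \lll 7)$$
$$z_2 = y_2 \oplus ((z_1 + y_0) \lll 9)$$
$$z_3 = y_3 \oplus ((z_2 + z_1) \lll 13)$$
$$z_0 = y_0 \oplus ((z_3 + z_2) \lll 18)$$
Here, $a \lll b$ represents the rotation of the 32-bit value $a$ by $b$ positions to the left, $a + b$ is the addition of $a$ and $b$ modulo $2^{32}$, and $a \oplus b$ represents the bitwise xor of $a$ and $b$.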
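The quarterround equations translate directly into C. This is a sketch; `rotl32` and `quarterround` are illustrative names, and `uint32_t` arithmetic provides the addition modulo $2^{32}$ without extra code. The expected outputs for the inputs (1, 0, 0, 0) and (0, 1, 0, 0) can be checked by hand from the equations.

```c
#include <stdint.h>

/* a <<< b: rotate a 32-bit word left by b positions (0 < b < 32). */
static uint32_t rotl32(uint32_t a, int b) {
    return (a << b) | (a >> (32 - b));
}

/* quarterround(y) = z; z1, z2, z3 and z0 are computed in that order,
   each using the values produced just before it. */
void quarterround(const uint32_t y[4], uint32_t z[4]) {
    z[1] = y[1] ^ rotl32(y[0] + y[3], 7);
    z[2] = y[2] ^ rotl32(z[1] + y[0], 9);
    z[3] = y[3] ^ rotl32(z[2] + z[1], 13);
    z[0] = y[0] ^ rotl32(z[3] + z[2], 18);
}
```

For example, quarterround(1, 0, 0, 0) gives (0x08008145, 0x00000080, 0x00010200, 0x20500000): a single set bit already spreads to all four output words after one application.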
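The function doubleround applies columnround to its input and then rowround:
$$\mathrm{doubleround}(x) = \mathrm{rowround}(\mathrm{columnround}(x))$$
The Salsa20 hash function is then defined as $\mathrm{Salsa20}(x) = x + \mathrm{doubleround}^{10}(x)$, where $x$ is the 64-byte input viewed as sixteen 32-bit words and $+$ denotes word-wise addition modulo $2^{32}$.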
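The remaining Salsa20 functions can be sketched on top of quarterround. The sketch below works on sixteen 32-bit words and omits the conversion from and to bytes; columnround applies quarterround to the columns of the 4×4 word matrix and rowround to its rows, using the word orderings of the Salsa20 specification. The function names follow the text; the in-place helper `qr` is illustrative.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t a, int b) { return (a << b) | (a >> (32 - b)); }

/* In-place quarterround on words s[a], s[b], s[c], s[d] (playing y0..y3). */
static void qr(uint32_t s[16], int a, int b, int c, int d) {
    s[b] ^= rotl32(s[a] + s[d], 7);
    s[c] ^= rotl32(s[b] + s[a], 9);
    s[d] ^= rotl32(s[c] + s[b], 13);
    s[a] ^= rotl32(s[d] + s[c], 18);
}

void columnround(uint32_t s[16]) {
    qr(s, 0, 4, 8, 12);
    qr(s, 5, 9, 13, 1);
    qr(s, 10, 14, 2, 6);
    qr(s, 15, 3, 7, 11);
}

void rowround(uint32_t s[16]) {
    qr(s, 0, 1, 2, 3);
    qr(s, 5, 6, 7, 4);
    qr(s, 10, 11, 8, 9);
    qr(s, 15, 12, 13, 14);
}

void doubleround(uint32_t s[16]) {
    columnround(s);
    rowround(s);
}

/* Salsa20(x) = x + doubleround^10(x), with word-wise addition mod 2^32. */
void salsa20_core(const uint32_t in[16], uint32_t out[16]) {
    uint32_t x[16];
    memcpy(x, in, sizeof x);
    for (int i = 0; i < 10; i++)
        doubleround(x);
    for (int i = 0; i < 16; i++)
        out[i] = x[i] + in[i];
}
```

An all-zero input maps to an all-zero output, since quarterround(0, 0, 0, 0) = (0, 0, 0, 0); this is one reason the full cipher always places nonzero constants in the state.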

GBPA Cipher
The proposed algorithm consists of the function GBPA. It receives an input of 19 bytes corresponding to the key, nonce and counter and returns an output of 16 bytes. It comprises two phases: initialization and generation. On initialization, the input parameters plus some constant values are combined and reduced to 16 bytes long and arranged in an appropriate order for the next phase. The generation phase consists first of saving a copy of the init state returned by the previous phase, then executing the quarterround function ten times over the input and finally adding to it the saved init state. The output of this phase is the keystream used to perform the xor with the message.
Each execution of the GBPA function produces a pseudo-random sequence of 16 bytes. A longer sequence can be generated by executing the function in counter mode, which consists of attaching to the nonce a counter that is incremented for each 16-byte block generated. The counter is three bytes long; therefore, GBPA can generate a sequence of at most $128 \times 2^{24} = 2^{31}$ bits (256 MiB) per nonce. Given …, $c_0$ and $r = (r_2, r_1, r_0)$, where $a \ggg b$ represents the rotation of value $a$ by $b$ positions to the right and $a \lll b$ the rotation of value $a$ by $b$ positions to the left, the function GBPA is defined as follows. Given $y = (y_0, y_1, y_2, y_3)$, where $y_n \in \{0,1\}^{32}$, and $z = (z_0, z_1, z_2, z_3)$, where $z_n \in \{0,1\}^{32}$, the initial state of $y$ is saved and denoted $x$. The quarterround function of Salsa20 is then executed ten times, $z = \mathrm{quarterround}(y)$; in each of the first nine rounds, $y$ is updated by $y = z$. Finally, to make the function non-invertible, the initial state $x$ is added to $z$:
$$z = (z_0 + x_0,\; z_1 + x_1,\; z_2 + x_2,\; z_3 + x_3) \qquad (10)$$
The output of GBPA is $z$ of Equation (10), which is the keystream generated from the key.
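The generation phase of GBPA can be sketched in C as follows: keep a copy $x$ of the state, apply quarterround ten times, and add $x$ to the result. This is an illustrative sketch under stated assumptions, not the authors' implementation: `gbpa_generate` and `quarterround_inplace` are hypothetical names, `init_state` is assumed to be the four-word (16-byte) state produced by the initialization phase, whose construction is not reproduced here, and the final addition is assumed to be word-wise modulo $2^{32}$, as in Salsa20.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t a, int b) { return (a << b) | (a >> (32 - b)); }

/* Salsa20 quarterround applied in place to a 4-word state (y becomes z). */
static void quarterround_inplace(uint32_t y[4]) {
    y[1] ^= rotl32(y[0] + y[3], 7);
    y[2] ^= rotl32(y[1] + y[0], 9);
    y[3] ^= rotl32(y[2] + y[1], 13);
    y[0] ^= rotl32(y[3] + y[2], 18);
}

/* Hypothetical GBPA generation phase. init_state plays the role of the saved
   copy x; the working state y is updated by y = z after each round. */
void gbpa_generate(const uint32_t init_state[4], uint32_t keystream[4]) {
    uint32_t y[4];
    memcpy(y, init_state, sizeof y);
    for (int round = 0; round < 10; round++)
        quarterround_inplace(y);
    for (int n = 0; n < 4; n++)
        keystream[n] = y[n] + init_state[n];   /* z_n + x_n, assumed mod 2^32 */
}
```

An all-zero state yields an all-zero keystream, because quarterround(0, 0, 0, 0) = (0, 0, 0, 0); this illustrates why the initialization phase combines the inputs with constant values before generation.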

Statistical Test
Three statistical test suites were used to evaluate the output of the GBPA cipher: the Statistical Test Suite from NIST (STS) [19], DIEHARD by G. Marsaglia [25] and EACirc from the Centre for Research on Cryptography and Security [26]. The first contains 15 tests and the second 19; each test inspects the distribution of ones and zeroes, harmonics or patterns in the sequence. Unlike the others, the EACirc suite does not have a fixed number of tests; it builds the tests empirically from the sequence to be evaluated and a truly random sequence.
EACirc is an open-source project available at [27]. The STS and DIEHARD test batteries are well known and have been used in many articles to assess randomness; however, there is a documented case where these batteries were unable to detect non-randomness in a sequence, while EACirc did [26]. Given this limitation of STS and DIEHARD, the EACirc framework was also used to evaluate the randomness of GBPA. In all three suites, the evaluation returns p-values that quantify the evidence against the null hypothesis that the sequence is random; the smaller the p-value, the stronger the evidence.
As mentioned before, Salsa20 is considered a secure stream cipher by eSTREAM; therefore, the randomness and unpredictability of the GBPA output were evaluated to ensure that the modifications did not make it insecure. The results are presented in the order STS, DIEHARD and EACirc.
A C implementation of the algorithm was written to obtain the bitstream for the statistical evaluation. Each execution of the algorithm produced a 128-bit output, and after eight million executions, a sequence of $1.024 \times 10^9$ bits was obtained, which meets the input size required by the test batteries. This output was obtained by feeding the algorithm "weak parameters", that is, parameters with a very close relationship between them. The purpose was to verify that the algorithm produces pseudo-random sequences even when its inputs are not random. The pseudo code to generate the bitstream is given in Algorithm 1.
Algorithm 1: Generate binary sequence.

STS
As recommended in [19], the evaluation has to be done on at least 1000 binary sequences, and each statistical test requires a minimum number of bits per sequence; one million bits per sequence is appropriate for all of the tests. Thus, the $1.024 \times 10^9$ generated bits were fed to STS as 1024 sequences of one million bits each.
The default parameters were used for each test, including a significance level α of 0.01. If the returned p-value ≥ α, there was no evidence to reject the null hypothesis, and the sequence was accepted as random; if the p-value < α, the null hypothesis was rejected and the sequence was considered non-random. Table 2 shows a typical output after evaluating one sequence. Table 3 shows the proportion of the 1024 sequences that passed each statistical test. A confidence interval was calculated to determine whether the proportion of sequences that passed each test was within the acceptable range, defined as [19]:
$$\hat{p} \pm z_c \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $\hat{p} = 1 - \alpha$, $z_c = 3$ and $n = 1024$, which gives
$$CI = 0.99 \pm 0.009328007.$$
Figure 1 shows the proportion of sequences that passed each test; all proportions were within the confidence interval, whose lower bound is 0.980671993 and upper bound is 0.999328007. For reference, the tests were also performed on a bitstream obtained with Salsa20, generated with the same criteria as for GBPA. The C implementation of Salsa20 used is the one in [15]. Table 4 shows the evaluation results. Of the 1024 Salsa20 sequences, 98.9635% passed the STS statistical tests; of the 1024 GBPA sequences, 98.9136% passed. GBPA therefore shows a decrease of 0.0499 percentage points with respect to Salsa20.

DIEHARD
For DIEHARD, the recommended input is a sequence at least 80 million bits long; the generated bitstream meets this recommendation.
Results of the evaluation are presented in Table 5. When a test returned more than one p-value, a final p-value was obtained through the Kolmogorov-Smirnov test, which checks whether a set of p-values is uniformly distributed. The Kolmogorov-Smirnov test was then applied again to all the p-values returned by the tests shown in Table 5, giving a p-value of 0.262431.
As was done previously, the DIEHARD tests were also performed on the binary sequence generated with Salsa20; Table 6 presents the results. The Kolmogorov-Smirnov test over all the p-values in Table 6 returned a p-value of 0.184669.
Table 6. p-values returned by DIEHARD tests when evaluating a sequence of $1.024 \times 10^9$ bits generated by Salsa20.

EACirc
One of the advantages of EACirc is that it needs fewer bits to detect non-randomness than the previous test suites; it can work with as few as 1000 bits [26]. The parameters used to evaluate the GBPA and Salsa20 sequences were the defaults presented in [27], which can be summarized as follows: α: 0.01; number of epochs: 300; test vector size: 16; test vector count: 1000; function set: NOP, CONS, NOT, AND, NAND, OR, XOR, NOR, SHIL, SHIR, ROTL, ROTR, MASK. The function set parameter included all the operations supported by EACirc, which were used to construct the tests stochastically with a genetic algorithm [26].
Table 7 presents the results for GBPA and Salsa20. The p-values shown correspond to the Kolmogorov-Smirnov test that EACirc applies to the p-values it produces. As can be seen in Table 7, the bitstreams generated by GBPA and Salsa20 were both considered random by the EACirc suite.

Computing Requirements
The GBPA and Salsa20 ciphers were implemented on the 8-bit AVR microcontroller ATmega644P [28] using Microchip's Atmel Studio 7 Integrated Development Environment (IDE) [29], with the AVR/GNU compiler and linker, version 5.4.0. This IDE was used to retrieve the memory requirements and execution cycles of both algorithms.
Figure 2 compares the data and program memory usage of the algorithms. This information was reported by the AVR/GNU compiler and linker, and no dynamic memory allocation was used in the implementations. The number of processor cycles required to execute the algorithms was obtained with the Microchip AVR MCU Simulator, which is considered accurate because it uses models based on the register-transfer level (RTL) code used to produce the actual microcontroller [30]. As indicated in Sections 2 and 3, the output size of Salsa20 is 512 bits and that of GBPA is 128 bits.
Both algorithms had to produce the same output size to compare their number of cycles, so GBPA was executed four times. Figure 3 compares the cycles needed to generate an output of 512 bits. Once the processor cycles required by an algorithm are known, the energy it consumes can be calculated with:
$$CE = W \cdot \frac{PC}{f_{CPU}} \qquad (12)$$
where CE is the consumed energy, W is the power drawn by the system in watts, $f_{CPU}$ is the frequency of the processor clock and PC is the number of processor cycles used by the algorithm. Table 8 shows the energy consumption of the ciphers in a system operating at 2 V and 0.5 mA (1 mW) in low-power mode at 1 MHz. As can be seen, GBPA consumed 48.4819% less energy than Salsa20; because W and $f_{CPU}$ are the same for both ciphers, this figure equals the relative reduction in processor cycles.
Table 8. Power consumption comparison between Salsa20 and GBPA.
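The energy estimate, $CE = W \cdot PC / f_{CPU}$, is simple to evaluate in code. This sketch computes the power as voltage times current and then multiplies it by the execution time $PC / f_{CPU}$; `consumed_energy_joules` is an illustrative name, and no cycle counts from Table 8 are assumed here.

```c
/* Energy in joules: CE = W * PC / f_CPU, where W = V * I is the power drawn,
   PC the processor cycles used and f_CPU the clock frequency in hertz. */
double consumed_energy_joules(double volts, double amps,
                              double f_cpu_hz, double cycles) {
    double watts = volts * amps;
    return watts * cycles / f_cpu_hz;
}
```

With the operating point above (2 V, 0.5 mA, 1 MHz), each processor cycle costs 1 nJ, so one million cycles consume 1 mJ; the energy of each cipher is therefore its cycle count in nanojoules.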
Table 9 compares the computing requirements of GBPA against lightweight block ciphers. The figures correspond to C implementations of the algorithms on 8-bit AVR microcontrollers. The memory footprint and processor cycles of HIGHT, RC5 and Skipjack were obtained from [31] and those of PRESENT from [32].
As shown in Table 9, GBPA had lower program and data memory requirements than all the other ciphers and used fewer execution cycles than HIGHT, RC5 and Skipjack. PRESENT used 48.5699% fewer execution cycles than GBPA; however, it used 58.7526% more program memory and 92.1875% more data memory. Fewer execution cycles mean higher throughput, but PRESENT achieves it at the cost of a higher memory requirement, and memory is a limited resource on IoT devices. A low memory requirement allows a device to be small and low-cost, both essential characteristics of IoT devices because they enable ubiquitous computing. GBPA had lower program and data memory requirements and adequate throughput when compared with Salsa20 and the rest of the lightweight ciphers, making it appropriate for the restrictions and characteristics of IoT devices.

Discussion
As mentioned above, to become part of the eSTREAM portfolio, Salsa20 was under evaluation for four years (2004 through 2008), and no attack on the full cipher was found [15]; weaknesses were found only in variants with a reduced number of rounds [33][34][35], and an analysis of the security of the cipher can be found in [36]. The GBPA cipher uses Salsa20's core function, quarterround, but with fewer parameters so that it can run on resource-constrained devices. The randomness and unpredictability of its output were evaluated to verify that these modifications did not make it insecure. As explained in [18], the xor operation returns a random output when at least one of its inputs is random and independent of the other, and the random part of a stream cipher is the keystream generated by the algorithm. Three statistical test suites were used to evaluate the keystream: STS, which was designed to evaluate bitstreams for cryptographic applications; DIEHARD, a well-known test battery; and EACirc. As mentioned before, there is a documented case where the STS and DIEHARD test batteries were unable to detect non-randomness in a sequence while EACirc could; consequently, the EACirc suite of empirical tests was also used.
These test batteries were used to evaluate a $1.024 \times 10^9$-bit bitstream. When the bitstream was divided into 1024 sequences and fed to STS, 98.9136% of them passed the tests; some sequences failed some of the tests, but the proportion that passed was within the confidence interval. Random sequences are expected to fail some tests occasionally: with α = 0.01, about 1% of truly random sequences fail each test, so a generator whose sequences never failed would not be producing the full range of possible sequences, and its output would not be uniform. As shown in Table 10, compared with Salsa20, the proportion of sequences that passed the tests decreased by 0.0499 percentage points, a difference that is small relative to the width of the confidence interval. When the bitstream was evaluated with DIEHARD, a p-value of 0.262431 was obtained, which provides no evidence that the bitstream is not random; that is, the test battery could not distinguish it from a truly random sequence. The EACirc suite was also used to evaluate the output of GBPA, and it returned a p-value of 0.515966, so the null hypothesis was not rejected. Both Salsa20 and GBPA sequences were accepted as random by DIEHARD and EACirc, and GBPA obtained higher p-values; the differences are presented in Table 10. Since p-values are uniformly distributed under the null hypothesis, this should not be read as GBPA being more random than Salsa20, only that neither suite found evidence against randomness for either cipher. The results of STS, DIEHARD and EACirc show that none of these suites could distinguish the keystream generated by GBPA from random, so after the xor with a message, the ciphertext is also expected to be indistinguishable from random. Regarding its computing requirements, GBPA uses little more than 1.5 KB of program memory and only 20 bytes of data memory; therefore, its impact on the system's memory usage is small. Table 11 shows how many more resources Salsa20, HIGHT, PRESENT, RC5 and Skipjack use compared to GBPA. As can be seen, GBPA uses less memory and fewer processor cycles than the compared algorithms, except for PRESENT, which uses fewer processor cycles.
As can also be seen, PRESENT achieves this at the cost of higher memory usage. Considering that IoT devices have limited memory, which has to be shared among the application, the network services and the security services, a cipher with a higher memory requirement can have a significant impact on the device, or may not be feasible at all. GBPA has lower memory usage and adequate processor-time usage when compared with Salsa20 and lightweight ciphers. The proposed algorithm is not intended to replace Salsa20 or traditional ciphers, which are recommended whenever the device's resources allow them. Instead, it provides a security solution for the many devices that cannot afford those algorithms, such as small smart sensors and other low-cost IoT devices.

Conclusions
IoT is a promising technology that could significantly improve our daily lives, from making them more comfortable to enabling a better response to emergency situations. Even though it has many advantages, its characteristics make it very vulnerable to attacks, and those attacks can have severe consequences for the system and even for the user. Encryption is a security primitive that could prevent some of these attacks, but because of the limited resources of Class 0 IoT devices, the use of traditional security algorithms is not viable. In this document, the stream cipher GBPA, designed for IoT, has been presented. The algorithm is based on the Salsa20 cipher, which was evaluated by the international community and deemed secure enough to be part of the eSTREAM portfolio. GBPA uses Salsa20's core function, but with fewer input parameters, a smaller memory footprint and lower processor-time usage. The GBPA cipher has a small output size, which suits IoT devices because the information they send is usually contextual data in short packets; thus, no memory or processor time is spent generating unused keystream. When compared with lightweight ciphers, GBPA required less program and data memory. Its low program memory, data memory and processor-time usage allow low-cost, resource-constrained Class 0 IoT devices to implement a security service that they are currently incapable of, protecting the user's data privacy, the user's welfare and the system from attacks that could damage or disrupt its functionality.
The randomness of the cipher's output was evaluated using three statistical test suites: STS, which was designed to assess pseudo-random number generators for cryptographic applications; the well-known DIEHARD test battery; and EACirc, an empirical test suite that, in at least one documented case, detected non-randomness that the other two missed. Good results were obtained with all three. The tests were also applied to Salsa20 for reference, and no significant difference was found between the results of the two algorithms.
As future work, a modification of the algorithm is planned to support two key sizes: the current 96-bit key and a 128-bit key, the latter for devices with more computing resources and stricter security requirements.