This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Networks are evolving toward a ubiquitous model in which heterogeneous devices are interconnected. Cryptographic algorithms are required to develop security solutions that protect network activity. However, the computational and energy limitations of network devices jeopardize the actual implementation of such mechanisms. In this paper, we perform a broad analysis of the cost of running symmetric and asymmetric cryptographic algorithms, hash chain functions, elliptic curve cryptography and pairing-based cryptography on personal digital assistants (PDAs), and compare it with the cost of basic operating system functions. Results show that although the power cost of cryptography is high and such operations should be kept short, cryptography is not the main factor limiting the autonomy of a device.

The falling cost of network devices has led to the spread of personal portable computers and the appearance of new types of networks. Heterogeneous ubiquitous networks formed by small, constrained devices are actively researched or already in use, and security mechanisms must be provided to protect them from malicious attacks. The very nature of ubiquitous networks (easy network interconnection anytime and anywhere) makes measures that ensure the correct operation of protocols necessary at all layers: from networking operations (routing and network management) to collaborative enforcement protocols and privacy-protecting mechanisms. It is therefore important to know the real cost of cryptography on small devices and to design client protocols that suit this context.

Several research works have studied the performance of cryptographic algorithms on small, constrained devices such as Personal Digital Assistants (PDAs). However, none of them provides a global view of the impact of cryptographic techniques on the system, nor can such a view be inferred from them, since their tests share no common ground. Some works focus on a few types of cryptosystems [

In this paper we present a study of different cryptographic techniques on embedded devices based on ARM architectures, measuring both the computational cost of cryptographic functions and the battery power they consume. We take a broad view of the problem, namely how security affects the autonomy and operation of a device, with the goal of designing efficient secure communication protocols. The primary security needs of network protocols are authenticity and integrity. We therefore center our study on block ciphers (because MACs can be constructed from block ciphers), hash functions, and public key digital signature algorithms.

We conducted benchmark tests for the most widely used algorithms today and for those recommended by international organizations and projects (NIST, NESSIE, CRYPTREC). The chosen block cipher algorithms are Data Encryption Standard (DES), Triple DES (3DES) and Advanced Encryption Standard (AES). The tested hash functions are Message Digest 5 (MD5), Secure Hash Algorithm 1 (SHA-1) and Secure Hash Algorithm 2 (SHA-2). In public key cryptography, we tested Rivest Shamir Adleman (RSA), Digital Signature Algorithm (DSA), Elliptic Curve Digital Signature Algorithm (ECDSA) and emerging algorithms based on pairings (Boneh-Lynn-Shacham (BLS) and Boneh-Boyen (BB)). We used the OpenSSL 0.9.8d C library to program the tests, except for the pairing schemes, for which we used the PBC_sig-0.0.2 library from Stanford University.

The rest of the paper is organized as follows. We first describe the testing environment and measurement system. Then, in Section 3, we present the basic costs of the system and the performance of chosen cryptosystems. Finally, in Section 4 we summarize the most relevant results.

Benchmark tests were run on two PDAs, a Compaq iPAQ3970 and an HP Hx2790. The HP Hx2790 is the more powerful of the two, which lets us gauge how further technological improvements may affect the results. The Compaq iPAQ3970 uses an Intel XScale PXA250 processor at 400 MHz and runs the Linux Familiar operating system. The HP Hx2790 uses an Intel XScale PXA270 processor at 624 MHz and runs the Windows Mobile operating system. Specifications of both platforms are detailed in

Tests were conducted with the support of a PC, a DELL-DCNE with a 2.8 GHz Intel processor and 512 MB of RAM running Ubuntu Linux. The tests and required libraries were built on the PC using cross-compilers (see

The performance study evaluated two aspects of the PDA processes: time cost and energy consumption. Time delays were measured by instrumenting the test applications with timers. We have used the function

In this section we present benchmark test results for various PDA functions. On one hand we measured the basic power costs of the PDA,

The cost of the basic operating system functions was measured with the default operating system applications running, serial port connectivity enabled, all network interfaces disabled, and no end-user application running. Then, we evaluated the screen consumption by switching on the display and computing the added overhead. Finally, we analyzed the network interface costs (only for the Hx2790, since our iPAQ3970 is not equipped with this interface).

The largest contributor to the basic power costs is the wireless network interface. In idle state, configured with a transmission power of 15 dBm (∼32 mW) and a sensitivity of –80 dBm, the network card consumes 792.68 mW. When using the communication channel, data transmission costs 10.40
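The "∼32 mW" figure follows from the definition of dBm as decibels relative to 1 mW. A minimal conversion sketch (helper names are our own, for illustration):

```python
import math

def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts: P[mW] = 10^(dBm / 10)."""
    return 10 ** (dbm / 10)

def mw_to_dbm(mw: float) -> float:
    """Inverse conversion: dBm = 10 * log10(P[mW])."""
    return 10 * math.log10(mw)

# Transmission power configured in the tests and receiver sensitivity.
tx_mw = dbm_to_mw(15)    # about 31.6 mW, the "~32 mW" quoted in the text
rx_mw = dbm_to_mw(-80)   # 1e-8 mW, i.e. 10 picowatts

print(f"15 dBm  = {tx_mw:.1f} mW")
print(f"-80 dBm = {rx_mw:.1e} mW")
```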

The next contributor to basic power costs is the screen, which is usually switched on while the user is working with the PDA. At maximum brightness, the screen costs 406.58 mW in the iPAQ3970 and 501.06 mW in the Hx2790. This cost can be reduced by lowering the screen brightness, since power consumption grows linearly with the brightness level.

Finally, the basic operating system functions use up 375.64 mW in iPAQ3970, and 213.28 mW in Hx2790. The differences between the two devices are due to the processor, operating system, and battery.
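To put these figures in perspective, the sketch below estimates an idealized runtime for the Hx2790 from the measured basic power draws and its 1,440 mAh battery. The 3.7 V nominal cell voltage is an assumption (a typical Li-ion value, not a measurement from this study), and the model ignores conversion losses and discharge-rate effects, so the numbers are only indicative.

```python
# Measured average power draws (mW) reported above for the HP Hx2790.
HX2790_POWER_MW = {
    "operating system": 213.28,
    "screen (max brightness)": 501.06,
    "WLAN card (idle)": 792.68,
}

BATTERY_MAH = 1440          # Hx2790 battery capacity (PDA specifications table)
NOMINAL_VOLTAGE_V = 3.7     # ASSUMPTION: typical Li-ion nominal voltage, not measured

def autonomy_hours(capacity_mah, voltage_v, loads_mw):
    """Idealized runtime = stored energy (mWh) / total power draw (mW).
    Ignores conversion losses and discharge-rate effects."""
    energy_mwh = capacity_mah * voltage_v
    return energy_mwh / sum(loads_mw)

all_on = autonomy_hours(BATTERY_MAH, NOMINAL_VOLTAGE_V, HX2790_POWER_MW.values())
os_only = autonomy_hours(BATTERY_MAH, NOMINAL_VOLTAGE_V,
                         [HX2790_POWER_MW["operating system"]])

print(f"OS + screen + idle WLAN: {all_on:.2f} h")
print(f"OS only:                 {os_only:.2f} h")
```

Under these assumptions, keeping the screen and the idle wireless card on cuts the estimated runtime roughly sevenfold, which is consistent with the observation that these components, rather than the processor, dominate the basic power budget.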

These results show that the power overhead of executing cryptographic algorithms is higher in the newest and fastest PDA, the Hx2790. Nevertheless, this does not mean that this device is less efficient for cryptography, as we will see in the next sections: the Hx2790 exploits its resources more intensively when needed, so operations complete faster.

We also studied whether the battery charge level affects the performance of a device, and found that it does. The time needed to execute an algorithm when the battery is at 25% charge is the same as at full charge. However, the instantaneous current increases, and so does the total energy spent: on average, the costs incurred when the battery is at 25% are around 16% higher.

We begin the analysis of security performance on a PDA with the simplest cryptographic operations: symmetric ciphers and hash functions. Benchmark tests were conducted using eight plaintexts of size 20 B, 1 KB, 10 KB, 100 KB, 250 KB, 500 KB, 750 KB and 1 MB. For each plaintext file, the test was repeated between 500 times (for large files) and 50,000 times (for small ones), so that the uncertainty of the results is below 0.01%.
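A repeated-measurement harness of this kind can be sketched as follows. This is not the authors' test code: the rule that picks the repetition count (a fixed total data volume, clamped to 500..50,000) is a hypothetical choice, the relative standard error of the mean is used as one possible uncertainty measure, and SHA-1 from Python's `hashlib` merely stands in for the measured workload.

```python
import hashlib
import statistics
import time

# File sizes used in the benchmark (bytes).
SIZES = [20, 1024, 10 * 1024, 100 * 1024, 250 * 1024,
         500 * 1024, 750 * 1024, 1024 * 1024]

def repetitions(size, lo=500, hi=50_000, budget_bytes=512 * 1024 * 1024):
    """HYPOTHETICAL rule: process a fixed total volume of data per size,
    clamped to the 500..50,000 range quoted in the text."""
    return max(lo, min(hi, budget_bytes // size))

def benchmark(fn, data, reps):
    """Time `reps` calls of fn(data); return (mean seconds, relative standard error)."""
    samples = []
    for _ in range(reps):
        start = time.perf_counter()
        fn(data)
        samples.append(time.perf_counter() - start)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, sem / mean

if __name__ == "__main__":
    for size in SIZES[:2]:   # small sizes only, to keep the demo short
        data = bytes(size)
        reps = repetitions(size)
        mean, rel_err = benchmark(lambda d: hashlib.sha1(d).digest(), data, reps)
        print(f"{size:>8} B  reps={reps:>6}  mean={mean * 1e6:8.2f} us  rel. SE={rel_err:.4%}")
```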

Regarding symmetric cryptography, we analyzed the resources employed by DES, 3DES and AES to encrypt and decrypt a file using the Electronic Code Book (ECB) mode, in which each plaintext block is encrypted separately with the block cipher. ECB is the simplest cipher mode because there are no dependencies between ciphered blocks. We use this mode as a reference point because of its simplicity and widespread use.
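The structure of ECB can be illustrated with a short sketch. The block function here is a hypothetical XOR "cipher" chosen only to keep the example self-contained; it is NOT secure and is not DES or AES. What the sketch does show faithfully is the mode itself: every block is processed independently, so equal plaintext blocks yield equal ciphertext blocks and the number of block operations grows linearly with the input size.

```python
BLOCK = 8  # bytes per block (DES block size; AES uses 16)

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """HYPOTHETICAL stand-in for a real block cipher (NOT secure): XOR with the key."""
    return bytes(b ^ k for b, k in zip(block, key))

def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    return toy_encrypt_block(block, key)  # XOR is its own inverse

def pad(data: bytes) -> bytes:
    """PKCS#7 padding up to a multiple of BLOCK."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    return data[:-data[-1]]

def ecb(data: bytes, key: bytes, block_fn) -> bytes:
    """ECB: apply block_fn to each block on its own; no chaining between blocks."""
    return b"".join(block_fn(data[i:i + BLOCK], key)
                    for i in range(0, len(data), BLOCK))

key = bytes(range(1, BLOCK + 1))
plaintext = b"ABCDEFGH" * 2 + b"tail"
ciphertext = ecb(pad(plaintext), key, toy_encrypt_block)

# Identical plaintext blocks give identical ciphertext blocks.
assert ciphertext[:BLOCK] == ciphertext[BLOCK:2 * BLOCK]
assert unpad(ecb(ciphertext, key, toy_decrypt_block)) == plaintext
print(len(ciphertext) // BLOCK, "block operations")
```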

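Since in ECB the plaintext is split into blocks of a predefined size that are encrypted sequentially by the processor, we modelled the time and energy consumed by symmetric encryption with a linear equation:

$$C(x) = C_0 + c \cdot x$$

where $x$ is the input size (in KB), $C_0$ is the fixed cost of the key setup phase, and $c$ is the cost per KB of input.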

The key setup phase is common to all block ciphers and is performed before encryption or decryption. This phase expands the input key into the set of round subkeys used by the cipher. The ciphering then proceeds through a repeated sequence of mathematical computations over the input data blocks. The results support the model: the fitted regression lines have a coefficient of determination $R^2$ above 99.9%.
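The regression behind a linear cost model of the form a + b·x can be sketched with ordinary least squares. Because the raw measurements are not reproduced here, the points below are illustrative values generated from the reported iPAQ3970 DES time model (0.24 + 1.04x ms, with x in KB), so the fit trivially recovers those coefficients with R² = 1; on real measurements the residuals are nonzero and R² quantifies how well the linear model holds.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: cost per KB
    a = mean_y - b * mean_x            # intercept: fixed (key setup) cost
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Benchmark input sizes in KB (20 B is about 0.02 KB).
sizes_kb = [20 / 1024, 1, 10, 100, 250, 500, 750, 1024]
# ILLUSTRATIVE points generated from the reported DES model, not raw measurements.
times_ms = [0.24 + 1.04 * x for x in sizes_kb]

a, b, r2 = fit_linear(sizes_kb, times_ms)
print(f"T(x) = {a:.2f} + {b:.2f}x ms, R^2 = {r2:.4f}")
```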

Results are better on the Hx2790, which is the faster device. Although its clock frequency is only 1.56 times that of the iPAQ3970, cryptographic performance improves more than threefold. This is because not only is the core processor faster, but so is the system bus; moreover, the processor has internal memory that improves performance. Furthermore, the performance gain from the faster PDA is greater for AES than for DES or 3DES: the AES implementation takes better advantage of the device's additional resources.
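The comparison between clock ratio and cryptographic speedup can be reproduced from the per-KB slopes in the temporal cost table of symmetric algorithms. Using the rounded slopes gives values close to, but not identical with, that table's last column (for example 4.62 versus 4.56 for AES-256), presumably because the table was computed from unrounded coefficients. The implied throughput (1000 / slope, in KB/s) likewise lands within about 1% of the AES-256 throughputs quoted later in this section.

```python
# Per-KB time slopes (ms/KB) from the temporal cost table: (iPAQ3970, Hx2790).
SLOPES_MS_PER_KB = {
    "DES (64)":   (1.04, 0.27),
    "3DES (192)": (2.66, 0.73),
    "AES (128)":  (0.71, 0.16),
    "AES (256)":  (0.97, 0.21),
}
CLOCK_RATIO = 624 / 400   # PXA270 vs PXA250 clock frequency

speedups = {name: ipaq / hx for name, (ipaq, hx) in SLOPES_MS_PER_KB.items()}
throughput_kb_s = {name: (1000 / ipaq, 1000 / hx)
                   for name, (ipaq, hx) in SLOPES_MS_PER_KB.items()}

for name, s in speedups.items():
    ipaq_tp, hx_tp = throughput_kb_s[name]
    print(f"{name:<11} speedup {s:.2f}x (clock {CLOCK_RATIO:.2f}x)  "
          f"throughput {ipaq_tp:7.1f} / {hx_tp:7.1f} KB/s")
```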

In any case, AES performs much better than DES and 3DES on both ARM processors. Encrypting or decrypting a file with DES takes even longer than doing so with AES using 256-bit keys, even though the latter provides a much stronger security level. DES is now considered weak, not because of structural flaws, but because its keys are short enough to be found by brute-force attacks.
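The brute-force argument is simple arithmetic on key space sizes: an exhaustive search tries, on average, half of the 2^k possible keys. The attacker speed of 10^12 keys per second below is a hypothetical figure chosen only for illustration; DES has an effective key length of 56 bits.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def expected_search_years(key_bits: int, keys_per_second: float) -> float:
    """Expected exhaustive-search time: on average half of the 2^key_bits keys are tried."""
    return (2 ** (key_bits - 1)) / keys_per_second / SECONDS_PER_YEAR

RATE = 1e12  # HYPOTHETICAL attacker throughput: 10^12 keys per second

for name, bits in [("DES", 56), ("AES-128", 128), ("AES-256", 256)]:
    print(f"{name:<8} 2^{bits} keys -> {expected_search_years(bits, RATE):.3g} years")
```

At this rate, a DES key falls in about ten hours, while AES-128 would require on the order of 10^18 years.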

The time cost expressions show that the initialization time of DES and 3DES is greater than that of AES (in fact, on the Hx2790 the initialization cost of most of the algorithms is smaller than 1

As with temporal costs,

One peculiarity of AES is that decryption does not perform exactly the same steps as encryption, so the code is partially different. In [

In fact, depending on the implementation, the differences between AES encryption and decryption can favor one direction or the other. In OpenSSL, AES is accelerated via a 10 KB lookup table implementation. The OpenSSL build for Linux Familiar takes full advantage of this design, and as a result decryption on the iPAQ3970 is even faster than encryption (by 20%). Likewise, decryption generally consumes less energy, except for small input files (up to 100 KB). On the Windows Mobile platform, however, the results differ. On the Hx2790, encryption and decryption achieve roughly the same throughput. In terms of energy, encryption is significantly cheaper, but the difference between encryption and decryption shrinks as the input size grows. This suggests that decryption involves some costly operations whose cost does not depend on the input size, so they are amortized over the rest of the computation as the ciphertext grows. The differences between the AES encryption and decryption processes thus lead to slightly different results in throughput and energy consumption. Nevertheless, data presented in

In [

Tests in [

In [

Our AES results (AES-256: 1035.29 KB/s in iPAQ3970 and 4724.55 KB/s in Hx2790) are much better than [

Finally, we compare the results with [

Regarding energy results, they are in accordance with other studies, like [

We also conducted benchmark tests for hash functions. A hash function is a one-way, collision-resistant function that compresses an arbitrarily large message into a fixed-length digest. A hash function is built from two elements: a compression function that maps a fixed-length input to a fixed-length output, and a domain extender that allows the compression function to be applied to inputs of variable size. Compression functions are usually designed using block ciphers, while most hash functions build their domain extender iteratively, following the Merkle-Damgård construction. Since hash algorithms process data in fixed-size blocks, we assume that time and energy costs depend linearly on the input message size. The results confirm this assumption and also show that the initialization phase of the algorithm is negligible.
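The Merkle-Damgård iteration, and why it yields a cost linear in the message size, can be sketched as follows. The padding follows the SHA-1/SHA-2 style (a 0x80 byte, zeros, then the 64-bit big-endian bit length; MD5 encodes the length little-endian), but `toy_compress` is a hypothetical compression function built from SHA-256 purely for illustration; real MD5, SHA-1 and SHA-2 each define their own compression function and initial value.

```python
import hashlib
import struct

BLOCK_SIZE = 64  # bytes, as in MD5, SHA-1 and SHA-256

def md_pad(message: bytes) -> bytes:
    """Merkle-Damgard strengthening: append 0x80, zeros, then the 64-bit
    big-endian bit length, so the total is a multiple of BLOCK_SIZE."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_SIZE)
    return padded + struct.pack(">Q", bit_len)

def toy_compress(state: bytes, block: bytes) -> bytes:
    """HYPOTHETICAL compression function (fixed-size input -> fixed-size output)."""
    return hashlib.sha256(state + block).digest()

def md_hash(message: bytes, iv: bytes = bytes(32)):
    """Iterate the compression function over the padded blocks.
    Returns (digest, number of compression calls)."""
    padded = md_pad(message)
    state, calls = iv, 0
    for i in range(0, len(padded), BLOCK_SIZE):
        state = toy_compress(state, padded[i:i + BLOCK_SIZE])
        calls += 1
    return state, calls

for n in (0, 55, 56, 1024):
    _, calls = md_hash(bytes(n))
    print(f"{n:>5}-byte message -> {calls} compression calls")
```

The number of compression calls is ⌈(n + 9) / 64⌉ for an n-byte message, which is why the measured cost is essentially proportional to the input size, with a negligible fixed part.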

The faster device, the Hx2790, clearly outperforms the other and, as with symmetric cryptography, the improvements are greater in time than in energy. The SHA-2 family benefits most from the faster processor, with the Hx2790 more than five times faster than the iPAQ3970. Compared with the symmetric algorithms, the simpler hash functions (MD5, SHA-1) are at least twice as fast, while the SHA-2 functions perform similarly to AES.

For each device, the relative energy costs of the algorithms closely mirror their temporal costs. The main difference is that MD5 and SHA-1 are more energy efficient on the iPAQ3970 than on the Hx2790. This is because these algorithms are simple and execute the same number of instructions on both devices, and each instruction requires less energy on the iPAQ3970 than on the Hx2790.

The evaluated public key algorithms are RSA, DSA, Elliptic Curve Cryptography (ECC) and Pairings.

We evaluated RSA and DSA using keys of 512, 1,024 and 2,048 bits. We set the public exponent of the RSA keys to 3 because it improves signature verification performance; this exponent is commonly used in constrained environments where many verifications must take place. With other exponents the results do not differ much: signature generation is somewhat more efficient and verification somewhat less so, but the overall performance of the algorithm is about the same.
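Why a small public exponent speeds up verification can be seen by counting modular multiplications in left-to-right square-and-multiply exponentiation. The sketch uses textbook RSA with tiny primes, which is insecure and for illustration only (no padding, tiny modulus); real libraries use windowed exponentiation and the Chinese Remainder Theorem for private-key operations, so the counts are indicative rather than exact.

```python
def modexp_mults(e: int) -> int:
    """Modular multiplications used by left-to-right square-and-multiply:
    (bit_length - 1) squarings plus (popcount - 1) extra multiplications."""
    return (e.bit_length() - 1) + (bin(e).count("1") - 1)

# Textbook RSA with tiny primes: illustration only, NOT secure.
p, q = 1013, 1019
n = p * q
phi = (p - 1) * (q - 1)
e = 3                        # public exponent used in the tests; gcd(e, phi) must be 1
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

digest = 123456              # stands in for a hashed message reduced below n
signature = pow(digest, d, n)           # signing: private-key operation
assert pow(signature, e, n) == digest   # verification: public-key operation

for exp in (3, 65537, d):
    print(f"exponent {exp}: {modexp_mults(exp)} modular multiplications")
```

With e = 3 a verification needs only two modular multiplications, against 17 for the common alternative e = 65537, whereas the private exponent d is as long as the modulus in real keys.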

For ECC, we tested the algorithm using three curves: secp112r1, secp160r1 and secp224r1. All curves are defined over a prime field and their parameters are chosen verifiably at random. The field order is 112 bits long for secp112r1, 160 bits for secp160r1 and 224 bits for secp224r1.

Regarding pairings, we analyzed the key generation algorithms of two digital signature schemes: BLS signatures from [ … ] $G_1$ and $G_2$ is 160 bits, and the elements of the groups take 512 bits to represent.

RSA and DSA key generation involves primality testing, which is an expensive operation. Moreover, the search for primes is randomized and primality tests are probabilistic, so execution times vary from run to run and can occasionally be very long. The time required to find suitable primes increases polynomially with the size of the prime sought.
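The source of this variability is visible in a minimal random prime search: candidates are drawn at random and each one is checked with a probabilistic test, so the number of candidates tried differs from run to run. The sketch uses the Miller-Rabin test, a common choice; we do not claim it reproduces exactly what OpenSSL does internally. Python's `random` module is used only to keep the demo reproducible; real key generation needs a cryptographically secure generator.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n: int, rounds: int = 20, rng=random) -> bool:
    """Miller-Rabin: a composite n passes one random round with probability <= 1/4."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:              # cheap trial division first
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                   # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                # a is a witness: n is certainly composite
    return True

def random_prime(bits: int, rng=random):
    """Draw random odd candidates with the top bit set until one passes.
    Returns the prime and the number of candidates tried."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng=rng):
            return candidate, tries

rng = random.Random(2024)   # fixed seed only to make the demo reproducible
for _ in range(3):
    prime, tries = random_prime(256, rng)
    print(f"256-bit prime found after {tries} candidates")
```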

ECC key generation performs very well because it involves only one scalar point multiplication; that is, it multiplies a scalar

Finally, key generation in pairing-based schemes is also quite efficient because it is based on modular exponentiations over a prime field. Key generation for BLS requires only one exponentiation, while BB requires two exponentiations and one bilinear pairing. The results show that key generation in BB takes about three times as long as in BLS.
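The ECC key generation step described above (a random private scalar d and a public point Q = d·G) can be sketched on a toy curve. We use the textbook curve y² = x³ + 2x + 2 over GF(17) with generator G = (5, 1) of prime order 19, instead of the secp curves used in the tests, whose parameters are around 112 to 224 bits. The double-and-add loop is the basic technique; optimized libraries use faster variants.

```python
import random

# Toy curve: y^2 = x^3 + 2x + 2 over GF(17), generator G = (5, 1) of order 19.
P_MOD, A, B = 17, 2, 2
G = (5, 1)
ORDER = 19
INF = None  # point at infinity (group identity)

def point_add(p1, p2):
    """Affine point addition/doubling on the curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                                  # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add: one doubling per bit of k, one addition per set bit."""
    result, addend = INF, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def ec_keygen(rng=random):
    """Private key d in [1, ORDER-1]; public key Q = d*G (one scalar multiplication)."""
    d = rng.randrange(1, ORDER)
    return d, scalar_mult(d, G)

d, Q = ec_keygen(random.Random(7))
print("private key:", d, "public key:", Q)
```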

Several papers (

Regarding energy costs, Potlapally

Potlapally

We evaluated the digital signature algorithms RSA, DSA, ECDSA, BLS and BB using the keys generated in Section 3.4. The following results do not include hashing operations, except for BB, where hashing is intrinsically embedded in the signature and verification processes. Thus, for RSA, signing is equivalent to decryption and verifying is equivalent to encryption.

ECDSA key generation algorithm was the best of all key generation candidates, but

Furthermore, ECDSA scales better than RSA and DSA as the security level increases. On average, RSA operations with 512-bit keys are about 4 times faster than with 1,024-bit keys; for DSA this ratio is around 3. Moreover, this ratio grows as RSA and DSA keys get longer. In contrast, the ratio between ECDSA keys of 112 and 160 bits is less than twofold, and it shrinks as the security level increases.

Finally, the temporal costs of pairing-based signatures are quite high compared with the others. These schemes have two advantages: first, the signatures are very short, so they are cheap to transmit; second, they support multisignature and batch signature verification schemes that can reduce the overhead and yield competitive times.

Moreover, pairing-based operations can be optimized in hardware implementations, which would reduce the overhead of BLS and BB signatures. The PBC_sig library we use implements Tate pairing computations based on Miller's algorithm, from [

There are some other research works that have studied the performance of digital signature algorithms in constrained devices. [

On the other hand, they also ran ECDSA benchmarks with their own ECC library, which uses general elliptic curves over binary fields, and state that it is on average 5 times faster than OpenSSL. Their results for 163-bit ECDSA were 5.7 ms for signature generation and 17.9 ms for signature verification. These results are clearly better than those we obtained with the OpenSSL 0.9.8d library: on the iPAQ3970, ECDSA-160 signature generation takes 47.78 ms and verification takes 57.92 ms.

Gupta

RSA, DSA and ECDSA use the SHA-1 hash algorithm (costs described in Section 3.3).
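When hashing is included, the cost of signing a message is roughly the time to hash it plus one fixed-cost signature operation on the digest. The sketch below models the resulting throughput; the two parameters are hypothetical values chosen for illustration, not measurements from this paper. It shows why throughput is low for small files, where the fixed signature cost dominates, and approaches the hash throughput for large files.

```python
def sign_throughput_mb_s(size_mb, t_sign_s, hash_mb_s):
    """Hash-then-sign throughput for one message:
    size / (time to hash the message + time of one signature on the digest)."""
    return size_mb / (size_mb / hash_mb_s + t_sign_s)

# HYPOTHETICAL parameters, for illustration only.
T_SIGN_S = 0.0065    # seconds per signature operation on the digest
HASH_MB_S = 15.0     # hash throughput in MB/s

for label, size_mb in [("1 KB", 1 / 1024), ("1 MB", 1.0)]:
    rate = sign_throughput_mb_s(size_mb, T_SIGN_S, HASH_MB_S)
    print(f"{label}: {rate:.3f} MB/s")
```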

BLS signatures use a hash algorithm over bilinear maps and a compression-decompression algorithm. The time required to execute this hash algorithm is (105.08 + 0.03 ·

In BB short signatures the hash of the message is embedded in the signature process.

The column labelled with a Δ shows the performance increase of the algorithm as the length in

Comparing with the throughput of symmetric algorithms (see

Results of other studies are coherent with ours in the sense that the relationship between the performances of the algorithms in each device is quite similar to what we have obtained in our target PDAs. However, in absolute values, results are very different due to the use of different processors. Potlapally

The most efficient signature algorithm is the BB short signature, closely followed by DSA and RSA algorithms using keys of 512 bits. The least energy-consuming verification algorithms are RSA using keys of 512 and 1,024 bits.

Comparing with the energy costs of symmetric algorithms (see

In this paper we have analyzed the performance of different cryptographic algorithms on PDAs and compared it with the device's basic costs: operating system, screen, and network interface. This study provides evidence for determining the amount of overhead a security protocol can introduce in a system.

The conclusions are the following:

The best symmetric algorithm for ARM processors, in terms of both time and energy, is AES. The throughput of AES-256 is around 25% lower than that of AES-128, and both are more efficient than DES. Moreover, AES offers a higher security level than DES or 3DES.

Hash algorithms achieve throughput similar to that of symmetric algorithms. MD5 is the fastest of them, although collisions have been found for it and its use is not recommended.

In public key cryptography, key generation is in general very complex and demanding. DSA presents the worst results, while ECC has very good performance.

The results of digital signature functions vary considerably depending on the algorithm. RSA-512 is the most efficient, closely followed by DSA-512. At the next security level, RSA-1024 is the lightest of all, although around 75% slower than RSA-512; this difference is more pronounced on slower processors. RSA is notably light in verification. The average cost of signing and verifying with DSA-1024 and ECDSA-112 is quite similar to that of RSA-1024, but they perform better in signature generation. At the highest security level, the best algorithm is by far ECDSA-224.

The global cost of signing and verifying a message includes the expense of hash operations or compression algorithms. In this respect, BB pairing-based signatures are notably more efficient (especially in time) than the others when working with medium and large files.

The battery charge level affects the energy expenses of the device: running cryptographic algorithms when the battery charge is low costs around 16% more energy than when it is full.

The use of cryptographic algorithms in network protocols, especially multi-hop protocols, introduces an important overhead due to the network interface costs during waiting times. The problem lies not so much in the computational cost of cryptography as in the total protocol completion time, which entails considerable energy consumption. Security protocols should therefore be designed to reduce delays as much as possible, for example by applying appropriate scheduling techniques [

The main contribution of this paper is that it presents a benchmark of a wide range of algorithms and a consistent comparison between them. Results can be used to estimate the costs of network security protocols, design them appropriately and evaluate them.

Average power consumption (mW) of a PDA.

Throughput (represented with bars) and Energy Performance (represented with dots) of block ciphers.

Throughput (represented with bars) and Energy Performance (represented with dots) of hash algorithms.

Temporal costs (represented with bars) and Energy costs (represented with dots) of asymmetric key generation algorithms.

Temporal costs of digital signature algorithms.

Energy costs of digital signature algorithms.

PDA Specifications.

| | Compaq iPAQ3970 | HP Hx2790 |
|---|---|---|
| Processor | Intel XScale PXA250 400 MHz | Intel XScale PXA270 624 MHz |
| Display | 3.8″ TFT (240 × 320) | 3.5″ TFT (240 × 320) |
| Operating system | LinuxFam. 0.8.4 (k 2.4.19) | Microsoft WinMobile 5.0 |
| | 48 MB | 320 MB |
| | 64 MB | 64 MB |
| Internal memory | – | 256 KB |
| | 64 KB | 64 KB |
| System bus | 100 MHz | 208 MHz |
| Battery | 1,400 mAh Lithium Ion | 1,440 mAh Lithium Ion |
| Wireless LAN | – | Power = 15 dBm |

Libraries.

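| | iPAQ3970 (Linux Familiar) | Hx2790 (Windows Mobile) |
|---|---|---|
| Cross-compiler | arm-linux | arm-wince-cegcc |
| GMP | lib-gmp-4.2.1-arm-linux | lib-gmp-4.2.1-arm-wince |
| PBC | lib-pbc-0.4.7-arm-linux | lib-pbc-0.4.7-arm-wince |
| PBC_sig | lib-pbc_sig-0.0.2.arm-linux | lib-pbc_sig-0.0.2-arm-wince |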

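Temporal costs of symmetric algorithms in a PDA (ms; x is the input size in KB).

| Algorithm (key bits) | iPAQ3970 | Hx2790 | iPAQ3970/Hx2790 |
|---|---|---|---|
| DES (64) | 0.24 + 1.04x | 0.27x | 3.83 |
| 3DES (192) | 0.90 + 2.66x | 0.02 + 0.73x | 3.66 |
| AES (128) | 0.02 + 0.71x | 0.16x | 4.44 |
| AES (256) | 0.03 + 0.97x | 0.21x | 4.56 |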

Energy costs of symmetric algorithms in a PDA (mJ; x is the input size in KB).

| Algorithm (key bits) | iPAQ3970 | Hx2790 | iPAQ3970/Hx2790 |
|---|---|---|---|
| DES (64) | 0.35 + 0.64x | 0.01 + 0.32x | 2.01 |
| 3DES (192) | 0.56 + 1.66x | 0.03 + 0.85x | 1.95 |
| AES (128) | 0.07 + 0.46x | 0.19x | 2.38 |
| AES (256) | 0.07 + 0.64x | 0.27x | 2.35 |

Security level of cryptographic algorithms.

| Symmetric key (bits) | Elliptic curve (bits) | RSA/DSA (bits) |
|---|---|---|
| 48 | 96 | 480 |
| 56 | 112 | 640 |
| 80 | 160 | 1248 |
| 112 | 224 | 2048 |
| 128 | 256 | 3248 |
| 256 | 512 | 15424 |

Digital Signature Performance (MB/s) in Hx2790.

| Δ | MB/s (files 1 K) | MB/s (files 1 M) | Δ | MB/s (files 1 K) | MB/s (files 1 M) |
|---|---|---|---|---|---|
| 1.33 · 10^{−2} | 1.50 · 10^{−1} | 13.73 | 1.38 · 10^{−2} | 5.33 · 10^{−1} | 14.67 |
| 1.08 · 10^{−2} | 4.05 · 10^{−2} | 11.06 | 1.38 · 10^{−2} | 3.51 · 10^{−1} | 14.47 |
| 5.79 · 10^{−3} | 9.56 · 10^{−3} | 5.94 | 1.35 · 10^{−2} | 1.84 · 10^{−1} | 13.96 |
| 1.36 · 10^{−2} | 2.22 · 10^{−1} | 14.14 | 1.36 · 10^{−2} | 2.05 · 10^{−1} | 14.07 |
| 1.25 · 10^{−2} | 8.62 · 10^{−2} | 12.88 | 1.22 · 10^{−2} | 7.47 · 10^{−2} | 12.60 |
| 8.24 · 10^{−3} | 1.88 · 10^{−2} | 8.45 | 7.41 · 10^{−3} | 1.49 · 10^{−2} | 7.59 |
| 1.07 · 10^{−2} | 3.90 · 10^{−2} | 10.94 | 1.06 · 10^{−2} | 3.87 · 10^{−2} | 10.92 |
| 9.89 · 10^{−3} | 3.03 · 10^{−2} | 10.14 | 9.52 · 10^{−3} | 2.71 · 10^{−2} | 9.76 |
| 9.32 · 10^{−3} | 2.55 · 10^{−2} | 9.56 | 8.47 · 10^{−3} | 2.00 · 10^{−2} | 8.69 |
| 4.81 · 10^{−3} | 5.90 · 10^{−3} | 4.92 | 3.84 · 10^{−3} | 4.51 · 10^{−3} | 3.94 |
| 1.98 · 10^{−2} | 1.98 · 10^{−2} | 20.24 | 6.11 · 10^{−3} | 6.11 · 10^{−3} | 6.25 |

Energy costs (MB/J) of digital signature in Hx2790.

| Δ | MB/J (files 1 K) | MB/J (files 1 M) | Δ | MB/J (files 1 K) | MB/J (files 1 M) |
|---|---|---|---|---|---|
| 1.05 · 10^{−2} | 1.27 · 10^{−1} | 10.87 | 1.09 · 10^{−2} | 4.52 · 10^{−1} | 11.56 |
| 8.82 · 10^{−3} | 3.76 · 10^{−2} | 9.06 | 1.09 · 10^{−2} | 2.95 · 10^{−1} | 11.40 |
| 5.56 · 10^{−3} | 1.07 · 10^{−2} | 5.70 | 1.08 · 10^{−2} | 2.18 · 10^{−1} | 11.25 |
| 1.07 · 10^{−2} | 1.63 · 10^{−1} | 11.07 | 1.07 · 10^{−2} | 1.67 · 10^{−1} | 11.09 |
| 9.93 · 10^{−3} | 7.30 · 10^{−2} | 10.23 | 9.51 · 10^{−3} | 5.46 · 10^{−2} | 9.78 |
| 7.73 · 10^{−3} | 2.34 · 10^{−2} | 7.93 | 7.41 · 10^{−3} | 2.07 · 10^{−2} | 7.61 |
| 8.65 · 10^{−3} | 3.47 · 10^{−2} | 8.89 | 8.66 · 10^{−3} | 3.48 · 10^{−2} | 8.89 |
| 8.32 · 10^{−3} | 2.98 · 10^{−2} | 8.54 | 7.90 · 10^{−3} | 2.51 · 10^{−2} | 8.11 |
| 8.14 · 10^{−3} | 2.76 · 10^{−2} | 8.35 | 7.69 · 10^{−3} | 2.31 · 10^{−2} | 7.89 |
| 3.81 · 10^{−3} | 4.41 · 10^{−3} | 3.90 | 3.07 · 10^{−3} | 3.44 · 10^{−3} | 3.14 |
| 1.73 · 10^{−2} | 1.73 · 10^{−2} | 17.71 | 5.13 · 10^{−3} | 5.13 · 10^{−3} | 5.25 |

This work is partially supported by the Spanish Ministry of Science and Innovation and the FEDER funds under the grants TSI-020100-2009-374 SAT2, TSI2007-65406-C03-03 E-AEGIS and CONSOLIDER CSD2007-00004 ARES.