Efficient Implementation of a Crypto Library Using Web Assembly

Abstract: We implement a cryptographic library using Web Assembly, which is expected to perform better than JavaScript. The proposed library provides a comprehensive set of algorithms, including revised CHAM, the Hash-based Message Authentication Code (HMAC), and ECDH on the NIST P-256 curve, to provide confidentiality, data authentication, and key agreement. To optimize the performance of revised CHAM, we apply an existing four-round combining method and additionally propose a precomputation method for CHAM-64/128. In Web Assembly, the revised CHAM was approximately 2.06 times (CHAM-64/128), 2.13 times (CHAM-128/128), and 2.63 times (CHAM-128/256) faster than in JavaScript. In addition, CHAM-64/128 with the precomputation method was approximately 1.2 times faster than the existing CHAM-64/128. A naive implementation of ECDH on the P-256 curve is vulnerable to side-channel attacks (SCA), e.g., simple power analysis (SPA) and timing analysis (TA). Thus, we protect scalar multiplication, the core operation of ECDH, against SPA and TA by revising previous work on atomic block-based scalar multiplication. Compared with the original wNAF, the existing atomic blocks show performance overheads of 55%, 23%, and 37% in Chrome, Firefox, and Microsoft Edge, respectively, whereas the proposed atomic blocks, which use only P = (X, Y, Z), show overheads of 18%, 6%, and 11%. The proposed Web Assembly-based crypto library provides enhanced performance and resistance against SCA; thus, it can be used in various web-based applications.
Existing atomic blocks perform operations in the order *, +, −, + regardless of whether the scalar bit is 1 or 0. We improved the atomic block assuming that scalar multiplication is performed using only the base point P = (X, Y, Z), changing the existing atomic block from *, +, −, + to *, +, −. Thus, 10 and 16 addition operations were removed from ECDBL and ECADD, respectively, compared with the existing atomic block.
We apply wNAF and the improved atomic block to the ECDH algorithm of the proposed crypto library.


Introduction
Recently, various Internet services, e.g., personal and business services, have been provided to users via web-based applications because the web is easily accessible. Typically, web-based applications consist of servers and clients, and private information, e.g., user data and passwords, is exchanged between them. Data transmitted in plaintext are exposed to attackers; thus, cryptographic operations are necessary to protect private data and build secure web-based services. In other words, data confidentiality, data authentication, and key establishment functions must be provided to develop secure web-based services [1]. JavaScript is a cross-platform scripting language that is used in various fields, e.g., server-side network programming, databases, and the Internet of Things (IoT) [2]. JavaScript is used in web browsers to display websites and can also be accessed through the built-in objects of other applications. However, JavaScript is an interpreted language and is relatively slower than native languages such as C. In addition, it does not directly support the mathematical operations required for cryptographic algorithms. The algorithms implemented in Web Assembly and JavaScript were executed in the Chrome, Firefox, and Microsoft Edge web browsers to measure performance. The following improvements of Web Assembly over JavaScript are listed in the order Chrome, Firefox, and Microsoft Edge. For the block cipher, the improvements were 2.1, 2.1, and 2 times for CHAM-64/128; 3, 1.6, and 1.8 times for CHAM-128/128; and 3, 2.1, and 2.8 times for CHAM-128/256. In all three browsers, CHAM-64/128 with the precomputation method was about 1.2 times faster than without it. For the key exchange algorithm, wNAF was applied to scalar multiplication on P-256, together with the atomic block method, a countermeasure against SPA and TA.
When applying the existing atomic block and the proposed atomic block to wNAF, we measure how much performance overhead each introduces relative to the original wNAF due to the increased number of operations, and how much the proposed atomic block improves on the existing one. For this purpose, each algorithm was implemented in Web Assembly and JavaScript and measured in Chrome, Firefox, and Microsoft Edge. Web Assembly was faster than JavaScript by 11, 12, and 11 times for the original wNAF; by 10, 10, and 14 times for wNAF with the existing atomic block; and by 11, 12, and 14 times for wNAF with the proposed atomic block. wNAF with the existing atomic block shows performance overheads of 55%, 23%, and 37% compared with the original wNAF, whereas wNAF with the proposed atomic block, which uses only P = (X, Y, Z), shows overheads of 18%, 6%, and 11%. For message authentication, we use HMAC with SHA-256. Web Assembly was faster than JavaScript by 7.5, 10.8, and 11 times for SHA-256, and by 7.5, 24.8, and 7.1 times for HMAC.

Contribution
In this section, we summarize the contributions of this paper.

1. First implementation of a crypto library using Web Assembly
Recently, web-based applications with various functions have been developed in the cross-platform language JavaScript. Web-based applications require confidentiality, integrity, and key exchange algorithms to send and receive data, so these cryptographic algorithms are implemented in JavaScript. However, JavaScript is a heavy language, and the nature of its arithmetic is a disadvantage for cryptographic algorithms that require many mathematical operations. Web Assembly was created to meet the need for performance similar to that of low-level languages in the web environment. In this paper, we build a crypto library whose cryptographic algorithms are implemented in Web Assembly, providing data security with faster cryptographic algorithms in web-based applications. The proposed crypto library includes the CHAM block cipher family, the HMAC message authentication code, and the ECDH key exchange algorithm. For each cryptographic algorithm, the Web Assembly implementation performs better than the JavaScript implementation. Our implementations were measured in widely used web browsers, namely Chrome, Firefox, and Microsoft Edge. On average, the CHAM family was about 2.2 times faster, HMAC about 7.1 times faster, and ECDH scalar multiplication about 12.3 times faster.

2. Optimized implementations of a crypto library on Web Assembly
Since web-based applications exchange data with various environments, encryption is an essential function for sending data confidentially. With advances in technology and communication, the amount of exchanged data has also increased. Because the data are encrypted sequentially, the algorithm must be optimized for the environment in which it runs in order to encrypt quickly. The block cipher in our proposed crypto library is from the CHAM family. However, the original CHAM algorithm is vulnerable to differential attacks. Therefore, we use the revised CHAM algorithm, which increases the number of rounds from 80 to 88, from 80 to 112, and from 96 to 120 for CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively. In the revised CHAM algorithm, the positions of the words constituting the state change after every round. For faster encryption, we apply an existing 4-round combining method, which eliminates these word-position changes. Additionally, we propose a precomputation method for faster operation in CHAM-64/128. It applies to the internal functions ROL1 and ROL8 and to the key schedule of CHAM-64/128. ROL1 and ROL8 rotate the input left by 1 and 8 bits, respectively, and the key schedule generates the round keys. Because the inputs of these three functions are 16-bit words, their results for all inputs from 0x0000 to 0xffff are precomputed and stored in tables, and during encryption the precomputed values are simply looked up. As a result, performance improved by about 1.2 times compared with the implementation without precomputation in Chrome, Firefox, and Microsoft Edge.

3. Providing an improved method that resists side-channel attacks
Until now, there have been few studies of side-channel analysis in the web environment.
In particular, a secure key exchange protocol is needed to provide secure communication in a web environment. We use ECDH as the key exchange algorithm; its main operation is scalar multiplication. However, scalar multiplication performs ECDBL followed by ECADD when the current bit of the scalar is 1 and only ECDBL when it is 0, so an attacker who can distinguish the two cases can recover the scalar bit by bit. To counter TA and SPA in the web environment, we propose a secure key exchange protocol based on an improved version of a previously studied atomic block. An existing atomic block consists of the operations *, +, −, and + in one block. Fake operations are added to ECDBL and ECADD, the main operations of scalar multiplication, so that both execute as repetitions of the *, +, −, + pattern. The two cases are therefore difficult to distinguish, because the same sequence of operations is executed regardless of whether the bit is 1 or 0. We change the existing atomic block to *, +, − as one block, which removes 10 and 16 addition operations from ECDBL and ECADD, respectively. The proposed method is used only with P = (X, Y, Z). In addition, for efficient scalar multiplication on P-256, we combine wNAF with the proposed atomic block. The implemented algorithms were measured in Chrome, Firefox, and Microsoft Edge. Compared with the original wNAF, wNAF with the existing atomic block shows a performance overhead of about 33%, and wNAF with the proposed atomic block shows a performance overhead of about 11%. Thus, the proposed atomic block reduces the performance overhead to about one third of that of the existing method.
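The wNAF recoding used in item 3 can be sketched in C as follows. This is the textbook width-w NAF recoding applied to a 64-bit toy scalar, not the library's 256-bit implementation; the function names wnaf_recode and wnaf_value are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Width-w NAF recoding of a non-negative scalar k (hypothetical helper).
 * digits[i] is the coefficient of 2^i. Every non-zero digit is odd with
 * |digit| < 2^(w-1), and any w consecutive digits contain at most one
 * non-zero digit. Assumes 2 <= w <= 8 and k < 2^62; 64 digits suffice. */
size_t wnaf_recode(uint64_t k, unsigned w, int8_t *digits, size_t max_len)
{
    const int64_t window = (int64_t)1 << w;      /* 2^w     */
    const int64_t half = (int64_t)1 << (w - 1);  /* 2^(w-1) */
    int64_t v = (int64_t)k;
    size_t len = 0;
    while (v > 0 && len < max_len) {
        int64_t d = 0;
        if (v & 1) {
            d = v & (window - 1);   /* v mod 2^w */
            if (d >= half)
                d -= window;        /* signed residue in (-2^(w-1), 2^(w-1)) */
            v -= d;                 /* v is now divisible by 2^w */
        }
        digits[len++] = (int8_t)d;
        v >>= 1;
    }
    return len;
}

/* Evaluate the digits most significant first: "double, then add d_i". */
int64_t wnaf_value(const int8_t *digits, size_t len)
{
    int64_t acc = 0;
    for (size_t i = len; i-- > 0;)
        acc = 2 * acc + digits[i];
    return acc;
}
```

wnaf_value follows the same pattern as a left-to-right wNAF scalar multiplication, which doubles once per digit and adds a precomputed odd multiple of P (or its negation) only at the sparse non-zero digits; that sparsity is what makes wNAF cheaper than plain double-and-add.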
The remainder of this paper is organized as follows. Section 2 provides a basic overview of the web environment, a description of Web Assembly and its conversion process, and the need for a crypto library. Section 3 describes the architecture of the proposed crypto library and the target cryptographic algorithms. Section 4 describes related work. Section 5 describes the construction of a crypto library using the proposed cryptographic algorithms. Section 6 presents the performance measurement results. Finally, Section 7 concludes the paper.

Overview of Web Environment
Users frequently use web applications and access web services for long periods. Various web browsers, e.g., Chrome, Firefox, and Microsoft Edge, are used to access the web. Web pages are created using HTML, CSS, and JavaScript. A web browser uses a rendering engine, which processes the content and data of a web page, and a JavaScript engine, which executes JavaScript code. Each web browser uses a different rendering engine and JavaScript engine. For example, Chrome uses Blink as its rendering engine and V8 as its JavaScript engine, Microsoft Edge uses EdgeHTML and Chakra, and Firefox uses Gecko and SpiderMonkey. Web-based applications display the same content on all devices, e.g., PCs and smartphones. Unlike native applications, web-based applications do not communicate directly with the operating system but run within the browser. Because they are written in standard web languages, web-based applications are always up to date without downloads or upgrades, and they do not require a separate platform for each operating system. Thus, users can easily access the web services of their choice on mobile devices, e.g., smartphones. The code in one web page does not affect the code in other pages: whatever function the JavaScript code on one page executes, other web pages are unaffected by its result. As the web environment develops and demands more functions, JavaScript libraries are continuously created to provide them. In addition, web developers can easily use and modify these libraries, which is why JavaScript libraries and the web are constantly evolving.
When a web page is loaded, the browser executes the HTML, CSS, and JavaScript code that makes up the page. As shown in Figure 1, the rendering engine reads and parses the code and then creates a Document Object Model (DOM) tree and a CSS Object Model (CSSOM) tree. These trees are combined into a render tree, which is used to render the web page in the browser. The JavaScript engine handles the operations and program code. When the rendering engine encounters JavaScript code, it stops working, and the JavaScript engine reads the JavaScript code and parses it into a tree. After all of the JavaScript code has been processed, the rendering engine resumes its tasks from where it stopped.

Overview of Web Assembly
JavaScript is primarily used in web-based applications; however, it runs significantly slower than native languages, and web-based applications cannot use native languages, e.g., C/C++, directly. As web content has become more complex and computationally heavy, implementing such operations in JavaScript has become a performance disadvantage. The web therefore needs a language that runs at a performance level similar to that of a native language. Mozilla first announced asm.js; however, it did not receive much attention owing to its performance inefficiency, and it is difficult to use. The need for native-level performance in the web environment persisted, and Web Assembly was created based on asm.js. Web Assembly is under constant development, with web browser companies, e.g., Google, Microsoft, and Mozilla, involved in its development. Web Assembly is not intended to replace JavaScript; rather, it is designed to run web-based applications efficiently together with JavaScript. Code is written in languages with explicit variable types, e.g., C/C++, Rust, TypeScript, AssemblyScript, and Go, and then compiled to Web Assembly, e.g., using Emscripten for C/C++. Figure 2 shows the process of converting C/C++ code to Web Assembly code. After an algorithm is written in C/C++, Emscripten passes the C/C++ code to Clang + LLVM and uses the compilation result to generate a Web Assembly binary (i.e., a .wasm file). Because Web Assembly code cannot access the DOM directly, Emscripten also generates JavaScript glue code, through which the results of the Wasm execution can be displayed in HTML documents.
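A minimal sketch of how a C function can be prepared for the Emscripten flow in Figure 2. The EMSCRIPTEN_KEEPALIVE macro from emscripten.h keeps the function from being removed and exports it from the .wasm file; the preprocessor guard lets the same file also build as ordinary native C. The function rol16 is a hypothetical example, not part of the proposed library.

```c
#include <stdint.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#define WASM_EXPORT EMSCRIPTEN_KEEPALIVE   /* keep the symbol and export it from the .wasm */
#else
#define WASM_EXPORT                        /* native build: ordinary C function */
#endif

/* Hypothetical export: rotate a 16-bit word left by r bits (r taken mod 16). */
WASM_EXPORT uint32_t rol16(uint32_t x, uint32_t r)
{
    x &= 0xFFFFu;   /* keep only a 16-bit word */
    r &= 15u;
    if (r == 0)
        return x;
    return ((x << r) | (x >> (16u - r))) & 0xFFFFu;
}
```

Compiling with, e.g., `emcc rol16.c -O3 -o rol16.js` produces rol16.js (the glue code) and rol16.wasm. Once the module has initialized, JavaScript can call the function as `Module._rol16(x, r)`, or through `Module.ccall`/`Module.cwrap`, which recent Emscripten versions expect to be listed in `-sEXPORTED_RUNTIME_METHODS`.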

Necessity of Crypto Library for Secure Web Application
Web-based applications can easily be accessed through various devices, e.g., PCs and smartphones; therefore, various users, e.g., companies, institutions, and individuals, use them for information provision, collection, search, and personal work. Web-based applications must show the same data on different platforms; thus, they are created using JavaScript, a cross-platform language, so that users obtain the same information on different platforms. JavaScript is also used in server-side network programming, databases, and the IoT. Owing to their convenience and various features, web-based applications communicate with many other environments and platforms, sending and receiving various data and storing them on servers. To ensure the continuous development of web-based applications and data security, a crypto library comprising cryptographic and authentication algorithms is required. For security, data are encrypted when stored on a server and decrypted when used. In addition, authentication is required to determine whether data transmitted and received during communication are intact. Therefore, ensuring confidentiality and integrity is essential for web-based applications to communicate securely with other environments.

Design Motivation and Library Architecture
Crypto libraries written in JavaScript make it easy for developers in web environments to obtain and use cryptographic algorithms, e.g., block ciphers, key agreement and key exchange algorithms, and message authentication. By using such a crypto library, developers of JavaScript web-based applications let users use their applications safely: the library protects user information, encrypts and safely stores the data created by the application, and verifies data integrity through message authentication. Because even a 1-bit error prevents users from obtaining the correct data, the operations of an encryption algorithm must be implemented carefully.
In JavaScript, unlike C/C++, data types are not divided into char, short, and int according to bit size, and there is no distinction between unsigned and signed integers; ordinary numbers (the Number type) are IEEE 754 double-precision values. In C/C++, the bit size of each data type is fixed, so bits that exceed this size are discarded automatically in integer arithmetic. This is useful in cryptographic algorithms, because operations such as addition modulo $2^w$ then need no separate reduction step. C/C++ can also represent values as unsigned or signed, which is useful in the finite field operations of cryptographic algorithms. Because JavaScript lacks these distinctions and each cryptographic algorithm uses a different word size, additional operations are needed to obtain the desired result. JavaScript is a heavy language and becomes slower because it requires these additional operations to perform the same computation as C/C++. Thus, JavaScript is less efficient for implementing cryptographic algorithms.
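The wraparound behavior described above can be shown with two small C helpers (hypothetical names, for illustration only); the trailing comment gives the extra masking that equivalent JavaScript Number code would need.

```c
#include <stdint.h>

/* Addition modulo 2^16: the sum is computed as int, and the conversion to
 * uint16_t keeps only the low 16 bits, so the carry is discarded. */
uint16_t add_mod_2_16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Addition modulo 2^32: unsigned arithmetic in C wraps by definition. */
uint32_t add_mod_2_32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* With JavaScript Numbers (doubles), the same results need explicit masking:
 *   16-bit words: (a + b) & 0xffff
 *   32-bit words: (a + b) >>> 0                                           */
```

This is the kind of extra work the paragraph refers to: the reduction is free in C and in the Web Assembly compiled from it, but must be written out per word size in JavaScript.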
Code written in existing programming languages such as C/C++ and Rust can be compiled to Web Assembly, e.g., via Emscripten for C/C++, so that it runs in a web environment. Data types then have fixed sizes, and signed and unsigned values can be distinguished, so users obtain the desired values without the additional operations required in JavaScript. Cryptographic algorithms with many mathematical operations can therefore run faster and more efficiently in a web environment when implemented in Web Assembly. If a web-based application uses a Web Assembly-based crypto library when communicating with other environments, it can perform computations and encryption faster than with a JavaScript-based crypto library.

Revised CHAM
At ICISC 2017, Koo et al. of the National Security Research Institute proposed the lightweight CHAM block cipher family [10], which is divided into CHAM-64/128, CHAM-128/128, and CHAM-128/256 depending on the parameters. Table 1 shows the CHAM parameters. CHAM has a stateless on-the-fly key schedule, which reduces key storage, and it is a lightweight ARX (addition, rotation, XOR) design suitable for constrained environments. The key scheduling process of CHAM is shown in Figure 3: ROL1, ROL8, ROL11, and XOR operations generate 2k/w round keys, which are used cyclically in all rounds. Encryption consists of odd and even rounds, and each round function consists of ROL1, ROL8, XOR, and modular addition operations. After each round, the words of the state are rotated left by one word position. The odd and even round encryption processes of CHAM are shown in Figure 4.
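A reference sketch of CHAM-64/128 that performs the word rotation after every round. It assumes the key schedule RK[i] = K[i] ⊕ ROL1(K[i]) ⊕ ROL8(K[i]) and RK[(i + 8) ⊕ 1] = K[i] ⊕ ROL1(K[i]) ⊕ ROL11(K[i]), the round function with the round counter XORed into X[0], and the revised round count of 88, as we understand them from the CHAM specification; it is written for clarity rather than speed, and all names are hypothetical.

```c
#include <stdint.h>

#define CHAM64_ROUNDS 88   /* revised CHAM-64/128 (the original used 80) */
#define CHAM64_NRK    16   /* 2k/w = 2 * 128 / 16 round keys */

/* 16-bit rotations, valid for 1 <= r <= 15. */
static uint16_t rol16(uint16_t x, unsigned r) { return (uint16_t)((x << r) | (x >> (16 - r))); }
static uint16_t ror16(uint16_t x, unsigned r) { return (uint16_t)((x >> r) | (x << (16 - r))); }

/* Key schedule: 8 key words K[0..7] -> 16 round keys. */
void cham64_keyschedule(const uint16_t key[8], uint16_t rk[CHAM64_NRK])
{
    for (int i = 0; i < 8; i++) {
        uint16_t t = (uint16_t)(key[i] ^ rol16(key[i], 1));
        rk[i] = (uint16_t)(t ^ rol16(key[i], 8));
        rk[(i + 8) ^ 1] = (uint16_t)(t ^ rol16(key[i], 11));
    }
}

/* Encryption with the word rotation performed after every round. */
void cham64_encrypt(uint16_t x[4], const uint16_t rk[CHAM64_NRK])
{
    for (int i = 0; i < CHAM64_ROUNDS; i++) {
        uint16_t k = rk[i % CHAM64_NRK], t;
        if ((i & 1) == 0)   /* even round: ROL1 inside, ROL8 outside */
            t = rol16((uint16_t)((x[0] ^ i) + (rol16(x[1], 1) ^ k)), 8);
        else                /* odd round: ROL8 inside, ROL1 outside */
            t = rol16((uint16_t)((x[0] ^ i) + (rol16(x[1], 8) ^ k)), 1);
        x[0] = x[1]; x[1] = x[2]; x[2] = x[3]; x[3] = t;   /* rotate word positions */
    }
}

/* Decryption: undo the rounds in reverse order. */
void cham64_decrypt(uint16_t x[4], const uint16_t rk[CHAM64_NRK])
{
    for (int i = CHAM64_ROUNDS - 1; i >= 0; i--) {
        uint16_t k = rk[i % CHAM64_NRK], t = x[3];
        x[3] = x[2]; x[2] = x[1]; x[1] = x[0];             /* x[1] holds the old X[1] again */
        if ((i & 1) == 0)
            t = (uint16_t)(ror16(t, 8) - (rol16(x[1], 1) ^ k));
        else
            t = (uint16_t)(ror16(t, 1) - (rol16(x[1], 8) ^ k));
        x[0] = (uint16_t)(t ^ i);                          /* recover X[0] */
    }
}
```

All additions and subtractions are reduced modulo 2^16 by the uint16_t casts, matching the 16-bit word size of CHAM-64/128.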
At ICISC 2019 [5], the original CHAM was shown to be vulnerable to differential attacks through differential characteristics found in reduced-round versions: 56, 72, and 78 rounds for CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively. Thus, the revised CHAM increases the number of rounds to defend against differential attacks: from 80 to 88 for CHAM-64/128, from 80 to 112 for CHAM-128/128, and from 96 to 120 for CHAM-128/256. Despite the increased number of rounds, the revised CHAM showed efficient performance in both software and hardware, and it was faster than, and safer against differential attacks than, the lightweight ciphers SIMON and SPECK.

Overview of HMAC
Web-based applications send and receive large amounts of data in real time. To establish a secure communication environment, it is necessary to verify whether a message has been tampered with, e.g., by a man-in-the-middle attack, and whether the data have been sent by the correct user. A MAC serves this purpose: the sender and receiver share a key, and a MAC computed with this key provides message integrity and authentication. Various MACs, e.g., GCM, CCM, and HMAC, have been proposed to provide message integrity and authentication. HMAC is classified into HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512 according to the SHA-2 hash function used in the message compression process [6]. We use HMAC-SHA-256 as the target message authentication code, because SHA-256 is the most frequently used of these hash functions. The overall process of HMAC-SHA-256 is shown in Figure 5. The MAC value is generated through two SHA-256 computations. IPAD is the byte 0x36 and OPAD is the byte 0x5c, each repeated to fill the block length of the hash function (512 bits for SHA-256). First, if the key is longer than 512 bits, it is hashed; the key is then padded with zeros to 512 bits (a key shorter than 512 bits is padded directly). Next, the padded key K0 is XORed with IPAD, the message to be authenticated is appended to the result, and SHA-256 is applied to this single message to produce a 256-bit hash value. This hash value is then appended to K0 XORed with OPAD, and the result is the input to a second SHA-256 computation. The resulting hash value is the MAC value for message authentication, i.e., $\mathrm{HMAC}(K, m) = H((K_0 \oplus \mathrm{opad}) \,\|\, H((K_0 \oplus \mathrm{ipad}) \,\|\, m))$.
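The HMAC-SHA-256 steps above can be sketched as a single self-contained C file, with SHA-256 written out following FIPS 180-4. This is an illustrative sketch with hypothetical names, not the library's code, and it makes no attempt at side-channel hardening.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

static const uint32_t K256[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

typedef struct { uint32_t h[8]; uint8_t buf[64]; size_t n; uint64_t len; } sha256_ctx;

/* One 512-bit block: message expansion, 64 rounds, then digest update. */
static void sha256_block(uint32_t h[8], const uint8_t *p)
{
    uint32_t w[64], v[8];
    for (int t = 0; t < 16; t++)
        w[t] = (uint32_t)p[4*t] << 24 | (uint32_t)p[4*t+1] << 16 | (uint32_t)p[4*t+2] << 8 | p[4*t+3];
    for (int t = 16; t < 64; t++)
        w[t] = w[t-16] + (ROTR(w[t-15], 7) ^ ROTR(w[t-15], 18) ^ (w[t-15] >> 3))
             + w[t-7] + (ROTR(w[t-2], 17) ^ ROTR(w[t-2], 19) ^ (w[t-2] >> 10));
    memcpy(v, h, sizeof v);                       /* working values a..h */
    for (int t = 0; t < 64; t++) {
        uint32_t t1 = v[7] + (ROTR(v[4], 6) ^ ROTR(v[4], 11) ^ ROTR(v[4], 25))
                    + ((v[4] & v[5]) ^ (~v[4] & v[6])) + K256[t] + w[t];
        uint32_t t2 = (ROTR(v[0], 2) ^ ROTR(v[0], 13) ^ ROTR(v[0], 22))
                    + ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
        memmove(v + 1, v, 7 * sizeof v[0]);       /* h=g, g=f, ..., b=a */
        v[4] += t1;                               /* e = d + T1 (d now sits in v[4]) */
        v[0] = t1 + t2;                           /* a = T1 + T2 */
    }
    for (int i = 0; i < 8; i++) h[i] += v[i];     /* digest update mod 2^32 */
}

static void sha256_init(sha256_ctx *c)
{
    static const uint32_t iv[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                                   0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    memcpy(c->h, iv, sizeof iv);
    c->n = 0; c->len = 0;
}

static void sha256_update(sha256_ctx *c, const uint8_t *m, size_t len)
{
    c->len += len;
    for (size_t i = 0; i < len; i++) {
        c->buf[c->n++] = m[i];
        if (c->n == 64) { sha256_block(c->h, c->buf); c->n = 0; }
    }
}

/* Padding: byte 0x80, zeros up to 448 mod 512 bits, then the 64-bit bit length. */
static void sha256_final(sha256_ctx *c, uint8_t out[32])
{
    uint64_t bits = c->len * 8;
    uint8_t b = 0x80;
    sha256_update(c, &b, 1);
    b = 0x00;
    while (c->n != 56) sha256_update(c, &b, 1);
    for (int i = 7; i >= 0; i--) { b = (uint8_t)(bits >> (8 * i)); sha256_update(c, &b, 1); }
    for (int i = 0; i < 32; i++) out[i] = (uint8_t)(c->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* HMAC-SHA-256(K, m) = H((K0 ^ opad) || H((K0 ^ ipad) || m)), 64-byte blocks. */
void hmac_sha256(const uint8_t *key, size_t klen, const uint8_t *msg, size_t mlen, uint8_t mac[32])
{
    uint8_t k0[64] = {0}, pad[64], inner[32];
    sha256_ctx c;
    if (klen > 64) {            /* keys longer than one block are hashed first */
        sha256_init(&c); sha256_update(&c, key, klen); sha256_final(&c, k0);
    } else if (klen > 0) {
        memcpy(k0, key, klen);  /* remaining bytes of k0 stay zero */
    }
    for (int i = 0; i < 64; i++) pad[i] = (uint8_t)(k0[i] ^ 0x36);   /* K0 ^ ipad */
    sha256_init(&c); sha256_update(&c, pad, 64); sha256_update(&c, msg, mlen); sha256_final(&c, inner);
    for (int i = 0; i < 64; i++) pad[i] = (uint8_t)(k0[i] ^ 0x5c);   /* K0 ^ opad */
    sha256_init(&c); sha256_update(&c, pad, 64); sha256_update(&c, inner, 32); sha256_final(&c, mac);
}
```

The RFC 4231 test vectors (e.g., key "Jefe" with the message "what do ya want for nothing?") are a convenient way to check such an implementation against known outputs.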

SHA-256 Padding the Message: A SHA-256 block is 512 bits long, and block operations are performed on 32-bit words. The last 64 bits of the final block store the length of the input message. Therefore, the SHA-2 family includes a padding process that stores the message length, summarized as follows.
Padding process:
Step 0: Let l be the length of the message in bits.
Step 1: Append the bit "1" to the end of the message.
Step 2: Append k zero bits, where k is the smallest non-negative solution to $l + 1 + k \equiv 448 \pmod{512}$.
Step 3: Append the 64-bit binary representation of the message length l.
Padding can be inserted before hash computation begins on a message or any other time during the hash computation prior to processing the block(s) that will contain the padding [15].
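The padding steps can be sketched in C for byte-aligned messages, which is the usual case: because the message is a whole number of bytes, the appended "1" bit and the first seven zero bits together form the byte 0x80. The function name sha256_pad and its buffer convention are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Writes the padded message to out and returns its length in bytes (a multiple
 * of 64), or 0 if cap is too small. */
size_t sha256_pad(const uint8_t *msg, size_t len, uint8_t *out, size_t cap)
{
    size_t padded = ((len + 8) / 64 + 1) * 64;   /* smallest multiple of 64 >= len + 9 */
    if (padded > cap)
        return 0;
    uint64_t bits = (uint64_t)len * 8;           /* Step 0: l in bits */
    memcpy(out, msg, len);
    out[len] = 0x80;                             /* Step 1: '1' bit plus seven zero bits */
    memset(out + len + 1, 0, padded - len - 1);  /* Step 2: remaining zero bits */
    for (int i = 0; i < 8; i++)                  /* Step 3: l as a 64-bit big-endian value */
        out[padded - 1 - i] = (uint8_t)(bits >> (8 * i));
    return padded;
}
```

For a 3-byte message such as "abc", this yields one 64-byte block ending in the length 24; a 56-byte message no longer leaves room for the length and spills into a second block.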
SHA-256 Message Compression: The block operation in SHA-256 repeats the same process for 64 rounds, and each round consumes one 32-bit word derived from the padded message block. Because a 512-bit block provides only sixteen 32-bit words, SHA-256 expands them to 64 words; this is the message expansion process. Algorithm 1 shows the pseudocode of the SHA-256 message expansion process [15]. Then, Algorithm 2, the message compression process, is executed: in each of its 64 iterations, the eight working variables are shifted (e.g., b = a) and a is set to T1 + T2, and after the loop it returns the working values (a, b, c, d, e, f, g, h). In SHA-256, the digest comprises eight 32-bit words, which are initialized to defined values when the SHA-256 algorithm is called [15]. After each block is compressed, each digest word is updated by adding the corresponding working value modulo $2^{32}$. Once the last padded block has been processed, SHA-256 returns the 256-bit digest.
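The message expansion step can be sketched in C using the σ0 and σ1 functions defined in FIPS 180-4; the function names are hypothetical.

```c
#include <stdint.h>

static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Expand one 512-bit block (sixteen big-endian 32-bit words) into W[0..63]. */
void sha256_expand(const uint8_t block[64], uint32_t w[64])
{
    for (int t = 0; t < 16; t++)
        w[t] = ((uint32_t)block[4*t] << 24) | ((uint32_t)block[4*t+1] << 16)
             | ((uint32_t)block[4*t+2] << 8) | (uint32_t)block[4*t+3];
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = rotr32(w[t-15], 7) ^ rotr32(w[t-15], 18) ^ (w[t-15] >> 3);   /* sigma0 */
        uint32_t s1 = rotr32(w[t-2], 17) ^ rotr32(w[t-2], 19) ^ (w[t-2] >> 10);    /* sigma1 */
        w[t] = s1 + w[t-7] + s0 + w[t-16];   /* addition modulo 2^32 */
    }
}
```

For the padded one-block message "abc", W[16] equals W[0] = 0x61626380 because the other contributing words are zero, and W[17] is σ1(W[15]) = σ1(0x18) = 0x000f0000.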

Target Key Agreement Algorithm
ECDH with P-256 Curve: P-256 is one of the 15 elliptic curves recommended by NIST [8]. It is defined over a 256-bit prime field and offers approximately 128-bit security. This elliptic curve is defined by the equation $y^2 = x^3 - 3x + b$ over $\mathbb{F}_p$, where b is a constant in the finite field $\mathbb{F}_p$. The prime $p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$ is a 256-bit prime selected for easy modular reduction. The points of this curve form an Abelian group whose identity element O is called the point at infinity. Scalar multiplication computes kP from a 256-bit scalar integer k and a base point P. The algorithms used for scalar multiplication are ECADD and ECDBL. Their input points, in affine coordinates, are P = (X1, Y1) and Q = (X2, Y2): ECDBL computes P + Q = 2P when P = Q, and ECADD computes P + Q when P ≠ Q. The security of ECC is based on the difficulty of the elliptic curve discrete logarithm problem (ECDLP), i.e., it is very difficult to find the scalar k given P and Q = kP.
The general equation of an elliptic curve over a prime field is $y^2 = x^3 + ax + b$, and the NIST prime curves P-256, P-384, and P-521 differ in their parameters. If scalar multiplication is performed in affine coordinates, ECADD is executed whenever the current bit of the scalar k is 1, and each affine point operation requires a field inversion. Among the finite field operations (addition, subtraction, multiplication, and inversion), inversion is the most expensive. Therefore, instead of performing an inversion in every ECADD, points are represented in projective coordinates, so that only one inversion is needed, when the result is converted back after the scalar multiplication. We implement scalar multiplication in Jacobian coordinates, which is faster than standard projective coordinates when a = −3, as for P-256. After the affine coordinates are converted to Jacobian coordinates, the ECDBL and ECADD operations are performed as shown in Table 2. After the scalar multiplication is completed, kP is obtained by converting the Jacobian coordinates back to affine coordinates [8].
Table 2. Jacobian ECDBL, ECADD.

ECDH is a Diffie-Hellman key exchange protocol that uses elliptic curve operations [7]. Elliptic curve cryptography is a public key method whose security is based on the discrete logarithm problem on elliptic curves. As an alternative to RSA, it provides comparable security with a much shorter key length. The elliptic curve operations are ECADD, which adds two points, and ECDBL, which doubles a point.
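For reference, one standard textbook form of the two operations in Jacobian coordinates is given below, where (X, Y, Z) represents the affine point $(X/Z^2, Y/Z^3)$. It is not necessarily the operation order used in Table 2. ECADD assumes P ≠ ±Q, and ECDBL uses a = −3 so that the term $3X^2 + aZ^4$ factors as $3(X - Z^2)(X + Z^2)$, which saves multiplications.

```latex
\begin{aligned}
&\textbf{ECDBL } (a=-3):\quad (X_1, Y_1, Z_1) \mapsto (X_3, Y_3, Z_3) \\
&\alpha = 3\,(X_1 - Z_1^2)(X_1 + Z_1^2), \qquad \beta = 4X_1Y_1^2 \\
&X_3 = \alpha^2 - 2\beta, \qquad Y_3 = \alpha(\beta - X_3) - 8Y_1^4, \qquad Z_3 = 2Y_1Z_1 \\[4pt]
&\textbf{ECADD}:\quad (X_1, Y_1, Z_1) + (X_2, Y_2, Z_2) \mapsto (X_3, Y_3, Z_3) \\
&U_1 = X_1Z_2^2, \quad U_2 = X_2Z_1^2, \quad S_1 = Y_1Z_2^3, \quad S_2 = Y_2Z_1^3 \\
&H = U_2 - U_1, \quad R = S_2 - S_1 \\
&X_3 = R^2 - H^3 - 2U_1H^2, \quad Y_3 = R(U_1H^2 - X_3) - S_1H^3, \quad Z_3 = Z_1Z_2H
\end{aligned}
```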
For a scalar d and a base point P on the elliptic curve, the point dP is computed by scalar multiplication, i.e., a sequence of these two elliptic curve operations.
The security of the Diffie-Hellman key exchange is based on the difficulty of the discrete logarithm problem. Alice and Bob compute $g^a \bmod p$ and $g^b \bmod p$ with their private keys a and b, respectively, in the cyclic group ⟨g⟩ modulo a prime p. After Alice sends $g^a \bmod p$ to Bob and Bob sends $g^b \bmod p$ to Alice, each raises the received value to the power of their own private key, so both obtain the shared secret $g^{ab} \bmod p$ without revealing key information to an attacker. In DH, the keys a and b must be long, which is a disadvantage; ECDH, which combines elliptic curve cryptography with DH, provides efficient security with short keys. The entire process of ECDH is shown in Figure 6. Alice and Bob generate private keys a and b, respectively, use the agreed base point G on the elliptic curve to compute the public keys aG and bG, and send these public keys to each other. Each party then multiplies the received public key by its own private key, so both obtain the same shared point abG = a(bG) = b(aG).
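The message flow of Diffie-Hellman can be illustrated with a toy C sketch in the multiplicative group modulo the prime p = 2^31 − 1 with g = 7. These parameters are assumptions chosen only so that every product fits in 64 bits; they provide no security. ECDH follows the same flow with scalar multiplication in place of modular exponentiation. All names are hypothetical.

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation; requires p < 2^32 so that
 * every product of two residues fits in 64 bits. */
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1 % p;
    base %= p;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % p;
        base = (base * base) % p;
        exp >>= 1;
    }
    return result;
}

typedef struct {
    uint64_t pub_a, pub_b;        /* values sent over the channel */
    uint64_t shared_a, shared_b;  /* secrets computed by Alice and Bob */
} dh_transcript;

/* Toy DH run with private keys a (Alice) and b (Bob). */
dh_transcript toy_dh(uint64_t a, uint64_t b)
{
    const uint64_t p = 2147483647u;   /* 2^31 - 1: TOY modulus, far too small for real use */
    const uint64_t g = 7u;            /* assumed generator, for illustration only */
    dh_transcript t;
    t.pub_a = modpow(g, a, p);            /* Alice -> Bob: g^a mod p */
    t.pub_b = modpow(g, b, p);            /* Bob -> Alice: g^b mod p */
    t.shared_a = modpow(t.pub_b, a, p);   /* Alice: (g^b)^a mod p */
    t.shared_b = modpow(t.pub_a, b, p);   /* Bob:   (g^a)^b mod p */
    return t;
}
```

Only pub_a and pub_b cross the channel; both parties arrive at g^(ab) mod p without either private key being transmitted.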

Providing Side Channel Resistance
Atomic Block-Based ECDH Implementation: The scalar multiplication operation of elliptic curve cryptography is vulnerable to simple power analysis (SPA). This is because scalar multiplication performs ECDBL and ECADD when the current bit of the scalar is 1 and only ECDBL when it is 0, resulting in different power consumption. In addition, because ECADD is performed only when the bit is 1, the branch statement makes the execution time depend on the scalar, which makes the implementation vulnerable to timing attacks. Countermeasures against side-channel analysis of elliptic curve scalar multiplication have been proposed [12][13][14].
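The leak can be made concrete with a sketch of left-to-right double-and-add that records its sequence of operations. To keep the example short and runnable, the "points" are integers modulo a hypothetical GROUP_N under addition: a stand-in for the curve group that shows only the control flow, not real elliptic curve arithmetic.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in group (hypothetical, NOT an elliptic curve): integers modulo GROUP_N
 * under addition. Only the control flow of scalar multiplication is shown. */
#define GROUP_N 1000003u

static uint64_t ecdbl(uint64_t x) { return (2 * x) % GROUP_N; }              /* "2P"    */
static uint64_t ecadd(uint64_t x, uint64_t y) { return (x + y) % GROUP_N; } /* "P + Q" */

/* Left-to-right double-and-add over the lowest `bits` bits of k.
 * trace receives 'D' per ECDBL and 'A' per ECADD; it needs 2*bits + 1 chars. */
uint64_t scalar_mul_leaky(uint64_t k, uint64_t P, unsigned bits, char *trace)
{
    uint64_t R = 0;   /* identity element, standing in for the point at infinity */
    size_t n = 0;
    for (int i = (int)bits - 1; i >= 0; i--) {
        R = ecdbl(R);
        trace[n++] = 'D';
        if ((k >> i) & 1) {   /* secret-dependent branch: visible to SPA and TA */
            R = ecadd(R, P);
            trace[n++] = 'A';
        }
    }
    trace[n] = '\0';
    return R;
}
```

For k = 11 (binary 1011) the trace is DADDADA, so every A after a D reveals a 1 bit. Atomic blocks aim to make ECDBL and ECADD emit the same repeated operation pattern, so that such a trace carries no information about k.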
In [14], the atomic block was proposed as a countermeasure against SPA, a side-channel attack on RSA and elliptic curve cryptography. To make scalar multiplication safe against SPA, atomic blocks are applied to ECDBL and ECADD: the existing atomic block repeats multiplication, addition, subtraction, and addition operations in this order throughout the scalar multiplication. ECADD and ECDBL with the atomic block applied are shown in Table 3. To perform ECDBL and ECADD safely with atomic blocks, fake operations must be added; the existing atomic block adds 17 fake operations to ECDBL and 32 to ECADD. If ECDBL and ECADD are computed as shown in Table 3, an attacker measuring the power consumption of scalar multiplication observes the same repeated power waveform, so the implementation is safe against SPA. It is also safe against TA, because atomic blocks require no branch statements. When keys are exchanged between a web environment and another environment, a branch statement in the scalar multiplication inside ECDH makes the exchange vulnerable to TA. Therefore, by applying atomic blocks, a countermeasure against TA and SPA, to scalar multiplication, we provide users of the crypto library with a secure key exchange protocol.
Table 3. Existing atomic block method.

CHAM Algorithm in JavaScript and Web Assembly
The original CHAM algorithm (Figure 4) is divided into odd and even rounds and rotates the word positions at the end of each round. It uses the 2k/w round keys cyclically. Because the words return to their original positions every four rounds, it is possible, as shown in Figure 7, to keep each word in place by computing the values needed in each round without performing a swap. This method is faster because it eliminates the swap process of the original CHAM algorithm [9]. CHAM and AES have also been implemented in Web Assembly and shown to run faster than JavaScript implementations [16].
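The 4-round combining idea can be checked with a short C sketch: enc_swap rotates the word positions after every round, while enc_combined writes each round's output into the slot of the word it replaces, so the words are back in place after every fourth round. It assumes the CHAM-64/128 round function described above, 88 rounds (a multiple of 4), and 16 round keys; the names are hypothetical.

```c
#include <stdint.h>

#define ROUNDS 88   /* revised CHAM-64/128; the combined loop needs a multiple of 4 */
#define NRK    16   /* number of round keys, 2k/w */

static uint16_t rol16(uint16_t x, unsigned r) { return (uint16_t)((x << r) | (x >> (16 - r))); }

/* Reference: one round at a time, rotating the word positions after each round. */
void enc_swap(uint16_t x[4], const uint16_t rk[NRK])
{
    for (int i = 0; i < ROUNDS; i++) {
        uint16_t t = (i & 1)
            ? rol16((uint16_t)((x[0] ^ i) + (rol16(x[1], 8) ^ rk[i % NRK])), 1)
            : rol16((uint16_t)((x[0] ^ i) + (rol16(x[1], 1) ^ rk[i % NRK])), 8);
        x[0] = x[1]; x[1] = x[2]; x[2] = x[3]; x[3] = t;
    }
}

/* 4-round combining: each round overwrites the word it replaces, so no swap
 * is needed and the words are in their original order after four rounds. */
void enc_combined(uint16_t x[4], const uint16_t rk[NRK])
{
    uint16_t x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
    for (int i = 0; i < ROUNDS; i += 4) {
        x0 = rol16((uint16_t)((x0 ^ i)       + (rol16(x1, 1) ^ rk[i % NRK])), 8);
        x1 = rol16((uint16_t)((x1 ^ (i + 1)) + (rol16(x2, 8) ^ rk[(i + 1) % NRK])), 1);
        x2 = rol16((uint16_t)((x2 ^ (i + 2)) + (rol16(x3, 1) ^ rk[(i + 2) % NRK])), 8);
        x3 = rol16((uint16_t)((x3 ^ (i + 3)) + (rol16(x0, 8) ^ rk[(i + 3) % NRK])), 1);
    }
    x[0] = x0; x[1] = x1; x[2] = x2; x[3] = x3;
}
```

Keeping the four words in local variables also lets the compiler hold them in registers (or Web Assembly locals) instead of shuffling array elements every round.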

Crypto Implementations on Web Assembly Environment
In [17], the HACL* [18], libsodium [19], and WHACL* [17] libraries are compiled to Web Assembly and their performance is compared. HACL* is a verified library of cryptographic primitives written in Low* and compiled to C via KreMLin [20]. Libsodium is a modern, easy-to-use software library for encryption, decryption, signatures, password hashing, and more. WHACL* is the library proposed in [17]. In Table 4, (A) is HACL* compiled to C with KreMLin and then to Web Assembly with Emscripten, (B) is libsodium compiled to Web Assembly with Emscripten, and (C) is WHACL*, compiled to Web Assembly with KreMLin. As Table 4 shows, HACL* is slower than libsodium for Curve25519 and Ed25519. HACL* relies on the 128-bit integer arithmetic provided by C compilers such as gcc and clang, whereas libsodium switches to a 32-bit implementation. Web Assembly has no native 128-bit integer arithmetic, so 128-bit integers are encoded as pairs of 64-bit integers. These characteristics cause the performance gap between HACL* and libsodium once they are compiled to Web Assembly. Consequently, when code written for a web-based application is converted to Web Assembly, implementing the cryptographic algorithm with these Web Assembly characteristics in mind helps improve performance.
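Web Assembly's `i64.mul` returns only the low 64 bits of a product, so code that depends on full 64 × 64 → 128-bit products has to assemble them from smaller partial products. The Python sketch below illustrates that decomposition with 32-bit halves (masks emulate fixed-width words, since Python integers are unbounded). It is our illustration of the general technique, not code taken from HACL*, libsodium, or [17], and the function name is hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul64x64_to_128(a, b):
    """Return the 128-bit product of two 64-bit words as (hi, lo) 64-bit words.

    Only 32 x 32 -> 64-bit partial products are used, as on a target without
    a widening 64-bit multiply.
    """
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    ll = a_lo * b_lo          # contributes at 2^0
    lh = a_lo * b_hi          # contributes at 2^32
    hl = a_hi * b_lo          # contributes at 2^32
    hh = a_hi * b_hi          # contributes at 2^64
    # Column at 2^32: carry out of ll plus the low halves of the cross terms.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)
    lo = ((mid & MASK32) << 32) | (ll & MASK32)
    hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return hi & MASK64, lo


if __name__ == "__main__":
    hi, lo = mul64x64_to_128(MASK64, MASK64)
    print(hex(hi), hex(lo))
```

Four multiplications and several additions and shifts replace one native widening multiply, which is one reason code written around 128-bit arithmetic can lose performance when compiled to Web Assembly.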
In [21], the official implementation of Picnic [22], a post-quantum signature scheme that was a second-round candidate in NIST's post-quantum cryptography standardization, was compiled to Web Assembly, and its performance was measured in Chrome, Firefox, and Microsoft Edge. Comparing Tables 5 and 6, the Web Assembly version is about 2 to 3 times slower than the C version.

Proposed Implementation of Revised CHAM
The revised CHAM algorithm is an ARX-based lightweight block cipher that is more resistant to differential attacks than the original CHAM. It obtains this resistance by increasing the number of rounds of the original CHAM-64/128, CHAM-128/128, and CHAM-128/256. We implement the revised CHAM algorithm, with its increased round counts, in Web Assembly. In the original CHAM algorithm, the plaintext consists of four words, and the positions of these four words are rotated at the end of each round.
Rather than rotating the four words in every round [9], we use the fact, shown in Figure 7, that the words return to their original positions every four rounds. The word swap is removed from the existing algorithm: each word stays in place, and each round is computed from the values it needs, which makes the round operation faster. In CHAM-64/128, each word of the plaintext and of the key is 16 bits, so, as shown in Figures 3 and 4, ROL8, ROL1, and Keyschedule all take a 16-bit word as input. Algorithms 3 and 4 therefore pre-compute the 16-bit outputs of ROL8, ROL1, and Keyschedule for every possible 16-bit input, from 0x0000 to 0xffff. Whenever ROL1, ROL8, or Keyschedule is needed, the result is read from the pre-built table using the input as the index instead of being computed.
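A minimal Python sketch of the pre-computation idea, assuming the CHAM-64/128 rotations and the two per-word key-schedule transforms from the CHAM specification: each table has one entry for every 16-bit input, and the round and key schedule read from the tables instead of rotating. Table and function names are hypothetical and are not taken from Algorithms 3 and 4.

```python
MASK16 = 0xFFFF
N_INPUTS = 1 << 16  # every 16-bit input, 0x0000..0xffff


def rol16(x, r):
    """Rotate a 16-bit word left by r bits (used only to build the tables)."""
    return ((x << r) | (x >> (16 - r))) & MASK16


# One entry per possible 16-bit input.
ROL1_TABLE = [rol16(x, 1) for x in range(N_INPUTS)]
ROL8_TABLE = [rol16(x, 8) for x in range(N_INPUTS)]
# Per-word key-schedule transforms: K ^ ROL1(K) ^ ROL8(K) and K ^ ROL1(K) ^ ROL11(K).
KS_LO_TABLE = [x ^ rol16(x, 1) ^ rol16(x, 8) for x in range(N_INPUTS)]
KS_HI_TABLE = [x ^ rol16(x, 1) ^ rol16(x, 11) for x in range(N_INPUTS)]


def key_schedule_lookup(key):
    """CHAM-64/128 round keys from eight 16-bit key words, via table lookups."""
    rk = [0] * 16
    for i in range(8):
        rk[i] = KS_LO_TABLE[key[i]]
        rk[(i + 8) ^ 1] = KS_HI_TABLE[key[i]]
    return rk


def round_lookup(a, b, i, rk):
    """New word produced by round i, with rotations replaced by lookups."""
    if i % 2 == 0:
        return ROL8_TABLE[((a ^ i) + (ROL1_TABLE[b] ^ rk[i % 16])) & MASK16]
    return ROL1_TABLE[((a ^ i) + (ROL8_TABLE[b] ^ rk[i % 16])) & MASK16]
```

Each table holds 65,536 16-bit entries (128 KiB when stored as 16-bit values). Because the table indices depend on secret data, the memory access pattern of such lookups is worth considering alongside the timing-attack concerns discussed for ECDH.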

Proposed Implementation of ECDH with Side Channel Resistance
In [11], the NAF (non-adjacent form) representation uses negative digits to reduce the number of nonzero digits of scalar k. Because each nonzero digit costs one ECADD, fewer nonzero digits mean fewer ECADDs, so scalar multiplication becomes faster. The wNAF algorithm processes w bits per ECADD and is therefore faster than binary left-to-right scalar multiplication. To process w bits at a time, pre-computation is required for the odd digit values in the range [−2^{w−1}, 2^{w−1} − 1]. Because this pre-computation is relatively cheap, wNAF can also be used when the base point varies.
To use wNAF, scalar k must first be converted to NAF_w(k), as in Algorithm 5. NAF_w(k) is at most one digit longer than the binary representation of k, and the average density of nonzero digits is approximately 1/(w+1). Scalar multiplication with the full scalar k is then performed as in Algorithm 6. The pre-computation requires one ECDBL and 2^{w−2} − 1 ECADD operations, and the main loop requires l ECDBL and, on average, l/(w+1) ECADD operations, where l is the length of NAF_w(k). The wNAF algorithm is safe against SPA because it uses only odd digits in the range [−2^{w−1}, 2^{w−1} − 1]. However, the number of ECADD operations depends on the scalar k, so it is vulnerable to TA; we therefore implement it with atomic blocks to make it safe against TA.
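Algorithms 5 and 6 can be sketched in Python as follows. For brevity this sketch uses affine coordinates with modular inversion (Python 3.8+ `pow(x, -1, p)`), whereas the implementation described here works with P = (X, Y, Z); it is the plain, unprotected wNAF baseline, neither constant-time nor built from atomic blocks. The curve constants are the standard NIST P-256 domain parameters; the function names are hypothetical.

```python
# NIST P-256 (secp256r1): y^2 = x^3 - 3x + b over GF(p).
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)
INF = None  # point at infinity


def ec_add(p1, p2):
    """Affine point addition; performs doubling (ECDBL) when p1 == p2."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        if (y1 + y2) % P == 0:
            return INF                                      # Q + (-Q)
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P    # ECDBL slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P           # ECADD slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def naf_w(k, w):
    """Algorithm 5: width-w NAF of k > 0, least significant digit first."""
    digits = []
    while k >= 1:
        if k & 1:
            d = k & ((1 << w) - 1)        # k mod 2^w
            if d >= 1 << (w - 1):         # k mods 2^w, in [-2^(w-1), 2^(w-1))
                d -= 1 << w
            k -= d                        # now k is divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1                           # k <- k/2 (exact)
    return digits


def wnaf_mul(k, pt, w=4):
    """Algorithm 6: left-to-right window NAF scalar multiplication kP (w >= 2)."""
    digits = naf_w(k, w)
    two_p = ec_add(pt, pt)                    # one ECDBL
    table = {1: pt}                           # odd multiples P, 3P, ..., (2^(w-1)-1)P
    for i in range(3, 1 << (w - 1), 2):       # 2^(w-2) - 1 ECADDs
        table[i] = ec_add(table[i - 2], two_p)
    q = INF
    for d in reversed(digits):
        q = ec_add(q, q)                      # ECDBL for every digit
        if d > 0:
            q = ec_add(q, table[d])           # ECADD with P_d
        elif d < 0:
            x, y = table[-d]
            q = ec_add(q, (x, (-y) % P))      # ECADD with -P_{-d}
    return q
```

Whether an ECADD follows a given ECDBL depends on the digits of k, which is exactly the data-dependent behavior that the atomic blocks discussed next are meant to hide.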

Algorithm 5 Computing the width-w NAF of a positive integer
Input: Window width w, positive integer k.
Output: NAF_w(k).
1: i ← 0
2: while k ≥ 1 do
3:   if k is odd then
4:     k_i ← k mods 2^w
5:     k ← k − k_i
6:   else k_i ← 0
7:   end if
8:   k ← k/2, i ← i + 1
9: end while
10: return (k_{i−1}, k_{i−2}, ..., k_1, k_0)
Here k mods 2^w denotes the integer u with u ≡ k (mod 2^w) and −2^{w−1} ≤ u < 2^{w−1}.

Algorithm 6 Window NAF method for point multiplication
Input: Window width w, positive integer k, P ∈ E(F_q).
Output: kP.
1: Compute NAF_w(k) = Σ_{i=0}^{l−1} k_i 2^i using Algorithm 5
2: Compute P_i = iP for i ∈ {1, 3, 5, ..., 2^{w−1} − 1}
3: Q ← ∞
4: for i from l − 1 downto 0 do
5:   Q ← 2Q
6:   if k_i ≠ 0 then
7:     if k_i > 0 then
8:       Q ← Q + P_{k_i}
9:     else Q ← Q − P_{−k_i}
10:    end if
11:  end if
12: end for
13: return Q

Atomic blocks provide resistance to SCA by executing the same sequence of operations regardless of whether a scalar bit is 0 or 1. By adding fake operations so that every calculation follows the same regular order, they make ECDBL and ECADD difficult to distinguish. In this paper, we present a new atomic block that reduces the number of fake operations in the previous atomic block [14].
As Table 3 shows, each existing atomic block in [14] consists of *, +, −, and +. Our method improves on it under the assumption that only P = (X, Y, Z) is used: we reduce the number of fake operations by changing the block configuration from *, +, −, + to *, +, −. The proposed atomic block composes ECDBL and ECADD from 10 and 16 blocks, respectively, each containing one fewer addition than the existing block, so ECDBL and ECADD perform 10 and 16 fewer addition operations than with the existing atomic block. With the existing atomic block, ECDBL has nine fake additions and eight fake subtractions, and ECADD has 22 fake additions and 10 fake subtractions. With the proposed atomic block, ECDBL has six fake subtractions, and ECADD has nine fake additions and nine fake subtractions. In total, compared to the existing atomic block, the proposed atomic block removes nine fake additions and two fake subtractions from ECDBL, and 13 fake additions and one fake subtraction from ECADD. The proposed atomic block operation process is shown in Table 7. Table 8 lists the numbers of additions, subtractions, and multiplications for the original wNAF, the existing atomic block, and the proposed atomic block.
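The counts above can be tallied with a short Python sketch. It uses only the numbers stated in the text and assumes, as the statement that one addition is removed per block implies, that the existing realization also uses 10 blocks for ECDBL and 16 for ECADD. "Slots" counts every field operation executed, real or fake. The names are hypothetical.

```python
# Slot pattern of one atomic block, as described in the text.
EXISTING_BLOCK = ("*", "+", "-", "+")   # [14]
PROPOSED_BLOCK = ("*", "+", "-")        # assumes only P = (X, Y, Z) is used

# Blocks per point operation (assumed identical for both block types).
NUM_BLOCKS = {"ECDBL": 10, "ECADD": 16}

# Fake operations per point operation, as stated in the text.
FAKE_OPS = {
    "existing": {"ECDBL": {"+": 9, "-": 8}, "ECADD": {"+": 22, "-": 10}},
    "proposed": {"ECDBL": {"+": 0, "-": 6}, "ECADD": {"+": 9, "-": 9}},
}


def slot_counts(block, n_blocks):
    """Field operations of each type executed by n_blocks copies of a block."""
    return {op: block.count(op) * n_blocks for op in ("*", "+", "-")}


def compare(point_op):
    """Summarize existing vs. proposed atomic blocks for ECDBL or ECADD."""
    n = NUM_BLOCKS[point_op]
    old, new = slot_counts(EXISTING_BLOCK, n), slot_counts(PROPOSED_BLOCK, n)
    old_fake, new_fake = FAKE_OPS["existing"][point_op], FAKE_OPS["proposed"][point_op]
    return {
        "additions_removed": old["+"] - new["+"],
        "slots_existing": sum(old.values()),
        "slots_proposed": sum(new.values()),
        "fake_existing": sum(old_fake.values()),
        "fake_proposed": sum(new_fake.values()),
        "fake_add_removed": old_fake["+"] - new_fake["+"],
        "fake_sub_removed": old_fake["-"] - new_fake["-"],
    }


if __name__ == "__main__":
    for op in NUM_BLOCKS:
        print(op, compare(op))
```

Dropping one addition slot per block shrinks ECDBL from 40 to 30 executed field operations and ECADD from 64 to 48, while the number of multiplication slots, which dominate the cost, is unchanged.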

Proposed Implementation of HMAC
When a web-based application communicates with other environments, it encrypts the data with various cryptographic algorithms and then sends the encrypted data. The receiver must be able to check that the encrypted data arrived without modification. HMAC, a MAC built from SHA-256, lets the receiver verify that the data was not altered in transit and was produced by a party holding the shared key. Implementing HMAC in Web Assembly allows web-based applications to authenticate data faster than with JavaScript [6,15].
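A minimal Python sketch of HMAC-SHA-256 following its standard definition, H((K ⊕ opad) || H((K ⊕ ipad) || m)) with a 64-byte block size, cross-checked against Python's built-in `hmac` module. The function names are hypothetical; verification uses `hmac.compare_digest` so that the tag comparison does not itself leak timing information.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256: H((K ^ opad) || H((K ^ ipad) || message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()    # keys longer than a block are hashed
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then right-padded with zero bytes
    inner_key = bytes(b ^ 0x36 for b in key)  # K xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K xor opad
    inner_hash = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_hash).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(hmac_sha256(key, message), tag)


if __name__ == "__main__":
    key, msg = b"shared secret key", b"encrypted payload"
    tag = hmac_sha256(key, msg)
    # Cross-check against the standard library implementation.
    assert tag == hmac.new(key, msg, hashlib.sha256).digest()
    assert verify_tag(key, msg, tag)
    print(tag.hex())
```

The receiver recomputes the tag over the received ciphertext with the shared key and accepts the data only if `verify_tag` returns `True`.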

Performance Analysis
To evaluate performance, the proposed crypto library was implemented in both Web Assembly and JavaScript in the environment of Table 9, and the two were compared in the Chrome, Firefox, and Microsoft Edge web browsers. Tables 10–18 show the results for the existing algorithms and the proposed methods, i.e., the revised CHAM algorithm, wNAF, SHA-256, and HMAC. Tables 10–12 show the results for the CHAM family measured in Chrome, Firefox, and Microsoft Edge. The revised CHAM family uses the four-round combining method, and CHAM-64/128 additionally applies the pre-computation method; all variants were implemented in JavaScript and Web Assembly. With the four-round combining method, the Web Assembly implementation of revised CHAM was faster than the JavaScript implementation in Chrome, Firefox, and Microsoft Edge by factors of 2.1, 2.1, and 2 for CHAM-64/128; 3, 1.6, and 1.8 for CHAM-128/128; and 3, 2.1, and 2.8 for CHAM-128/256. Applying pre-computation to CHAM-64/128 improved performance by a factor of 1.2 over the existing revised CHAM-64/128 in all three web browsers. Tables 13–15 show the results in Chrome, Firefox, and Microsoft Edge for the original wNAF, the existing atomic block wNAF, and the proposed atomic block wNAF, each implemented in JavaScript and Web Assembly. In Chrome, Firefox, and Microsoft Edge, Web Assembly was faster than JavaScript by factors of 11, 12, and 11 for the original wNAF; 10, 10, and 14 for the existing atomic block wNAF; and 11, 12, and 14 for the proposed atomic block wNAF. As shown in Table 8, atomic blocks increase the number of operations compared with the original ECDBL and ECADD, which causes performance overhead: the existing atomic block wNAF incurs overheads of 55%, 23%, and 37% compared with the original wNAF.
In contrast, the proposed atomic block wNAF, which assumes that only P = (X, Y, Z) is used, requires fewer operations, so its overhead is only 18%, 6%, and 11%, and its scalar multiplication is faster than with the conventional atomic block. Tables 16–18 show the results for SHA-256 and for HMAC, a MAC built on SHA-256, each implemented in JavaScript and Web Assembly and measured in Chrome, Firefox, and Microsoft Edge. Web Assembly was faster than JavaScript by factors of 7.5, 10.8, and 11 for SHA-256 and 7.5, 24.8, and 7.1 for HMAC in Chrome, Firefox, and Microsoft Edge, respectively.

Conclusions
In this paper, we proposed a crypto library that implements cryptographic algorithms in Web Assembly to improve their performance in web-based applications. A block cipher, a key exchange algorithm, and a MAC algorithm were implemented directly in both JavaScript and Web Assembly and compared. For the block cipher, we used the lightweight CHAM algorithm, applied the four-round combining method, and adopted the revised CHAM algorithm, which is secure against differential attacks. The Web Assembly and JavaScript implementations were measured in Chrome, Firefox, and Microsoft Edge. For the block cipher, Web Assembly was faster by factors of 2.1, 2.1, and 2 for CHAM-64/128; 3, 1.6, and 1.8 for CHAM-128/128; and 3, 2.1, and 2.8 for CHAM-128/256. Applying the pre-computation method to CHAM-64/128 improved performance by a factor of 1.2 in all three web browsers compared with not applying it. For the key exchange algorithm, wNAF was applied to P-256, together with the atomic block method as a countermeasure against SPA and TA. By applying both the existing and the proposed atomic block to wNAF, we measured how much overhead each incurs relative to the original wNAF because of the additional operations, and how much the proposed atomic block improves on the existing one. For this purpose, each algorithm was implemented in Web Assembly and JavaScript and measured in Chrome, Firefox, and Microsoft Edge. Web Assembly was faster than JavaScript by factors of 11, 12, and 11 for the original wNAF; 10, 10, and 14 for the existing atomic block wNAF; and 11, 12, and 14 for the proposed atomic block wNAF. The existing atomic block wNAF incurs overheads of 55%, 23%, and 37% compared with the original wNAF.
In contrast, the proposed atomic block wNAF, which assumes that only P = (X, Y, Z) is used, incurs overheads of only 18%, 6%, and 11%. For message authentication we used HMAC, which builds a MAC from SHA-256. In the measurements, Web Assembly was faster than JavaScript by factors of 7.5, 10.8, and 11 for SHA-256 and 7.5, 24.8, and 7.1 for HMAC.
Web Assembly will continue to evolve with the support of several web browser vendors. It is intended to be used alongside JavaScript, not as a replacement for it. As Web Assembly develops, the cost of function calls between Web Assembly and JavaScript is expected to decrease, which will make Web Assembly an increasingly appropriate language for cryptographic algorithms. Computation-heavy cryptographic algorithms can be written in Web Assembly, and web-based applications become more efficient when these are combined with JavaScript libraries that provide other functionality. Web Assembly currently executes in a SISD manner, so it is slower at processing the same amount of data than the SIMD-based cryptographic implementations currently being studied. However, Web Assembly is being extended to support SIMD with a considerable number of intrinsic functions, and it continues to evolve. In addition, an API called WebGPU is being developed to expose graphics card functionality to the web environment. WebGPU enables SIMD operations on the graphics card from the web, and it is evolving to work together with Web Assembly. Once high-performance features such as Web Assembly and WebGPU are available in the web environment, large amounts of data can be encrypted and decrypted at high speed. Many attacks on cryptographic algorithms exist, but our crypto library only addresses differential attacks on the block cipher and SPA and TA on key exchange. In future work, we plan to investigate possible attacks on cryptographic algorithms in the web environment and to develop countermeasures suited to them.
We will also study how Web Assembly's SIMD support and WebGPU can be used to optimize cryptographic algorithms in web-based applications. As Web Assembly develops and gains support for more features, research on cryptographic libraries for web-based applications will become increasingly valuable.