3.1. Scalar Multiplication
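Let $k$ be an integer and $P$ be a point on the elliptic curve. The scalar multiplication $kP$ of $P$ is defined by $kP = \underbrace{P + P + \cdots + P}_{k\ \text{times}}$. Computing $kP$ by repeated addition is lengthy, since it takes $k - 1$ point additions. To reduce the duration, $k$ is first converted to a binary representation, as follows:

$$k = \sum_{i} k_i 2^i \qquad (3)$$

where $k_i \in \{0, 1\}$ for every bit position $i$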
. Let 
 be non-negative integers such that 
 with 
. Then, using Horner’s rule, Equation (
3) can be represented as
For example, suppose that 
 and 
. Then,
As the idea behind the proposed method comes from the sliding window [
21], let us briefly introduce the basic concept of the sliding window by the following example. For 
k in Equation (
4) with window size 
, 
k can be written as
		In accordance with the sliding window method, the precomputations are 
 point additions; a point doubling number of 9; and 
, 
, 
, and 
. The number of point doublings equals the number of times a window of length 
$w$ is shifted one position from left to right, where zeros that lie outside a window are skipped. More details on the sliding window method can be found in [
21]. With the proposed method, 
 is written as
In Equation (
6), for 
, each
        is referred to as a 
w-bit word, denoted as 
. For the last 
r terms,
        in Equation (
6) is also represented as a 
w-bit word 
 with 
 for 
.
For Equations (
7) and (
8), it is evident that any scalar multiplication operation can be equivalently expressed as the computation of 
, 
 for each 
i. For a small value of 
w, the points 
 can be precomputed and stored in advance, as illustrated in 
Table 1. In this table, given the point 
P, the scalar 
k, and the word length 
w, the result of Equation (
7) or (
8) can be directly retrieved from the entry 
, provided that 
 for 
 and 
.
We use the following example to demonstrate how to look up values in Table 1. Suppose that $w = 3$; we precompute all eight possible combinations, as shown in the following table. For the value of $k$ according to Equation (5), for the combination 110, we obtain the result from the corresponding entry of the table, which is $6P$.

| 3-bit word | Precomputed point |
| 000 | 0 |
| 001 | $P$ |
| 010 | $2P$ |
| 011 | $3P$ |
| 100 | $4P$ |
| 101 | $5P$ |
| 110 | $6P$ |
| 111 | $7P$ |
Therefore, given a point $P$, a scalar $k$, and a word length $w$, $kP$ can be computed with the following Algorithm 1, ScalarMUL.
		
| Algorithm 1 ScalarMUL | 
| 1. | Set 
                 | 
| 2. | Using P to create table L, as shown in Table 1 | 
| 3. | Set 
                 
                and 
                 | 
| 4. | For  downto 0 | 
| 5. | do | 
| 6. |         | 
| 7. |         | 
| 8. | Enddo | 
| 9. |  | 
| 10. |  | 
| 11. | return Q | 
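The fixed-length word idea behind ScalarMUL can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the group is passed in abstractly through the hypothetical parameters `add` and `identity`, the names `build_table` and `scalar_mul_fixed_window` are invented for this example, and the treatment of the final $r$ bits is inferred from the surrounding text.

```python
def build_table(P, w, add, identity):
    """Precompute [0*P, 1*P, ..., (2^w - 1)*P], one entry per w-bit word."""
    table = [identity]
    for _ in range(1, 1 << w):
        table.append(add(table[-1], P))
    return table


def scalar_mul_fixed_window(k, P, w, add, identity):
    """Compute k*P by processing k in fixed w-bit words, most significant first.

    `add` must also handle adding a point to itself (doubling).
    """
    if k < 0 or w < 1:
        raise ValueError("expected k >= 0 and w >= 1")
    table = build_table(P, w, add, identity)
    length = k.bit_length()
    d, r = divmod(length, w)          # d full words, then r leftover low bits
    mask = (1 << w) - 1
    Q = identity
    for i in range(d):
        for _ in range(w):            # Q <- 2^w * Q
            Q = add(Q, Q)
        word = (k >> (length - (i + 1) * w)) & mask
        Q = add(Q, table[word])       # table lookup replaces up to w additions
    if r:
        for _ in range(r):            # Q <- 2^r * Q for the last r bits
            Q = add(Q, Q)
        Q = add(Q, table[k & ((1 << r) - 1)])
    return Q
```

With the integers modulo a prime `N` as a stand-in group, `scalar_mul_fixed_window(k, P, w, lambda a, b: (a + b) % N, 0)` agrees with `(k * P) % N`. On an elliptic curve, `add` would be point addition and `identity` the point at infinity.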
3.2. Reducing Inverse in the Repeating Point Doubling
The sliding window method [21] shifts a window of length $w$ across the binary representation and skips over the runs of zeros between windows, disregarding fixed digit boundaries. In the ScalarMUL algorithm, by contrast, the binary representation of $k$ is partitioned into fixed-length words of $w$ bits, and the words are processed sequentially. This approach can also be extended to the sliding window method, as will be demonstrated in Section 4 with the experimental results. Within the ScalarMUL algorithm, it is necessary to compute $2^wQ$ in step 7 and $2^rQ$ in step 9.
By the definition of the scalar multiplication $2^nQ$ of $Q$ and the associativity of point addition on the elliptic curve, for any positive integer $n$, $2^nQ$ can be expressed as the point doubling of $2^{n-1}Q$. Specifically, $2^nQ = 2(2^{n-1}Q)$. Traditionally, as described in Equation (2), $2^nQ$ can be computed using the following Algorithm 2, referred to as Tradition. In the Tradition algorithm, line 4 employs Equation (2) to compute $2Q$. Each iteration performs one point-doubling operation on $Q$, requiring five XORs (additions), two multiplications, and one inverse operation, in addition to square operations. The addition, multiplication, and square operations mentioned here are all operations defined within $GF(2^m)$.
		
| Algorithm 2 Tradition | 
| 1. | Set 
                 | 
| 2. | For  downto 0 | 
| 3. | do | 
| 4. |     | 
| 5. | Enddo | 
| 6. | return 
                Q | 
To obtain $2^nQ$, we have to compute $2Q, 2^2Q, \ldots, 2^{n-1}Q$ in turn. Therefore, the computation involves $n$ inverse operations, $5n$ XORs, and $2n$ multiplications, together with the corresponding square operations. Since the inverse operation is computationally expensive, we have developed optimized formulas to replace the point-doubling computation in the Tradition algorithm. The derived formulas ensure that only a single inverse operation is required when computing $2^nQ$ for a given point $Q$, which removes $n - 1$ of the $n$ inversions. Let 
 be a point in 
. For 
, let 
 be the point doubling of 
. Then, 
 is the scalar multiplication 
 of 
. Let 
 be the slope of the tangent line passing through the point 
. Then, to derive formulas for 
 obtained from 
 via the iteration of point doubling, first, consider 
. We have
        where 
 and 
.
In what follows, the formula for  will be omitted until  and  are obtained.
For 
,
        where 
 and 
.
For 
,
        where 
 and 
.
For 
,
        where
        and
The formulas for 
 and 
 can be extended iteratively for arbitrarily large values of 
n, allowing us to compute 
 for any desired 
n. However, the derivation becomes increasingly laborious as 
n grows, making it impractical to carry out by hand. Before establishing that only one inverse operation is involved in the computation of scalar multiplication, it is helpful to introduce the following recurrence relations. Following Equations (
9)–(
12), let 
, and 
. Then,
		For 
, the following relationships can be easily derived:
Table 2 is an illustration of Equations (
9)–(
12) to compute 
 and 
. In the example, the curve is defined over 
. For 
, the computations of 
 and 
 are shown in 
Appendix A.
 Lemma 1. For ,  and .
 Proof of Lemma 1. We will proceed with induction on 
n. Equations (
9)–(
12) show the basis step for 
 and 
. For the inductive step,
Hence, the lemma holds.    □
 Corollary 1. For , .
 Proof of Corollary 1. According to Lemma 1, 
,
   □
Given a point $Q$ and a positive integer $n$, the $n$-times point doubling $2^nQ$ of $Q$ can be computed efficiently using the following Algorithm 3, referred to as PDNTimes.
		
| Algorithm 3 PDNTimes | 
| 1. | Set 
                 | 
| 2. | If , then | 
| 3. |        ;  | 
| 4. |        ;  | 
| 5. |         | 
| 6. |         | 
| 7. |        For  upto n | 
| 8. |        do | 
| 9. |              | 
| 10. |              | 
| 11. |              | 
| 12. |        Enddo | 
| 13. |         | 
| 14. |         | 
| 15. |         | 
| 16. |         | 
| 17. |         | 
| 18. |         | 
| 19. | EndIf | 
| 20. | return 
                 | 
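The benefit of deferring the inversion can be illustrated with a small, self-contained Python sketch. It does not reproduce the recurrences of PDNTimes. Instead it swaps in the standard López-Dahab projective doubling formulas, which likewise need only one inversion after $n$ doublings. The toy field $GF(2^5)$ with modulus $x^5 + x^2 + 1$, the curve parameters `A` and `B`, and all function names are assumptions made for illustration.

```python
M = 5                 # toy field degree (assumption, far too small for real use)
MOD = 0b100101        # x^5 + x^2 + 1, irreducible over GF(2)
A, B = 1, 1           # toy curve y^2 + xy = x^3 + A*x^2 + B
COUNTS = {"inv": 0}   # number of field inversions performed


def gf_mul(a, b):
    """Multiply in GF(2^M): shift-and-XOR with reduction at every step."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return result


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2) and count the call."""
    COUNTS["inv"] += 1
    result, exp = 1, (1 << M) - 2
    while exp:
        if exp & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exp >>= 1
    return result


def affine_double(x, y):
    """One affine doubling (x != 0): 5 XORs, 2 multiplications, 2 squares, 1 inverse."""
    lam = x ^ gf_mul(y, gf_inv(x))
    x3 = gf_mul(lam, lam) ^ lam ^ A
    y3 = gf_mul(x, x) ^ gf_mul(lam ^ 1, x3)
    return x3, y3


def double_n_affine(x, y, n):
    """n successive affine doublings: n inversions in total."""
    for _ in range(n):
        x, y = affine_double(x, y)
    return x, y


def double_n_projective(x, y, n):
    """n doublings with x = X/Z, y = Y/Z^2 and a single inversion at the end."""
    X, Y, Z = x, y, 1
    for _ in range(n):
        X2, Z2 = gf_mul(X, X), gf_mul(Z, Z)
        bZ4 = gf_mul(B, gf_mul(Z2, Z2))
        Z3 = gf_mul(X2, Z2)
        X3 = gf_mul(X2, X2) ^ bZ4
        Y3 = gf_mul(bZ4, Z3) ^ gf_mul(X3, gf_mul(A, Z3) ^ gf_mul(Y, Y) ^ bZ4)
        X, Y, Z = X3, Y3, Z3
    z_inv = gf_inv(Z)
    return gf_mul(X, z_inv), gf_mul(Y, gf_mul(z_inv, z_inv))
```

For a curve point whose doublings never reach $x = 0$, both functions return the same affine result, but `double_n_affine` increments `COUNTS["inv"]` by $n$ while `double_n_projective` increments it by one. The price is a larger number of multiplications and squares per doubling, which is the same trade-off discussed for PDNTimes.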
In the PDNTimes algorithm, the computational complexity can be broken down as follows:
Lines 5 and 6: These lines involve 3 XOR operations, 2 multiplications, and 2 square operations.
Lines 9–11: Each iteration of the loop in these lines requires 5 XOR operations, 6 multiplications, and 6 square operations.
Lines 13–17: These lines consist of 2 XOR operations, 4 multiplications, 3 square operations, and 1 inverse operation.
Therefore, a total of  XORs,  multiplications, and  squares are required. However, in the case of hardware devices, the time complexity of adding any two n-bit numbers is currently , while the time complexity of their multiplication is .
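The per-line counts listed above can be tallied as a short worked equation. The symbol $t$, denoting the number of times the loop in lines 7 to 12 executes, is introduced here only for illustration, since the loop bounds are not legible in the listing above:

```latex
\begin{aligned}
\text{XORs}            &= 3 + 5t + 2 = 5t + 5,\\
\text{multiplications} &= 2 + 6t + 4 = 6t + 6,\\
\text{squares}         &= 2 + 6t + 3 = 6t + 5,\\
\text{inverses}        &= 1.
\end{aligned}
```

Under the assumption that the loop runs $t = n - 1$ times, this gives $5n$ XORs, $6n$ multiplications, $6n - 1$ squares, and one inverse, compared with $n$ inverses for the Tradition algorithm.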
Lemma 2. Over , let  be an integer and Q be a point in . The computation of n times point doubling  of Q requires  multiplications,  squares, and one inverse operation.
For repeated point doubling on the elliptic curve, Table 3 reports the execution times of the Tradition algorithm and the PDNTimes algorithm when used inside the ScalarMUL algorithm. In other words, the computation of $2^wQ$ in line 7 of the ScalarMUL algorithm is timed once with PDNTimes and once with Tradition. Let 
 and 
 denote the execution time of the previous method and the proposed method, respectively. Then, in the table, the decreasing ratio is given by
Comparing Tradition with PDNTimes for different values of $m$ shows that reducing the number of inverse operations decreases the computation time. However, the additional multiplication and square operations in the formulas cause this reduction to level off as $n$ approaches 8, as illustrated in Figure 1. The trend is attributed to the increase in word length, which leads to longer table construction times and a corresponding rise in memory consumption. As the figure also shows, the behavior is consistent across different values of $m$, indicating that the trade-off between fewer inversions and more multiplication and square operations persists regardless of the specific parameters.
3.3. Reducing Square Operation Time
In the 
PDNTimes algorithm, there are many square operations in 
, and 
. To further reduce the computation time for scalar multiplication or repeating point doubling, precomputations for square operations are employed again. The method we propose below will enable the square operation to utilize three main operations: XOR, bit shifting, and table lookup. Recall that 
 is a polynomial defined over 
. Then, given an integer 
, let 
d and 
r be integers such that 
 and 
. Using Horner’s rule again (note that the 
m we are considering is odd),
In Equation (
16), the computation of 
 involves sequentially evaluating the expression
        for increasing values of 
i. Similar to Equation (
7) (respectively, Equation (
8)), the expression 
 (respectively, 
) represents a 
w-bit word, denoted as 
 (respectively, 
). The result of computing 
, for 
, and 
 can be found in the entry 
 in 
Table 4 provided that 
 for 
. In the subsequent discussion, the notation “
$\bullet \ll n$” will be used to denote shifting • to the left by 
$n$ positions, with the vacated least significant bits set to zero, where 
$n$ is a positive integer.
In Equation (
17), since the maximum degree before applying the modulo operation with respect to 
 is less than 
m, the remainder obtained through traditional long division depends on the polynomial 
. The result of this modulo operation, denoted as 
, is provided in 
Table 5, which represents the remainder of (
17). 
Table 5 comprehensively lists all possible outcomes for 
.
Therefore, the square operation 
 can be computed with the following Algorithm 4, 
SquareMod.
		
| Algorithm 4 SquareMod | 
| 1. | Set 
                 | 
| 2. | Make table  and  such as Table 4 and Table 5, respectively | 
| 3. | For 
                 
                to 
                 | 
| 4. | do | 
| 5. |        | 
| 6. |        | 
| 7. | Enddo | 
| 8. |  | 
| 9. |  | 
| 10. |  | 
| 11. | return C | 
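The idea of squaring with only XORs, shifts, and table lookups can be sketched in Python as follows. This is a simplified illustration, not the SquareMod algorithm itself: it uses a single lookup table of squared $w$-bit words (in the spirit of Table 4) and performs the final reduction by plain shift-and-XOR rather than through a remainder table such as Table 5. The toy parameters `M`, `MOD`, and `W` and all function names are assumptions.

```python
M = 7              # toy field degree (assumption)
MOD = 0b10000011   # modulus polynomial x^7 + x + 1
W = 4              # word length used for the lookup table


def spread_bits(word):
    """Square a polynomial over GF(2): the coefficient of x^i moves to x^(2i)."""
    out = 0
    i = 0
    while word:
        if word & 1:
            out |= 1 << (2 * i)
        word >>= 1
        i += 1
    return out


# One precomputed square per possible W-bit word (2^W entries).
SQUARE_TABLE = [spread_bits(v) for v in range(1 << W)]


def reduce_mod(c):
    """Reduce the polynomial c modulo MOD, clearing bits from the top down."""
    for bit in range(c.bit_length() - 1, M - 1, -1):
        if (c >> bit) & 1:
            c ^= MOD << (bit - M)
    return c


def square_mod(a):
    """Square a in GF(2^M): one lookup, one shift, one XOR per W-bit word."""
    c = 0
    shift = 0
    mask = (1 << W) - 1
    while a:
        c ^= SQUARE_TABLE[a & mask] << shift
        a >>= W
        shift += 2 * W        # a squared W-bit word occupies 2W bits
    return reduce_mod(c)
```

The table holds $2^W$ entries, so increasing `W` reduces the number of loop iterations while the table grows exponentially, which mirrors the time and memory trade-off discussed for SquareMod.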
In the 
SquareMod algorithm, for each iteration 
i, the result of the equation of Equation (
17) is represented as 
. In practical implementation, the term 
 in Equation (
17) implies that each 
 in 
C is shifted to the left by 
 positions, with all lower-order bits set to zero, where 
. Let 
 denote the maximum degree of the polynomial in Equation (
17) before applying the modulo operation with 
, and let 
. As 
 is stored in an 
m-bit array in the code, there is a constraint on the shifting of 
C. Specifically, 
 must be greater than the sum of 
 and the maximum degree of 
. This ensures that the shifting operation does not exceed the bounds of the array and that the modulo operation can be correctly applied.
In the SquareMod algorithm, the computational time can be broken down as follows:
Lines 5 and 6: Each iteration of the loop in these lines requires 2 XOR operations, 1 shift, and 2 table lookups.
Lines 8–10: These lines consist of 3 XOR operations, 1 multiplication, and 3 table lookups.
Therefore, a total of $2d + 3$ XORs, $d$ shifts, and $2d + 3$ table lookups are required, since the loop contributes one shift per iteration and hence runs $d$ times. From the perspective of time complexity, this cost is negligible compared with the time required for a multiplication.
In the ScalarMUL and SquareMod algorithms, much of the work of scalar multiplication reduces to retrieving precomputed values stored in Table 1, Table 4, and Table 5. This avoids repeated calculations and thereby improves computational efficiency.
Lemma 3. Given an integer w, the scalar multiplication of a point on  over  can be computed in  iterations in the algorithms ScalarMUL and SquareMod.
          
 In Lemma 3, the 
 iterations imply that a scalar multiplication of the form 
 of a given point 
$Q$ is performed. To evaluate the execution time of the 
SquareMod algorithm, a test code was implemented to execute the algorithm 100,000 times for each word length 
w with 
. Additionally, the memory size required for the lookup table in 
SquareMod was measured for each word length. For instance, in the case of 
, 
Table 6 summarizes the execution time and the corresponding memory size needed for the lookup table in 
SquareMod. 
Figure 2 provides a graphical representation of the data presented in 
Table 6. As the table and figure show, there is a trade-off between execution time and memory usage. While increasing the word length 
$w$ can reduce execution time, it also significantly increases both the memory required for the lookup table and the time needed to construct it. The word length must therefore be chosen to balance performance against memory constraints when implementing the 
SquareMod algorithm. Because the choice of word length also affects the performance of scalar multiplication, the efficiency of scalar multiplication is tunable. Taking 
 as an example, in our program execution environment, the memory size required for each word length 
w is shown in 
Table 7. The execution time can be optimized by selecting an appropriate value of 
w based on the hardware and software specifications of the specific execution environment.