This section first provides the parameters for the implementation of the selected BHC model in Section 4.1. Subsequently, the synthesis results for various FPGA devices are given in Section 4.2. Finally, a comparison with existing state-of-the-art solutions is provided in Section 4.3.
4.2. Implementation Results
We modeled two Verilog (HDL) designs, over $GF(2^{163})$ and $GF(2^{233})$, using the Xilinx ISE design tool. For functional verification, the behavioral simulation models were checked against their corresponding C-based functional implementations. The implementation results for various Xilinx FPGA devices are given in Table 4. For the Virtex-4, Virtex-5, Virtex-6 and Virtex-7 FPGAs, the devices selected for logic synthesis are xc4vfx140-11ff1517, xc5vfx130t-3ff1738, xc6vlx550t-2ff1760 and xc7vx690t-3ffg1930, respectively. Column one of Table 4 gives the key length (m), and column two gives the implementation platform. Columns three to six give the FPGA slices, operational clock frequency (in MHz), number of clock cycles and latency (in μs), respectively. The reported slice counts and clock frequencies are obtained from the Xilinx ISE tool, while the number of clock cycles and the latency are calculated using Equations (3) and (4), respectively.
As shown in Table 4, the proposed architecture over $GF(2^{163})$ and $GF(2^{233})$ requires 10,393 and 11,137 clock cycles, respectively. For each implementation platform, the results achieved over $GF(2^{163})$ and $GF(2^{233})$ in terms of FPGA slices, clock frequency and time to perform one PM operation are as follows:
Results on Virtex-4 and Virtex-5. As shown in column three of Table 4, the proposed architecture over $GF(2^{163})$ and $GF(2^{233})$ utilizes 5302 and 11,557 slices on Virtex-4, respectively. The achieved clock frequency is 152 and 103 MHz, and the time required to perform one PM computation is 68 μs and 108 μs, respectively. On Virtex-5, the proposed architecture over $GF(2^{163})$ and $GF(2^{233})$ utilizes 2412 and 10,065 FPGA slices, which are 2.19 and 1.14 times fewer, respectively, than the corresponding Virtex-4 slice counts. The clock frequency also increases relative to our Virtex-4 implementations (from 152 to 194 MHz over $GF(2^{163})$ and from 103 to 157 MHz over $GF(2^{233})$), which in turn reduces the latency, as shown in Table 4.
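The slice-reduction factors between Virtex-4 and Virtex-5 can be rechecked directly from the slice counts quoted above. The following is a minimal Python sketch, not part of the original design flow; the helper names `truncate` and `slice_reduction` are hypothetical, and truncation (rather than rounding) to two decimals is an assumption made so that the output matches the precision used in the text.

```python
import math

# Slice counts quoted in the text (key length m -> slices).
SLICES_VIRTEX4 = {163: 5302, 233: 11557}
SLICES_VIRTEX5 = {163: 2412, 233: 10065}


def truncate(value: float, decimals: int = 2) -> float:
    """Truncate (not round) a positive value to the given number of decimals."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


def slice_reduction(m: int) -> float:
    """Ratio of Virtex-4 slices to Virtex-5 slices for key length m."""
    return SLICES_VIRTEX4[m] / SLICES_VIRTEX5[m]


for m in (163, 233):
    ratio = truncate(slice_reduction(m))
    print(f"m = {m}: Virtex-4 / Virtex-5 slice ratio = {ratio:.2f}")
```

Under these assumptions, the larger reduction factor belongs to the 163-bit design and the smaller one to the 233-bit design.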
Results for Virtex-6 and Virtex-7. On Virtex-6, the utilized FPGA slices over $GF(2^{163})$ and $GF(2^{233})$ are 2982 and 4370, respectively. For the same key lengths (163 and 233), the achieved clock frequency is 200 MHz and 164 MHz. When moving from Virtex-5 to Virtex-6, there is a small decrease in the PM computation time (53 μs to 51 μs over $GF(2^{163})$ and 70 μs to 67 μs over $GF(2^{233})$). On the newer Virtex-7 FPGA, the proposed architecture achieves a higher clock frequency than our implementations on Virtex-4, Virtex-5 and Virtex-6 devices. For the higher key length (m = 233), the proposed architecture consumes fewer hardware slices than the corresponding Virtex-4, Virtex-5 and Virtex-6 implementations. The required computation time for one PM is 42 μs and 54 μs over $GF(2^{163})$ and $GF(2^{233})$, respectively.
To summarize, the proposed architecture consumes fewer slices on Virtex-5 and Virtex-7 devices. The newer technologies (Virtex-6 and Virtex-7) provide a relatively higher clock frequency than the older Virtex-5 and Virtex-4 FPGAs. As shown in the last column of Table 4, the latency increases with the key length. Moreover, as the target platform changes from Virtex-4 to Virtex-7, the latency of the architecture decreases.
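The latency values quoted in this subsection can be cross-checked from the clock cycles and clock frequencies. The sketch below is illustrative only: Equations (3) and (4) are not reproduced here, so the relation latency (μs) = clock cycles / frequency (MHz) is an assumption, as is the truncation to whole microseconds. The function name `latency_us` is hypothetical. Virtex-7 is omitted because its clock frequencies are not stated in the text.

```python
# Clock cycles for one PM, as quoted in the text (key length m -> cycles).
CLOCK_CYCLES = {163: 10393, 233: 11137}

# Clock frequencies in MHz quoted in the text (device -> {m: MHz}).
FREQ_MHZ = {
    "Virtex-4": {163: 152, 233: 103},
    "Virtex-5": {163: 194, 233: 157},
    "Virtex-6": {163: 200, 233: 164},
}


def latency_us(cycles: int, freq_mhz: float) -> float:
    """Assumed relation: cycles divided by frequency in MHz gives microseconds."""
    return cycles / freq_mhz


for device, freqs in FREQ_MHZ.items():
    for m, freq in freqs.items():
        lat = latency_us(CLOCK_CYCLES[m], freq)
        # int() truncates toward zero, which is assumed to match the text.
        print(f"{device}, m = {m}: {lat:.2f} us (truncated: {int(lat)} us)")
```

With these assumptions, the truncated results agree with the latencies quoted for Virtex-4, Virtex-5 and Virtex-6, and they show both trends summarized above: latency grows with the key length and shrinks on newer devices.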
4.3. Comparison with State of the Art Architectures
In order to provide a realistic and reasonable comparison with the state of the art, we synthesized our Verilog (HDL) models for similar FPGA devices, as shown in Table 5. A comparison with a variety of existing low-area architectures of the BHC model is challenging, as there are few hardware-based published works [18,19,20,21,22]. Therefore, we also provide a comparison with the most recent area-optimized implementations of the unified BEC model [8,23,24,25]. It is important to mention that we have placed the symbol ‘−’ in
Table 5 where the relevant information is not given.
Comparison with BHC and BEC architectures on Virtex-4: BHC architectures on Virtex-4 FPGA are reported in [18,21], and a BEC architecture is reported in [23]. As shown in Table 5, our design consumes 1.76 times fewer hardware slices on Virtex-4 than [18]. This is because the design in [18] uses multiple FF operators (multiplier and adder) in its datapath, together with a hybrid Karatsuba multiplier (obtained by merging general and simple multipliers) that further increases hardware resources. The proposed architecture, on the other hand, uses only one segmented-LSD multiplier, adder and squarer in the datapath. Moreover, the operational clock frequency achieved by our design is 103 MHz, which is 1.27 times higher. However, our design requires more clock cycles to perform one PM computation and therefore has a higher latency. The architecture of [21] utilizes 1.18 times fewer hardware slices than this work. Nevertheless, the proposed architecture requires 1.17 times fewer clock cycles. This illustrates the trade-off between achieved performance and consumed area. The BEC architecture presented in [23] utilizes 1.88 times more hardware resources than our architecture, owing to its use of a hybrid Karatsuba multiplier in the datapath. In addition to the reduced hardware resources, the presented architecture also provides a 1.68 times higher clock frequency.
Comparison with BHC and BEC architectures on Virtex-5: A BHC architecture on Virtex-5 FPGA is reported in [21], and BEC architectures are reported in [24,25]. The architecture in [21] utilizes 1.50 times fewer slices than this work. Similarly, the BEC architecture in [24] consumes 1.70 times fewer slices. A comparison in terms of clock cycles and frequency is not possible, as the values for these design parameters are not given. The BEC architecture in [25] utilizes 1.57 times more hardware resources than our design. However, the clock frequency of our design is 1.96 times lower.
Comparison with BHC and BEC architectures on Virtex-6: BHC architectures on Virtex-6 FPGA are reported in [19,20], and a BEC architecture is reported in [8]. In [19], a hybrid Karatsuba multiplier is employed. The use of a segmented-LSD multiplier in the proposed architecture results in 1.63 times fewer slices. However, the architecture of [19] requires 1.51 times fewer clock cycles and a lower computation time. A digit-parallel least significant digit multiplier, with a digit size of 32 bits, is incorporated in [20]. The use of this digit-parallel multiplier results in 1.75 times more hardware resources. For the twisted BEC, the prime
with
is utilized in [
8]. The proposed architecture utilizes 1.51 times fewer slices than the most recent BEC architecture of [8]. Moreover, for the same key length (i.e., 233), the proposed architecture over $GF(2^{233})$ is 31.79 times faster (ratio of 2130 to 67).
Comparison with BHC architectures on Virtex-7: The BHC architectures on Virtex-7 FPGA are reported in [19,20,21]. The use of a segmented-LSD multiplier in this article results in 1.41 times fewer slices than [19], where a hybrid Karatsuba multiplier is employed. Furthermore, a 1.11 times higher operational clock frequency is obtained. Similarly, the use of a digit-parallel multiplier in [20] results in 1.48 times more hardware resources. Due to four-stage pipelining, the architecture of [21] achieves a higher clock frequency. Nevertheless, its reported clock cycles are 1.17 times higher than in this work. This is due to the inherent data dependency in the
of the BHC model (see Table 2). Using the same FF multiplier as [20], the dedicated architecture in [22] consumes 1.42 times more slices. Moreover, the architectures of [20,22] achieve a higher operational clock frequency. On the other hand, the proposed solution in this article requires fewer clock cycles.