# CORDIC Hardware Acceleration Using DMA-Based ISA Extension

^{1}

^{2}

^{3}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Background

#### 2.1. Nios-II Custom Instruction

#### 2.2. Altera Hardware CORDIC Unit

## 3. Related Work

## 4. Methodology

#### 4.1. ISA Extension Basic Implementation

#### 4.2. ISA Extension Using Pipelined Approach

#### 4.3. ISA Extension Using DMA Approach

## 5. Results

## 6. Summary and Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Cadence. Tensilica Customizable Processors. Available online: https://ip.cadence.com (accessed on 9 November 2021).
- Intel. Nios II Processors. Available online: https://www.intel.com (accessed on 9 November 2021).
- Davide Schiavone, P.; Conti, F.; Rossi, D.; Gautschi, M.; Pullini, A.; Flamand, E.; Benini, L. Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications. In Proceedings of the 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), Thessaloniki, Greece, 25–27 September 2017; pp. 1–8. [Google Scholar] [CrossRef]
- Joseph Yiu, A. Innovate by Customized Instructions, but without Fragmenting the Ecosystem. Available online: https://armkeil.blob.core.windows.net/developer/Files/pdf/white-paper/arm-custom-instructions-without-fragmentation-whitepaper.pdf (accessed on 9 November 2021).
- Wang, X.; Magno, M.; Cavigelli, L.; Benini, L. FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things. arXiv
**2019**, arXiv:1911.03314. [Google Scholar] [CrossRef] - Sharma, N.K.; Rathore, S.; Khan, M.R. A Comparative Analysis on Coordinate Rotation Digital Computer (CORDIC) Algorithm and Its use on Computer Vision Technology. In Proceedings of the 2020 First International Conference on Power, Control and Computing Technologies (ICPC2T), Raipur, India, 3–5 January 2020; pp. 106–110. [Google Scholar] [CrossRef]
- Raut, G.; Rai, S.; Vishvakarma, S.K.; Kumar, A. A CORDIC Based Configurable Activation Function for ANN Applications. In Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Limassol, Cyprus, 6–8 July 2020; pp. 78–83. [Google Scholar] [CrossRef]
- Raut, G.; Rai, S.; Vishvakarma, S.K.; Kumar, A. RECON: Resource-Efficient CORDIC-Based Neuron Architecture. IEEE Open J. Circuits Syst.
**2021**, 2, 170–181. [Google Scholar] [CrossRef] - Heidarpur, M.; Ahmadi, A.; Ahmadi, M.; Rahimi Azghadi, M. CORDIC-SNN: On-FPGA STDP Learning with Izhikevich Neurons. IEEE Trans. Circuits Syst. I Regul. Pap.
**2019**, 66, 2651–2661. [Google Scholar] [CrossRef] - Hao, X.; Yang, S.; Wang, J.; Deng, B.; Wei, X.; Yi, G. Efficient Implementation of Cerebellar Purkinje Cell With the CORDIC Algorithm on LaCSNN. Front. Neurosci.
**2019**, 13, 1078. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Manor, E.; Greenberg, S. Efficient Hardware/Software partitioning for Heterogeneous Embedded Systems. In Proceedings of the 2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE), Eilat, Israel, 12–14 December 2018; pp. 1–4. [Google Scholar] [CrossRef]
- Volder, J.E. The Birth of Cordic. J. VLSI Signal Process. Syst.
**2000**, 25, 101–105. [Google Scholar] [CrossRef] - Lin, K.J.; Hou, C.C. Implementation of trigonometric custom functions hardware on embedded processor. In Proceedings of the 2013 IEEE 2nd Global Conference on Consumer Electronics (GCCE), Tokyo, Japan, 14 November 2013; pp. 155–157. [Google Scholar]
- Walther, J.S. A Unified Algorithm for Elementary Functions. In Proceedings of the Spring Joint Computer Conference—AFIPS ’71 (Spring), Atlantic City, NJ, USA, 18–20 May 1971; Association for Computing Machinery: New York, NY, USA, 1971; pp. 379–385. [Google Scholar] [CrossRef]
- Walther, J.S. A Unified Algorithm for Elementary Functions; AFIPS ’71; Spring: Berlin/Heidelberg, Germany, 1971. [Google Scholar]
- Nguyen, H.; Nguyen, X.; Pham, C.; Hoang, T.; Le, D. A parallel pipeline CORDIC based on adaptive angle selection. In Proceedings of the 2016 International Conference on Electronics, Information, and Communications (ICEIC), Danang, Vietnam, 27–30 January 2016; pp. 1–4. [Google Scholar] [CrossRef]
- Detrey, J.; de Dinechin, F. Floating-Point Trigonometric Functions for FPGAs. In Proceedings of the 2007 International Conference on Field Programmable Logic and Applications, Amsterdam, The Netherlands, 27–29 August 2007; pp. 29–34. [Google Scholar]
- Liventsev, E.; Silantiev, A.; Primakov, E.; Telminov, O. Extending MIPSfpga instruction set for navigation data processing. In Proceedings of the 2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), St. Petersburg and Moscow, Russia, 1–3 February 2017; pp. 480–484. [Google Scholar] [CrossRef]
- Andraka, R. A Survey of CORDIC Algorithms for FPGA Based Computers. In Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays—FPGA ’98, Monterey, CA, USA, 22–24 February 1998; Association for Computing Machinery: New York, NY, USA, 1998; pp. 191–200. [Google Scholar] [CrossRef]
- Nolting, S.; Payá-Vayá, G.; Schmeadecke, I.; Blume, H. Evaluation of a Generic Radix-4 CORDIC Coprocessor Tightly Coupled with a Generic VLIW-SIMD ASIP Architecture. 2012. Available online: https://www.researchgate.net/profile/Guillermo-Paya-Vaya/publication/260614052_Evaluation_of_a_Generic_Radix-4_CORDIC_Coprocessor_Tightly_Coupled_with_a_Generic_VLIW-SIMD_ASIP_Architecture/links/570121bb08aea6b7746a78b1/Evaluation-of-a-Generic-Radix-4-CORDIC-Coprocessor-Tightly-Coupled-with-a-Generic-VLIW-SIMD-ASIP-Architecture.pdf (accessed on 9 November 2021).
- Ibrahim, M.; Chen, K.T.; Idroas, M.; Yahya, Z. The implementation of a pipelined floating-point CORDIC coprocessor on NIOS II soft processor. Int. J. Electr. Electron. Data Commun.
**2015**, 3, 15–20. [Google Scholar] - Zhou, J.; Dong, Y.; Dou, Y.; Lei, Y. Dynamic Configurable Floating-Point FFT Pipelines and Hybrid-Mode CORDIC on FPGA. In Proceedings of the 2008 International Conference on Embedded Software and Systems, Chengdu, China, 29–31 July 2008; pp. 616–620. [Google Scholar]
- Li, B.; Fang, L.; Xie, Y.; Chen, H.; Chen, L. A unified reconfigurable floating-point arithmetic architecture based on CORDIC algorithm. In Proceedings of the 2017 International Conference on Field Programmable Technology (ICFPT), Melbourne, VIC, Australia, 11–13 December 2017; pp. 301–302. [Google Scholar] [CrossRef]
- Nguyen, K.D.; Kiet, D.T.; Hoang, T.T.; Quynh, N.Q.N.; Tran, X.T.; Pham, C.K. A trigonometric hardware acceleration in 32-bit RISC-V microcontroller with custom instruction. IEICE Electron. Express
**2021**, 18, 20210266. [Google Scholar] [CrossRef] - Buzdar, A.; Sun, L.; Khan, S.; Buzdar, A. Area and Energy efficient CORDIC Accelerator for Embedded Processor Datapaths. Inf. Midem Ljubl.
**2016**, 46, 197–208. [Google Scholar] - Sun, F.; Ravi, S.; Raghunathan, A.; Jha, N.K. A Synthesis Methodology for Hybrid Custom Instruction and Coprocessor Generation for Extensible Processors. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.
**2007**, 26, 2035–2045. [Google Scholar] [CrossRef] - Wolf, M. Chapter 2 -Instruction Sets. In Computers as Components, 3rd ed.; Wolf, M., Ed.; The Morgan Kaufmann Series in Computer Architecture and Design; Morgan Kaufmann: Boston, MA, USA, 2012; pp. 51–93. [Google Scholar] [CrossRef]
- Kavvadias, N.; Masselos, K. Efficient Hardware Looping Units for FPGAs. In Proceedings of the 2010 IEEE Computer Society Annual Symposium on VLSI, Lixouri, Greece, 5–7 July 2010; pp. 35–40. [Google Scholar] [CrossRef]
- STMicroelectronics. AN5325 Getting Started with the CORDIC Accelerator Using STM32CubeG4 MCU Package. Available online: https://www.st.com/en/embedded-software/stm32cubeg4.html#documentation (accessed on 9 November 2021).

**Figure 1.**Custom instruction architecture. Intel’s website [2].

**Figure 6.**Single neuron with multiple inputs followed by activation function [7].

Function | Mode | x_in | y_in | z_in | Result | Phase Type |
---|---|---|---|---|---|---|

A*cos(B) | 1 | A | 0 | B | x_out | Circular |

A*sin(B) | 1 | A | 0 | B | y_out | Circular |

asin(A) | 1 | Unit | 0 | A | z_out | Circular |

atan(B/A) | 0 | A | B | 0 | z_out | Circular |

Mag(A,B) | 0 | A | B | 0 | x_out | Linear |

C-B/A | 0 | A | B | C | z_out | Hyperbolic |

A*cosh(B) | 1 | A | 0 | B | x_out | Hyperbolic |

A*sinh(B) | 1 | 0 | A | B | x_out | Hyperbolic |

A*exp(B) | 1 | A | A | B | y_out | Hyperbolic |

atanh(B/A) | 0 | A | B | 0 | z_out | Hyperbolic |

0.5*ln(A) | 0 | A + 1 | A − 1 | 0 | z_out | Hyperbolic |

sqrt(A) | 0 | A + $\frac{1}{4}$ | A − $\frac{1}{4}$ | 0 | x_out | Hyperbolic |

Macro | C Equivalent Function |
---|---|

CORDIC_COS(phase) | cos(phase) |

CORDIC_SIN(phase) | sin(phase) |

CORDIC_AMPCOS(amp,phase) | amp*cos(phase) |

CORDIC_AMPSIN(amp,phase) | amp*sin(phase) |

CORDIC_ROTATEX(x,y,phase) | x*cos(phase) − y*sin(phase) |

CORDIC_ROTATEY(x,y,phase) | y*cos(phase) + x*sin(phase) |

CORDIC_SIGMOID(x) | sigmoid(x) |

Macro | Functionality |
---|---|

DMA_CORDIC_COS(z_in*,x_out*,size) | x_out = cos(z_in) |

DMA_CORDIC_SIN(z_in*,y_out*,size) | y_out = sin(z_in) |

DMA_CORDIC_AMPCOS (z_in*,x_in*,x_out*,size) | x_out = x_in*cos(z_in) |

DMA_CORDIC_AMPSIN (z_in*,x_in*,y_out*,size) | y_out = x_in*sin(z_in) |

DMA_CORDIC_ROTATEXY (z_in*,x_in*,y_in*,x_out*,y_out*,size) | x_out = x_in*cos(z_in) − y_in*sin(z_in) y_out = y_in*cos(z_in) + x_in*sin(z_in) |

DMA_CORDIC_NEURON (x_in*,y_in*,y_out*,size) | y_out = sigmoid(${\sum}_{i=1}^{n}{x\_in}_{i}\ast {y\_in}_{i}$) |

LUTs | Registers | Memory | DSP | Speed (MHz) | |
---|---|---|---|---|---|

Component | |||||

Nios-II/f Core | 1290 | 376 | 10,240 | 6 | - |

FPU Accelerator | 415 | 198 | 144 | 7 | - |

CORDIC | 1099 | 170 | 0 | 0 | - |

scaler | 126 | 148 | 0 | 0 | - |

float2fix | 261 | 257 | 0 | 0 | - |

fix2float | 192 | 238 | 0 | 0 | - |

Basic CIC | 634 | 1039 | 768 | 4 | - |

Pipelined CIC | 658 | 1071 | 1280 | 6 | - |

DMA CIC (+Avalon I/F) | 1790 | 2382 | 4096 | 6 | - |

System | |||||

Nios with FPU | 1819 | 717 | 10,453 | 13 | 114 |

Basic Implementation | 4933 | 2463 | 12,032 | 10 | 145 |

Pipelined Approach | 4955 | 2495 | 12,544 | 12 | 143 |

DMA Approach | 6089 | 3796 | 15,360 | 12 | 146 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Manor, E.; Ben-David, A.; Greenberg, S.
CORDIC Hardware Acceleration Using DMA-Based ISA Extension. *J. Low Power Electron. Appl.* **2022**, *12*, 4.
https://doi.org/10.3390/jlpea12010004

**AMA Style**

Manor E, Ben-David A, Greenberg S.
CORDIC Hardware Acceleration Using DMA-Based ISA Extension. *Journal of Low Power Electronics and Applications*. 2022; 12(1):4.
https://doi.org/10.3390/jlpea12010004

**Chicago/Turabian Style**

Manor, Erez, Avrech Ben-David, and Shlomo Greenberg.
2022. "CORDIC Hardware Acceleration Using DMA-Based ISA Extension" *Journal of Low Power Electronics and Applications* 12, no. 1: 4.
https://doi.org/10.3390/jlpea12010004