A Low-Latency RDP-CORDIC Algorithm for Real-Time Signal Processing of Edge Computing Devices in Smart Grid Cyber-Physical Systems

Smart grids are expanding in scale as their equipment grows more complex. Edge computing is gradually replacing conventional cloud computing due to its low latency, low power consumption, and high reliability. The CORDIC algorithm supports high-speed real-time processing and is well suited to hardware accelerators in edge computing devices. However, the iterative calculation method of the CORDIC algorithm leads to problems such as a complex structure and high consumption of hardware resources. In this paper, we propose an RDP-CORDIC algorithm that pre-computes all micro-rotation directions and transforms the conventional single-stage iterative structure into a combined three-stage and multi-stage iterative structure, thereby addressing the large number of iterations and the high resource consumption of the conventional CORDIC algorithm. An accuracy compensation algorithm for the direction prediction constant is also proposed to solve the problem of high ROM consumption in high-precision implementations of the RDP-CORDIC algorithm. The experimental results showed that the RDP-CORDIC algorithm had faster computation speed and lower resource consumption, with higher guaranteed accuracy, than other CORDIC algorithms. Therefore, the RDP-CORDIC algorithm proposed in this paper may effectively increase computation performance while reducing the power and resource consumption of edge computing devices in smart grid systems.


Introduction
Over the past few years, cloud computing infrastructure has been the dominant solution used to handle heavy computational tasks related to smart grid applications [1]. With the rise of smart grid cyber-physical systems and the growing number of Internet of Things (IoT) devices, cloud computing can no longer satisfy all the computing needs of smart grid applications. Edge computing can move data processing tasks from remote cloud computing centers to devices at the edge of the network. Edge computing technology alleviates the network congestion, latency and packet loss of cloud-based smart grid architectures [2,3]. The architecture of edge-enabled smart grid cyber-physical systems is shown in Figure 1; it consists of an access layer, an edge layer, a network layer, a platform layer, and an application layer. Here, the ability to process intelligent edge data is provided by the edge computing devices at the edge layer. Many scholars have proposed applications of edge computing that are especially suitable for smart grids. A cloud-edge collaborative intelligent method for object detection was proposed in [4] and applied to insulator string recognition and defect detection in the power IIoT. A fault detection method for pumping units based on edge intelligence, which effectively improves the fault detection accuracy while maintaining low computational requirements, was proposed in [5]. The important features of edge computing applied to smart grids mainly include real-time support [6][7][8][9][10] and low power consumption [11][12][13][14][15]. Data processing is the most time-consuming and energy-intensive part of edge computing. Furthermore, since edge computing devices cannot guarantee high-capacity storage, processing these large volumes of data is an important issue to be addressed [16].

Technology for Smart Grid
Most current edge computing devices use an edge computing framework based on heterogeneous computing, as shown in Figure 2. In this framework, the computational power provided by edge devices mainly depends on hardware accelerators, which include Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC) and Field Programmable Gate Arrays (FPGA). In the heterogeneous framework based on CPU+FPGA [17], the FPGA offers reconfigurability and energy efficiency. In [18], DSP operators are appropriately placed on edge devices so that the edge layer can reduce the energy consumption of each event by as much as 4%. The edge computing device based on the CPU+ASIC structure [19] has a significantly better acceleration ratio than the existing GPU+CPU method, and has the advantages of small size and low power consumption. A signal processing algorithm named CORDIC is often used in hardware to handle complex data computation problems in real time. It can implement many complex functions and mathematical problems with simple addition, subtraction, and shift operations. Table 1 lists some applications of the CORDIC algorithm, including trigonometric functions [20], hyperbolic functions [21], FFT [22] and singular value decomposition [23]. Nonetheless, the computational speed of the conventional CORDIC algorithm is limited by the number of iterations, i.e., the more iterations the CORDIC algorithm performs, the higher the computational accuracy and the longer the time delay. Therefore, reducing the number of iterations of the CORDIC algorithm, while ensuring the computational accuracy of the algorithm, can reduce the computational latency and hardware resource consumption.
In summary, this work aimed to discover a high-performance CORDIC algorithm. As a result, we proposed an RDP-CORDIC algorithm and implemented the hardware design of the algorithm. The RDP-CORDIC algorithm, characterized by fewer iterations, less hardware resource consumption and faster processing speed, can effectively improve the data processing speed and reduce the latency and power consumption of edge computing devices in smart grid cyber-physical systems. The main contributions of this work are as follows:

• We proposed a rotation direction prediction method for the CORDIC algorithm, which completes the calculation of all the micro-rotation directions from the input angle and the direction prediction constants, providing the basis for the subsequent merged iteration;
• A constant compensation algorithm for direction prediction was proposed to achieve higher direction prediction accuracy and to solve the problem of large memory consumption under high-accuracy conditions;
• The single-stage iterative structure of the CORDIC algorithm was replaced by a three-stage and multi-stage iterative structure. Based on this structure, a CORDIC algorithm design with high accuracy, low latency, and low power consumption was achieved.

Related Work
The CORDIC algorithm was proposed by Volder in 1959 and was later generalized by Walther. Subsequently, other methods were proposed that aimed to enhance the precision and reduce the iterations and resource consumption. Among them, Radix-4 CORDIC algorithms [24] used a zero-skipping technique to reduce the number of rotation iterations to 50%. Later, a hybrid radix 2-4 CORDIC algorithm with a high-performance compensation technique was presented [25], which reduced the number of iterations by 1/4, including scale factor calculation and compensation. Nevertheless, the computation and correction of the variable scale factor remained a key issue for higher radix CORDIC algorithms [26][27][28][29] and advanced hybrid CORDIC algorithms [30]. The scale-free CORDIC algorithm [31,32] approximated the sine and cosine functions by the Taylor series, thereby eliminating the need for scale factors, at the cost of a limited convergence range and poor accuracy. A new hybrid CORDIC algorithm was proposed in [33] to further reduce the latency of CORDIC by reducing the number of iterations to (3N/8) + 1. The low-latency CORDIC algorithm reported in [34] used the binary-to-bipolar recoding (BBR) method to reduce the overall number of iterations to (N + 1)/3, with no scale factor compensation. Similar to [23], the CORDIC algorithms in [35][36][37] cut down time and memory at the expense of accuracy. The CORDIC II algorithm proposed in [38] had excellent performance in terms of resource consumption and latency, but its low accuracy held it back. Table 2 lists some important features of the above CORDIC algorithms, including the rotation radix, the prediction of rotation direction and whether the scaling factor is fixed.

Conventional CORDIC Algorithm
The CORDIC algorithm contains two modes (rotation mode, vector mode) and three coordinate systems (circular coordinates, linear coordinates, hyperbolic coordinates). Different functions can be derived under different modes and different coordinate systems. The rotation mode of the CORDIC algorithm in the circular coordinate system is taken as an example to construct the simplest vector rotation model, as shown in Figure 3.
Suppose vector $v_1$ is rotated by $\theta$ to obtain vector $v_2$, and the coordinates of $v_1$ and $v_2$ are $(x_0, y_0)$ and $(x_t, y_t)$, respectively; then the change of the vector coordinates can be expressed by Equation (1).
By dividing the single rotation angle of Equation (1) into multiple directed rotations $\theta_i = \tan^{-1}(2^{-i})$, each rotation can be expressed by the iterative Equation (2),
where $z_{i+1}$ indicates the remaining angle and $d_i$ is the direction of rotation. $(x_{i+1}, y_{i+1})$ indicates the coordinates of $(x_i, y_i)$ after the next rotation. In rotation mode, the remaining angle $z_{i+1}$ is used as the direction reference; after $n$ iterations, the remaining angle tends to zero and the vector $v_i$ approaches the target vector $v_2$, thus realizing the successive approximation calculation. Since $\cos\theta_i$ in Equation (2) introduces a multiplication into every iteration, it can be factored out of the iterative operation. Let $K = \prod_{i=0}^{n} \cos\theta_i = \prod_{i=0}^{n} 1/\sqrt{1 + 2^{-2i}}$, where $1/K$ is the scaling factor mentioned above; then the iteration equations of the radix-2 CORDIC algorithm in rotation mode at the $(i+1)$th step are given by Equation (3).
To facilitate the subsequent verification of the performance of the CORDIC algorithm, the principles for computing the sine and cosine functions are described below. Combining Equation (1) with Equation (3), a formula for the CORDIC algorithm that calculates the sine and cosine functions can be introduced. Given $z_0 = \theta$, the coordinates obtained from Equation (3) after $n$ iterations are $(x_n, y_n)$, as expressed in Equation (4).
From Equation (4), it can be seen that, taking $x_0 = K$ and $y_0 = 0$, after $n$ iterations $x_n$ and $y_n$ are equal to $\cos\theta$ and $\sin\theta$, respectively. In this way, the sine and cosine functions are computed with the CORDIC algorithm.
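The rotation-mode iteration described in this section can be sketched in a few lines of Python. This is an illustrative floating-point sketch rather than the hardware design of this paper; it assumes the standard radix-2 update $x_{i+1} = x_i - d_i y_i 2^{-i}$, $y_{i+1} = y_i + d_i x_i 2^{-i}$, $z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$ with $d_i = \mathrm{sign}(z_i)$, and the names `cordic_sin_cos` and `n_iter` are hypothetical.

```python
import math


def cordic_sin_cos(theta, n_iter=16):
    """Rotation-mode radix-2 CORDIC sketch; returns (cos(theta), sin(theta)).

    Assumes |theta| lies inside the CORDIC convergence range (about 1.74 rad).
    """
    # Micro-rotation angles tan^-1(2^-i); in hardware these come from a small table.
    angles = [math.atan(2.0 ** -i) for i in range(n_iter)]

    # K = prod cos(theta_i). Starting from x0 = K folds the scaling into the result.
    k = 1.0
    for i in range(n_iter):
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = k, 0.0, theta
    for i in range(n_iter):
        # The direction depends on the sign of the remaining angle, which is
        # why each iteration has to wait for the previous one.
        d = 1.0 if z >= 0.0 else -1.0
        # Multiplying by 2^-i models the right shift used in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y
```

The data dependency in the loop (each `d` needs the latest `z`) is the latency bottleneck that direction prediction is meant to remove.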

RDP-CORDIC Algorithm
The micro-rotation direction of the conventional CORDIC algorithm is determined by the remaining angle after the previous iteration, which leads to high latency. Although the latency can be reduced with a parallel pipeline structure, this increases the hardware resource overhead; the most effective way to reduce the latency is to reduce the number of iterations. In this paper, a rotation direction prediction CORDIC (RDP-CORDIC) algorithm is proposed to reduce the number of iterations by calculating all micro-rotation directions in advance, so that the conventional single-stage iterative structure can be changed into a multi-stage iterative structure. Current direction prediction algorithms mainly include the Booth encoding method and the binary-to-bipolar recoding (BBR) method. The Booth encoding method predicts the direction of rotation after $[(N - \log_2 3)/3]$ iterations, thus reducing the number of iterations by about 1/2. The BBR method decomposes the input angle $\theta$ into a combination of a larger angle and several angles of $2^{-i}$ radians, so that the direction of each rotation is determined by the binary bit values of $\theta$. Note that the BBR method requires a ROM to store all the computation results after $N/3 - 1$ iterations, and the ROM consumption increases as the precision gets higher, e.g., 16-bit precision requires a ROM of 26 × 16 × 2 (bit) size.

Rotation Direction Prediction
Considering that the BBR method allows the binary bit values of the angle to represent the directions of micro-rotation, i.e., $\theta = \sum_{i=0}^{\infty} d_i 2^{-i} = (d_\theta)_2$, this method fixes the rotation angle as $2^{-i}$, resulting in a large consumption of ROM resources. Therefore, the micro-rotation angle chosen for the RDP-CORDIC algorithm is $\tan^{-1}(2^{-i})$, and a new rotation direction prediction method needs to be sought.
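To make the idea of reading rotation directions off the bits of the angle concrete, the following sketch shows the standard binary-to-bipolar identity: a binary fraction with bits $b_i \in \{0, 1\}$ equals a fixed offset plus a sum of signed terms $r_i 2^{-(i+1)}$ with $r_i = 2b_i - 1 \in \{-1, +1\}$. It is offered only as an illustration of the recoding principle and is not necessarily the exact formulation used in [34]; the function name `bbr_recode` is hypothetical.

```python
from fractions import Fraction


def bbr_recode(bits):
    """Recode fraction bits b_1..b_N (each 0 or 1) into bipolar digits.

    Returns (offset, digits) such that
        sum(b_i * 2^-i) == offset + sum(r_i * 2^-(i+1)),  r_i in {-1, +1}.
    """
    n = len(bits)
    digits = [2 * b - 1 for b in bits]          # 0 -> -1, 1 -> +1
    # Substituting b_i = (1 + r_i) / 2 leaves a constant term of 1/2 - 2^-(N+1).
    offset = Fraction(1, 2) - Fraction(1, 2 ** (n + 1))
    return offset, digits


def bipolar_value(offset, digits):
    """Evaluate the recoded representation exactly."""
    return offset + sum(Fraction(r, 2 ** (i + 2)) for i, r in enumerate(digits))
```

Every digit is non-zero after recoding, so each bit corresponds to one rotation whose direction is known before any iteration is carried out.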
Here, $\varepsilon$ is a constant of about 0.0421115429, and the final rotation direction prediction formula is given in Equation (8).
From Equation (8), the direction of rotation can be calculated from the input angle and $\lambda$. The binary bit values of the result indicate the directions of rotation. Equation (6) gives the values of $d_1$, $d_2$, $d_3$, $d_4$ and $d_5$ in their various combinations for 16-bit precision. In order to determine the rules for the value of $\lambda$, the cumulative value of the rotation angles corresponding to $\lambda$ is taken as the angle reference, denoted $\theta_{cp}$. It should be noted that, in this calculation, the values of $d_6$ to $d_{16}$ are 0. When calculating $\theta_{cp}$, not only the sum of the rotation angles of $d_1$ to $d_5$ but also the micro-rotation angles of $d_6$ to $d_{16}$ should be accumulated. Equation (9) gives the calculation of the reference angle value $\theta_{cp(m)}$.
By examining the interval range of the input angle, the redundant data are removed to obtain the final direction prediction constants. The calculation result is affected by the accuracy of the direction prediction constants $\lambda$ and $\theta$. In order to satisfy the accuracy of $d_\theta$, the accuracy of $\lambda$ must be higher than that of $d_\theta$. For 16-bit precision of $d_\theta$, $\lambda$ needs 17-bit precision. Since the integer bits of both $\lambda$ and $\theta$ are 0, each prediction constant in ROM only needs 16 bits. Therefore, to implement the RDP-CORDIC algorithm with N-bit precision, the size of the prediction constants $\lambda$ and $\theta$ in ROM is also N bits. The minimum angular reference value $\lambda_{cp5}$ for different values of $\lambda$ is given in Table 3. The process of rotation direction prediction can be summarized as follows:
1. Compare the input angle with the values of $\theta_{cp}$ in the direction prediction constants, and select the value of $\lambda$ whose $\theta_{cp}$ is closest to, and less than or equal to, the input angle;
2. Calculate the binary value $d_\theta$ representing the micro-rotation directions based on $\lambda$. In this way, the micro-rotation directions are predicted without iteration.
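Step 1 of the procedure above is a table lookup, which can be sketched as follows. The sketch assumes that each $\lambda$ applies to input angles at or above its reference value $\theta_{cp}$; the table entries are placeholders chosen for illustration only and are not the values of Table 3, and the names `THETA_CP`, `LAMBDA` and `select_lambda` are hypothetical. Step 2 relies on Equation (8) and is not sketched here.

```python
import bisect

# Placeholder table, sorted by theta_cp. Real entries would come from the
# direction prediction constants of the paper; these are NOT those values.
THETA_CP = [0.0, 0.2, 0.4, 0.6]
LAMBDA = ["lambda_0", "lambda_1", "lambda_2", "lambda_3"]


def select_lambda(theta):
    """Return the lambda whose theta_cp is the largest value <= theta."""
    idx = bisect.bisect_right(THETA_CP, theta) - 1
    if idx < 0:
        raise ValueError("input angle is below the smallest reference angle")
    return LAMBDA[idx]
```

In hardware, the same selection would typically be a small bank of comparators feeding a ROM address, so it adds no iterative latency.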

ROM Resource Optimization
A 14 × 16 × 2 bit ROM is required for the 16-bit precision direction prediction described above. According to the rotation direction prediction algorithm proposed in Section 4.1, N-bit accuracy requires a ROM of $2^{[(N - \log_2 3)/3]} \times N$ bits, and the ROM consumption increases sharply as the accuracy increases. The reason for this sharp increase is that high-accuracy direction prediction requires more $\lambda$ values to be selected, which enlarges the table of direction prediction constants. It is useful to analyze the expansion of $\lambda$. Let $m = [(N - \log_2 3)/3]$; then $\lambda_m$ and $\lambda_{m+1}$ are given by Equations (10) and (11), respectively. The first 10 iterations of $\mu_i$ are given in Table 4. Combined with Taylor's formula, when $m \ge [(N + \log_2(3/20) - 3)/5]$, Equation (11) can be reduced to Equation (12). Equation (12) gives the relationship between $\lambda_{m+1}$ and $\lambda_m$; similarly, the value of $\lambda_{m+i}$ can be calculated from $\lambda_m$. Thus, an accuracy compensation algorithm for $\lambda$ is proposed, in which $\lambda$ is composed of a fixed part $\lambda_s$ and an accuracy compensation part $\lambda_c$.
Since the accuracy of the rotation direction that can be derived is equal to $s \times 3 + \log_2 3 > m$, $d_i$ in Equation (14) can be calculated based on $\lambda_s$. To sum up, the ROM consumption of the direction prediction constants is reduced from $2^{[(N - \log_2 3)/3]} \times N$ bits to $2^{[(N + \log_2(3/20) - 3)/5]} \times N$ bits by using the accuracy compensation method for the direction prediction constants, which achieves high accuracy while reducing the ROM consumption.

Iterative Merging
Once all directions of rotation are known, the conventional single-stage iterative calculation of Equation (3) can be changed to a three-stage combined iteration, as shown in Equation (15).
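One way to picture a three-stage combined iteration is to multiply out three consecutive micro-rotations whose directions are already known. The sketch below assumes that merging means exactly this algebraic composition; it is not claimed to be the form of Equation (15), and the names `micro_rotate` and `merged_three` are hypothetical. With $t = d\,2^{-i}$ for each stage, the product of three stages has the form $x' = p\,x - q\,y$, $y' = q\,x + p\,y$ with $p = 1 - (t_a t_b + t_a t_c + t_b t_c)$ and $q = t_a + t_b + t_c - t_a t_b t_c$. Exact rational arithmetic is used so the equality can be checked without rounding.

```python
from fractions import Fraction


def micro_rotate(x, y, d, i):
    """One unscaled radix-2 micro-rotation with direction d at shift i."""
    t = Fraction(d, 2 ** i)
    return x - t * y, y + t * x


def merged_three(x, y, dirs, i):
    """Apply micro-rotations i, i+1, i+2 with known directions in one step."""
    ta, tb, tc = (Fraction(d, 2 ** (i + k)) for k, d in enumerate(dirs))
    # Coefficients from multiplying out the three 2x2 micro-rotation matrices.
    p = 1 - (ta * tb + ta * tc + tb * tc)
    q = ta + tb + tc - ta * tb * tc
    return p * x - q * y, q * x + p * y
```

Because `p` and `q` depend only on the three direction bits and the stage index, they take one of eight values per stage; selecting among pre-computed shift-add combinations is one plausible hardware realization, stated here as an assumption.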
In summary, the flow of rotation iteration is as follows: 1.

Hardware Design of RDP-CORDIC Algorithm
The main hardware structures for implementing the CORDIC algorithm are loop iterative structures and pipelined iterative structures. The loop iterative structure is simple in design and consumes less hardware, but the computation speed slows down as the accuracy increases. The pipelined iterative structure is more complex and consumes more hardware, but the computation speed is much higher. For edge computing devices used in smart grid cyber-physical systems, the faster pipelined iterative structure is more suitable. As shown below, the workflow of the RDP-CORDIC algorithm consists of three steps.

RDP-CORDIC workflow
1. Directional rough prediction
Based on the RDP-CORDIC algorithm, the implementation architecture of sine and cosine function calculation is shown in Figure 5, including an angle interval folding module, a direction prediction module, multiple three-stage iteration modules, a multi-stage merge iteration module, and a triangular constant change module. The working principle is as follows. The angle interval folding module transforms an input angle of any size into the interval [0, 2π] and then sends the 3-bit angle range code to the angle transformation module. The rotation direction prediction module calculates all micro-rotation directions in advance based on the input angle, and then passes the direction values to the back-end iterative calculation module; after the three-stage combined iterations and the multi-stage combined iterations, the calculated sine and cosine function values are output. Finally, the sine and cosine signals obtained by simulation with Vivado's simulation software are shown in Figure 6.
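The angle folding step can be illustrated with a common octant-folding scheme. The sketch assumes that the 3-bit angle range code is an octant index (0 to 7) and that the core only has to handle angles in [0, π/4], with the final sine and cosine restored by symmetry; the text does not confirm these details, and the names `fold_angle` and `unfold` are hypothetical.

```python
import math

QUARTER_PI = math.pi / 4.0


def fold_angle(theta):
    """Map any angle to (octant, phi) with octant in 0..7 and phi in [0, pi/4]."""
    t = theta % (2.0 * math.pi)                  # wrap into [0, 2*pi]
    octant = min(int(t // QUARTER_PI), 7)        # 3-bit range code
    r = t - octant * QUARTER_PI
    # Odd octants are mirrored so that phi always stays within [0, pi/4].
    phi = QUARTER_PI - r if octant % 2 else r
    return octant, phi


def unfold(octant, c, s):
    """Recover (cos(theta), sin(theta)) from c = cos(phi), s = sin(phi)."""
    table = [
        (c, s), (s, c), (-s, c), (-c, s),
        (-c, -s), (-s, -c), (s, -c), (c, -s),
    ]
    return table[octant]
```

For example, `unfold(*fold_angle(theta)[:1], math.cos(phi), math.sin(phi))` with `phi` taken from `fold_angle(theta)` reproduces the sine and cosine of the original angle; in a real design the two inner values would come from the CORDIC core rather than from `math`.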

More Applications of the RDP-CORDIC Algorithm
As described in the introduction section, the RDP-CORDIC algorithm can also be used in more areas of smart grids. In a smart grid system, the frequency of the power system is an important indicator of power quality and needs to be detected in real time. If there is a problem in a section of the smart grid, the source of the fault can be cut off in time to protect the grid. An FFT implemented with the CORDIC algorithm can efficiently measure the higher harmonics and interference noise of the power signal. The core of an FFT implementation based on the CORDIC algorithm is to use the CORDIC algorithm for the complex multiplication operations in the FFT. The complex multiplication operation in the FFT is given in Equation (18). Substituting the rotation factor $W_N^{nk} = e^{-j2\pi nk/N}$ into Equation (18), the imaginary and real parts of $X_k$ after simplification are given in Equation (19).
The multiplication of the complex sequence by the rotation factor can be seen as the vector $X_0$ rotated by $\theta = -2nk\pi/N$. Then, following the idea of the CORDIC algorithm, Equation (19) can be transformed into Equation (1), so the complex multiplication of the FFT can be implemented by the RDP-CORDIC algorithm.
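The equivalence between a twiddle-factor multiplication and a vector rotation can be checked with a small sketch. It uses a conventional radix-2 CORDIC rotation as a stand-in for the RDP-CORDIC core, which is an assumption made for brevity; the names `cordic_rotate`, `twiddle_multiply` and `N_ITER` are hypothetical.

```python
import cmath
import math

N_ITER = 24
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
K = math.prod(1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_rotate(x, y, theta):
    """Rotate the vector (x, y) by theta using shift-add micro-rotations."""
    # Wrap theta into [-pi, pi], then use a half-turn (sign flip) so that the
    # remaining angle lies in [-pi/2, pi/2], inside the convergence range.
    theta = math.remainder(theta, 2.0 * math.pi)
    if theta > math.pi / 2:
        x, y, theta = -x, -y, theta - math.pi
    elif theta < -math.pi / 2:
        x, y, theta = -x, -y, theta + math.pi
    z = theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    # Compensate the accumulated gain of the micro-rotations.
    return x * K, y * K


def twiddle_multiply(value, n, k, size):
    """Return value * W^(n*k) with W = exp(-2j*pi/size), computed as a rotation."""
    theta = -2.0 * math.pi * n * k / size
    re, im = cordic_rotate(value.real, value.imag, theta)
    return complex(re, im)
```

Comparing `twiddle_multiply(v, n, k, size)` with `v * cmath.exp(-2j * math.pi * n * k / size)` shows agreement up to the residual angle of the last micro-rotation.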
In addition, the CORDIC algorithm can also realize singular value decomposition (SVD) for image denoising, data compression, etc. With the increase of matrix dimension, the computation volume of SVD grows rapidly, which has a great impact on the real-time computation of edge computing devices. Taking the 2 × 2 matrix G as an example, its bilateral Jacobi SVD algorithm is calculated as in Equations (20) and (21).
In Equation (20), $a$, $b$, $c$, $d$ are the four elements of the second-order matrix G. $\theta_L$ and $\theta_R$ are the left and right rotation angles, calculated by Equation (21). The values of $\delta_1$ and $\delta_2$ are the singular values of the matrix G. In the above operation, both the arctangent function and the sine/cosine functions can be implemented by the CORDIC algorithm. The structure of the 2 × 2 SVD module based on the RDP-CORDIC algorithm is shown in Figure 7.
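A two-sided (bilateral) Jacobi step for a 2 × 2 matrix can be sketched as follows. The sketch uses the standard decomposition of G into a rotation-like part and a reflection-like part; the sign and angle conventions are chosen for this illustration and may differ from those of Equations (20) and (21), and the name `svd_2x2` is hypothetical. Library `atan2`, `sin` and `cos` calls stand in for the CORDIC vectoring and rotation operations.

```python
import math


def svd_2x2(a, b, c, d):
    """Diagonalize G = [[a, b], [c, d]] as Rot(phi) @ G @ Rot(psi).

    Rot(t) = [[cos t, -sin t], [sin t, cos t]]. Returns ((d11, d12, d21, d22),
    phi, psi); |d11| and |d22| are the singular values of G.
    """
    # Two arctangent evaluations (CORDIC vectoring mode in hardware).
    alpha = math.atan2(c - b, a + d)
    beta = math.atan2(c + b, a - d)
    phi = -(alpha + beta) / 2.0      # left rotation angle
    psi = (beta - alpha) / 2.0       # right rotation angle

    cl, sl = math.cos(phi), math.sin(phi)
    cr, sr = math.cos(psi), math.sin(psi)

    # T = Rot(phi) @ G
    t11, t12 = cl * a - sl * c, cl * b - sl * d
    t21, t22 = sl * a + cl * c, sl * b + cl * d
    # D = T @ Rot(psi)
    d11, d12 = t11 * cr + t12 * sr, -t11 * sr + t12 * cr
    d21, d22 = t21 * cr + t22 * sr, -t21 * sr + t22 * cr
    return (d11, d12, d21, d22), phi, psi
```

After the two rotations the off-diagonal entries vanish up to rounding, and the diagonal entries carry the singular values with a possible sign.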

Performance Testing and Analysis
The hardware design of the 16-bit fixed-point decimal RDP-CORDIC algorithm was implemented on the Xilinx Kintex7 325T series FPGA hardware platform using Verilog HDL. First, the effect of the ROM resource optimization proposed in Section 4.2 for the direction prediction CORDIC algorithm was tested, and the size of the ROM resources consumed before and after the optimization was compared. Subsequently, the proposed RDP-CORDIC algorithm was tested and compared with other related CORDIC algorithms in terms of latency, resource consumption, and power consumption. Based on the hardware structure of the RDP-CORDIC algorithm, the maximum absolute errors of the sine and cosine functions, logarithmic function, square root function, hyperbolic sine, and hyperbolic cosine function were also tested at 16-bit accuracy. Finally, we analyzed the computation time and maximum absolute errors of the sine and cosine functions computed using the RDP-CORDIC algorithm.

ROM Optimization Results of the RDP-CORDIC Algorithm
Figure 8 shows the comparison of ROM resource consumption before and after ROM optimization for this algorithm, from which it can be seen that the ROM consumption before the optimization is much higher than that after optimization. For 16-bit precision, the unoptimized algorithm requires a ROM size of 448 bits, while the optimized algorithm requires only 160 bits. For 32-bit precision, the unoptimized algorithm requires a ROM size of 56,320 bits, while the optimized algorithm consumes only 1888 bits. As the data bit width increases, the difference in ROM consumption before and after optimization increases significantly.
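The savings implied by these figures can be worked out directly. The short sketch below only restates the ROM sizes quoted above; the names `ROM_BITS` and `rom_saving` are hypothetical.

```python
# ROM sizes in bits quoted in the text: precision -> (before, after).
ROM_BITS = {16: (448, 160), 32: (56320, 1888)}


def rom_saving(before, after):
    """Return (remaining fraction, reduction factor) for a ROM optimization."""
    return after / before, before / after


for width, (before, after) in ROM_BITS.items():
    fraction, factor = rom_saving(before, after)
    print(f"{width}-bit: {fraction:.1%} of the original size, {factor:.1f}x smaller")
```

At 16 bits the optimized ROM is about 35.7% of the original size (a factor of 2.8), while at 32 bits it is about 3.4% (a factor of roughly 29.8), which illustrates how the benefit grows with the bit width.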

Performance Comparison of CORDIC Algorithms
The test results of the RDP-CORDIC algorithm and other related CORDIC algorithms in terms of latency and resource consumption are shown in Table 6. In terms of latency, the R-4 CORDIC, R-8 CORDIC and Mixed-R CORDIC algorithms reduced latency but increased ROM consumption and had high hardware complexity. The RDP-CORDIC algorithm had a 70% lower latency than the conventional CORDIC algorithm, on par with the BBR-CORDIC algorithm. In terms of resource consumption, the RDP-CORDIC algorithm was similar to the CORDIC II algorithm, but the CORDIC II algorithm had a larger latency. Although the ROM consumption of the new algorithm was slightly higher than that of the conventional R-2 CORDIC algorithm, the RDP-CORDIC algorithm was clearly more advantageous, trading a very small amount of ROM for a 70% reduction in latency and a 40% reduction in other resource consumption. In terms of power consumption, the proposed RDP-CORDIC algorithm achieved an ultra-low power consumption of 28 mW. A comprehensive comparison showed that the RDP-CORDIC algorithm had advantages over the other CORDIC algorithms in terms of latency, resource consumption and power consumption.

Test of Calculation Error and Calculation Time of Various Functions
Figure 9 shows the absolute error curves for the calculation of the sine and cosine functions, logarithmic function, square root function, and hyperbolic sine and hyperbolic cosine functions based on the RDP-CORDIC algorithm at 16-bit data width. The maximum absolute error of the sine and cosine functions is less than 3.04 × 10^−5. The error differs for each input angle because it is 0 only when the accumulated value of the directed rotation angles is equal to the input angle. However, the directions of rotation differ for different input angles, which results in a difference between the accumulated rotation value and the input angle value. For the other functions, the input test data are limited to different ranges due to the characteristic limitations of the CORDIC algorithm. As shown in Figure 9c-f, the inputs are limited to [0.2, 9.5], [0.03, 2], [−1.12, 1.12] and [−1.12, 1.12], respectively. The test results show that the RDP-CORDIC algorithm performs well on a variety of functions, with maximum absolute errors of less than 7.7 × 10^−4. Because the computation time of each function is the same with the CORDIC algorithm, the sine and cosine functions are used to test the computation time. The times of a single computation of the sine and cosine functions for different CORDIC algorithms are compared in Table 7, and it can be seen that the RDP-CORDIC algorithm takes only 60 ns at a system clock of 100 MHz.
Table 7. Time of a single sine and cosine computation (the label of the first row is missing from this text).
Algorithm                   Time (ns)
[label missing]             100
R-8 CORDIC [28]             80
BBR-CORDIC [34]             60
CORDIC II [38]              90
RDP-CORDIC [proposed]       60
Finally, the maximum absolute error of the sine and cosine functions realized with each algorithm under different bit widths was tested, and the test results are shown in Figure 10. The results show that the RDP-CORDIC algorithm maintains the best performance across the tested bit widths, and the error of the RDP-CORDIC algorithm is much lower than that of the CORDIC II algorithm, which has similar resource consumption. The reason why the RDP-CORDIC algorithm has higher accuracy than the other algorithms is that it calculates the rotation directions by formula once before iteration, while the other CORDIC algorithms need to calculate the rotation direction at every iteration, and repeated calculations may degrade accuracy.

Conclusions
Edge computing devices used in smart grid cyber-physical systems require real-time, high-speed data processing capabilities with low power consumption. The CORDIC algorithm is widely used as a high-speed real-time numerical computation algorithm in hardware accelerators for edge computing devices. Limited by their large number of iterations and high resource consumption, conventional CORDIC algorithms do not perform well in edge computing for smart grids. In this paper, an RDP-CORDIC algorithm was proposed that predicts all micro-rotation directions without iteration and then transforms the conventional single-stage iterative structure into a combined three-stage and multi-stage iterative structure. An accuracy compensation algorithm for the direction prediction constants of the RDP-CORDIC algorithm was also proposed, which reduces the ROM consumption from 448 bits to 160 bits at 16-bit accuracy. Finally, the hardware design of the RDP-CORDIC algorithm was implemented on the Xilinx Kintex7 325T series FPGA platform, along with the calculation of the sine, cosine, logarithmic and other functions. The test results showed that the RDP-CORDIC algorithm had advantages over other CORDIC algorithms in terms of latency, resource consumption and power consumption. The computation time and the maximum absolute error of the sine and cosine functions computed by the RDP-CORDIC algorithm also compared favorably with those of other CORDIC algorithms. It was experimentally confirmed that the proposed RDP-CORDIC algorithm was able to reduce resource consumption and increase the computational speed while maintaining the computational accuracy compared to other improved CORDIC algorithms. In edge computing for signal processing in smart grid cyber-physical systems, the RDP-CORDIC algorithm showed its potential to effectively improve the speed of real-time data processing and reduce the power consumption of edge computing. The application of the RDP-CORDIC algorithm will be the focus of our future work, namely to improve the speed of smart grid topology identification and line loss rate calculation. Further efforts will be made with a particular focus on the signal processing capability of edge computing devices in the smart grid cyber-physical system.

Figure 1. Basic Structure of Edge-enabled Smart Grids.

Figure 2. Edge computing framework based on heterogeneous computing.
Figure 4a and Figure 4b show the pipeline structures of the classical optimized CORDIC algorithm and the RDP-CORDIC algorithm, respectively. In the classical optimized CORDIC algorithm, the direction of each rotation depends on the sign of the remaining angle θn, so the result must be computed iteratively through a [(N − 1)/2]-stage pipeline. The structure of the RDP-CORDIC algorithm consists of two parts: the direction prediction part on the left and the rotation iteration part on the right. The direction prediction module calculates all micro-rotation directions in advance; because all directions are then known, the rotation iteration part can replace the single-stage iterative structure with a three-stage and multi-stage combined iterative structure. The new structure removes part of the hardware overhead and latency of the conventional structure.
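The dependency described above can be illustrated with a minimal Python sketch. It does not reproduce the paper's direction prediction module or its prediction constants. Instead it splits a conventional rotation-mode CORDIC into two hypothetical helpers, `directions_sequential` and `rotate_with_known_directions`, to show that once every direction is available the rotation stage no longer needs the residual-angle datapath. The iteration count `N = 16` and the use of floating point in place of fixed-point hardware arithmetic are assumptions made for illustration.

```python
import math

N = 16  # assumed number of micro-rotations (illustrative choice)
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]

# Scale factor that compensates the gain of N micro-rotations.
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def directions_sequential(theta):
    """Conventional scheme: each direction is the sign of the
    residual angle, so direction i depends on directions 0..i-1."""
    dirs = []
    z = theta
    for a in ANGLES:
        d = 1 if z >= 0 else -1
        dirs.append(d)
        z -= d * a
    return dirs


def rotate_with_known_directions(dirs):
    """Rotation stage only: with all directions given up front,
    no residual angle has to be tracked between micro-rotations."""
    x, y = K, 0.0
    for i, d in enumerate(dirs):
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
    return x, y  # approximately (cos(theta), sin(theta))


theta = 0.6  # radians, inside the CORDIC convergence range
cos_t, sin_t = rotate_with_known_directions(directions_sequential(theta))
print(cos_t, sin_t)
```

In this sketch the directions are still obtained sequentially; in the RDP-CORDIC structure they would come from the direction prediction part, which is what allows several micro-rotations to be merged into one stage.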

Figure 4. Structure of the classical optimized CORDIC algorithm and the RDP-CORDIC algorithm: (a) Classical optimized CORDIC algorithm pipeline structure; (b) Structure of RDP-CORDIC algorithm.

Figure 5. Implementation of RDP-CORDIC algorithm with sine and cosine functions.

Figure 6. Simulation waveforms of calculated sine and cosine functions.

…), a, b, c, d are the four elements of the second-order matrix G. θL and θR are the left and right rotation angles, calculated by Equation (21). The values of … are the singular values of the matrix G. In the above operation, both the arctangent function and the sine/cosine function can be implemented by the CORDIC algorithm. The structure of the 2 × 2 SVD module based on the RDP-CORDIC algorithm is shown in Figure 7.
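As a rough illustration of the 2 × 2 SVD step, the sketch below uses the standard two-sided rotation formulation. The paper's Equations (20) and (21) are not reproduced here, so the angle convention, the function name `svd2x2`, and the variable names are assumptions and may differ from the paper's. The library calls `math.atan2`, `math.cos`, and `math.sin` stand in for the operations that a CORDIC module would perform in vectoring and rotation mode.

```python
import math


def rot(theta):
    """Counter-clockwise 2 x 2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul2(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def svd2x2(a, b, c, d):
    """Diagonalize G = [[a, b], [c, d]] as rot(-theta_l) @ G @ rot(theta_r).

    Assumed convention (not taken from the paper):
      theta_l - theta_r = atan2(c - b, a + d)
      theta_l + theta_r = atan2(c + b, a - d)
    """
    alpha = math.atan2(c - b, a + d)
    beta = math.atan2(c + b, a - d)
    theta_l = (beta + alpha) / 2.0
    theta_r = (beta - alpha) / 2.0
    G = [[a, b], [c, d]]
    D = matmul2(matmul2(rot(-theta_l), G), rot(theta_r))
    return theta_l, theta_r, D


theta_l, theta_r, D = svd2x2(1.0, 2.0, 3.0, 4.0)
print(D)  # off-diagonal entries are close to zero
```

The diagonal entries of `D` are the singular values up to sign; a negative entry can be made positive by flipping the sign of the corresponding column of one rotation matrix.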

Figure 8. ROM resources required at different bit widths before and after optimization.

Figure 9. Calculation errors of various functions based on the RDP-CORDIC algorithm: (a) Absolute error of sine signal value; (b) Absolute error of cosine signal value; (c) Absolute error of ln(x) value; (d) Absolute error of sqrt value; (e) Absolute error of sinh value; (f) Absolute error of cosh value.

Table 7. Time consumed for the first calculation of the sine and cosine function.
CORDIC Algorithm      Time (ns)
R-2 CORDIC [26]       190
R-4 CORDIC [24]       100

Figure 10.Maximum absolute error of sine and cosine signals calculated by various CORDIC algorithms at different bit widths.

Table 1. Application of CORDIC Algorithm.

Table 2. Features of related CORDIC algorithms.

Table 3. Prediction constant table for 16-bit precision direction.

Table 4. Value of µi constants for the first 10 iterations.

Table 6. Performance comparison results of different CORDIC algorithms.
