Embedding an Electrical System Real-Time Simulator with Floating-Point Arithmetic in a Field Programmable Gate Array

Real-Time Digital Simulation (RTDS) is a powerful tool for modeling and analyzing electrical and drive systems because it provides an efficient and accurate process. There are several hardware devices for this type of simulation; however, their high costs have led to the increasing use of more affordable and reconfigurable technologies. In this context, its many logic blocks and storage elements make the Field Programmable Gate Array (FPGA) an ideal device to perform RTDS. This work proposes a technique to embed a real-time digital simulator in an FPGA through a Hardware Description Language (HDL), since it provides liberty in the architecture choice and no dependency on commercial ready-made hardware-software packages. The proposed approach focuses on developing the system design with an expression tree graph, then synthesizing and verifying it, prioritizing performance and design accuracy with respect to area and power consumption. Thus, the results are acquired within a time step considered real-time. A simulation of a direct current (DC) motor speed control has been incorporated into this work as an application example, which includes the embedding and simulation of the electric machine and its drive system. Performance tests have shown that the developed simulator runs in real time and makes possible a realistic analysis of the interaction between the plant and its control. In addition, an estimate of the hardware requirement for real-time simulation is proposed based on the number of mathematical operations.


Introduction
Scientific computing is characterized mainly by heavy computation tasks requiring efficient number-crunching systems and rapid changes in the input data to be processed [1]. The RTDS is a powerful tool in scientific computing because it provides an efficient system design process, higher accuracy when compared with an offline simulation tool, parallel processing, and it is also being used due to its rapid prototyping [2]. To reproduce a phenomenon faithfully, the simulator needs to solve grid-scale model equations for one time-step within the same time as a real-world clock. Thus, the simulation execution time must be shorter than the selected time-step [2].
RTDS applied to the domain of electrical and drive systems can be classified as (1) fully digital real-time (RT) simulation and (2) hardware-in-the-loop (HIL) RT simulation [3]. The first relies on supercomputer technology, which makes it able to simulate transients in large networks. During the 1990s, the production of fully digital RT simulators started, and some of these tools continue to be featured in simulations and research today. Examples include the Real-Time Digital Simulator (RTDS) developed by the Manitoba HVDC Research Centre and the HYPERSIM simulator.

DC Motor Drive System Modelling
The test system adopted in this work is the DC motor speed control. Hence, this section gives a brief description of the whole system operation and modeling stages. The DC motor is composed of a field winding in the stator and an armature winding in the rotor [14]. Figure 1 shows the equivalent circuit for the independent excitation DC motor type and Table 1 presents all parameters and variables used in this modeling. According to Kirchhoff's Voltage Law, the differential equations obtained from Figure 1 are (1)-(2), where E_a is given by (3).
This back-emf is induced in the armature circuit due to its motion relative to the field winding. The motor movement equation is written in terms of the electromagnetic torque, T_e(t) = K_m I_f(t) I_a(t).
The DC motor speed control requires a transfer function relating the state variable ω_r to the control variable V_a, which is imposed on the motor so that the angular speed tracks its setpoint. Applying the Laplace transform to Equations (1)-(5) and rearranging them, it is possible to obtain this transfer function.

Model Description and Discretization
The DC motor drive system is composed of an electric machine, a power converter, a low-pass filter and a PID controller, whose structure is illustrated in Figure 2. To simulate it on a programmable logic platform, it is necessary to apply a discretization method to all of the system equations. For this purpose, Euler's method is used, since it is the most straightforward and classic method for numerically solving differential equations.
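As an illustration of the discretization step, the following Python sketch applies forward Euler to a separately excited DC motor. Equations (1)-(5) are not reproduced in this text, so the model below uses the standard textbook form as an assumption; the parameter names (`Ra`, `La`, `Rf`, `Lf`, `Km`, `J`, `B`) and their values are hypothetical placeholders, not the entries of Table 1.

```python
from dataclasses import dataclass


@dataclass
class MotorParams:
    # Hypothetical values for illustration only (not taken from Table 1).
    Ra: float = 1.0     # armature resistance [ohm]
    La: float = 0.01    # armature inductance [H]
    Rf: float = 100.0   # field resistance [ohm]
    Lf: float = 10.0    # field inductance [H]
    Km: float = 1.0     # machine constant
    J: float = 0.1      # rotor inertia [kg m^2]
    B: float = 0.01     # viscous friction coefficient [N m s/rad]


def euler_step(state, va, vf, load_torque, p, dt):
    """Advance (I_f, I_a, w_r) by one forward-Euler step of length dt."""
    i_f, i_a, w_r = state
    e_a = p.Km * i_f * w_r            # back-emf (assumed textbook form)
    t_e = p.Km * i_f * i_a            # electromagnetic torque, as in the text
    di_f = (vf - p.Rf * i_f) / p.Lf   # field circuit
    di_a = (va - p.Ra * i_a - e_a) / p.La   # armature circuit
    dw_r = (t_e - p.B * w_r - load_torque) / p.J   # motion equation
    return (i_f + dt * di_f, i_a + dt * di_a, w_r + dt * dw_r)


if __name__ == "__main__":
    params = MotorParams()
    state = (0.0, 0.0, 0.0)
    for _ in range(1000):             # 1000 steps of 1 microsecond
        state = euler_step(state, 240.0, 100.0, 0.0, params, 1e-6)
    print(state)
```

Each line of `euler_step` maps to a handful of additions, multiplications and one division, which is the kind of operation count that determines how many floating-point units a hardware implementation would need.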

PID Controller
The PID controller calculates the reference armature voltage, V_a*, to be sent to the converter. The controller is described by (10).
The discrete PID implementation is obtained through backward finite differences and algebraic manipulations. Thus, (10) is transformed into (11)-(13). These three equations act on the error at different instants of time: (11) is responsible for the current instant, (12) for the previous instant and (13) for the instant before that.
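Equations (10)-(13) are not reproduced in this text. One common result of applying backward differences to a PID law is the incremental form sketched below, in which three coefficients weight the current error and the two previous errors. This is offered as a plausible reading of the description, not as the exact equations of the paper; the names `q0`, `q1`, `q2` and the class `IncrementalPID` are hypothetical.

```python
def pid_coefficients(kp, ki, kd, ts):
    """Coefficients for e[k], e[k-1], e[k-2] from backward differences."""
    q0 = kp + ki * ts + kd / ts
    q1 = -kp - 2.0 * kd / ts
    q2 = kd / ts
    return q0, q1, q2


class IncrementalPID:
    """u[k] = u[k-1] + q0*e[k] + q1*e[k-1] + q2*e[k-2]."""

    def __init__(self, kp, ki, kd, ts):
        self.q0, self.q1, self.q2 = pid_coefficients(kp, ki, kd, ts)
        self.e1 = 0.0      # error at the previous instant
        self.e2 = 0.0      # error two instants ago
        self.u_prev = 0.0  # previous controller output

    def update(self, error):
        u = self.u_prev + self.q0 * error + self.q1 * self.e1 + self.q2 * self.e2
        self.e2, self.e1, self.u_prev = self.e1, error, u
        return u


if __name__ == "__main__":
    pid = IncrementalPID(kp=2.0, ki=1.0, kd=0.0, ts=0.5)
    print([pid.update(1.0) for _ in range(3)])
```

In hardware, the three products are independent of each other and can therefore be computed by parallel multipliers before the sums are accumulated.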

Power Converter
The converter used is of the H-bridge type, whose schematic diagram is shown in Figure 3. The switch pair S1 and S4 operates in asynchronous buck mode, while the pair S2 and S3 operates in boost mode. Thus, this circuit can impose a two-level output voltage. Pulse-width modulation (PWM) signals control the switching so that the average value of the output equals V_a*. Considering the duty-cycle ratio T_1/T [10], with T_1 being the time interval during which the pair S1 and S4 is turned on and T the PWM period, the average value of the armature voltage can be written as in (15).
From (15), it is possible to obtain (16), which gives the instant T_1 as a function of V_a*. In this way, it is possible to determine the exact switching time for each PWM period.
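Equations (15) and (16) are not reproduced in this text. As a rough sketch of the idea, the Python code below assumes bipolar two-level switching with a DC-link voltage `v_dc`, for which the average output is `v_dc * (2*T1/T - 1)`. Both this relation and the names `v_dc`, `switching_instant` and `average_voltage` are assumptions made for illustration and may differ from the paper's formulation.

```python
def average_voltage(t1, v_dc, period):
    """Average output of an assumed bipolar two-level H-bridge."""
    return v_dc * (2.0 * t1 / period - 1.0)


def switching_instant(va_ref, v_dc, period):
    """Invert the average-voltage relation to get T1 from the reference.

    The duty cycle is clamped to [0, 1]; a reference beyond +/- v_dc
    therefore saturates the converter (overmodulation).
    """
    duty = 0.5 * (1.0 + va_ref / v_dc)
    duty = min(1.0, max(0.0, duty))
    return duty * period


if __name__ == "__main__":
    T = 1e-4  # hypothetical PWM period [s]
    t1 = switching_instant(240.0, 300.0, T)
    print(t1, average_voltage(t1, 300.0, T))
```

The clamp makes the saturation behavior explicit: once the reference exceeds the available DC-link voltage, increasing it further no longer changes T_1.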

Filter
As shown in Figure 3, after V_a* is converted into a physical value, it is also passed through a low-pass filter. This filter attenuates the ripple produced by the converter switching, allowing only low frequencies to pass. Using the converter output voltage, V_c, and the filter current, I_c, and applying Kirchhoff's Voltage Law to the filter circuit, it is possible to find the converter voltage, described in (17). Applying Euler's method to (17) and performing algebraic manipulations to isolate I_c, we obtain (18), where V_c is considered to have the same value as the average output control signal, V_a. Thus, the electric current that will flow through the converter is determined.

FPGA Features
In this section all the relevant points in FPGA architecture and numerical patterns are discussed, considering their applications to RT simulator development.
The hardware device employed to embed this digital system design was chosen based on the desired performance, number of logic cells, hardware efficiency, intellectual property (IP) cores and memory blocks. The FPGA is composed of 220 K logic elements, 80,330 adaptive logic modules, 162 variable-precision DSP blocks, 384 18 × 19 multipliers, 312,320 registers, 11,740 Kb of M20K memory and 284 GPIO. This device is optimized for high-bandwidth performance applications and provides 12 transceivers of 12.5 Gbps, 1.4 Gbps Low Voltage Differential Signaling (LVDS), and a DDR3 SDRAM interface up to 72 bits wide at up to 1866 Mbps [15].

FPGA Architecture
The most basic configuration of an FPGA is based on a two-dimensional array of configurable logic blocks (CLBs) and I/O blocks interconnected via a switching matrix of wires. Modern FPGAs also comprise many memory blocks and specialized circuits that enhance the efficiency of digital signal processing (e.g., transceiver channels, DSP blocks, etc.). Figure 4 [15] depicts the Intel Cyclone 10 architecture, which adopts a column I/O structure with 12.5 Gbps transceivers on the left-hand side of the die. The GPIO in vertical columns is organized in banks of 48 I/Os, each with a high-efficiency memory controller and an I/O phase-locked loop (PLL) [16].

DSP Floating-Point Precision
The Institute of Electrical and Electronics Engineers (IEEE) published the IEEE 754 standard for binary floating-point arithmetic in 1985 [6,7]. This standard went through a series of updates, in which the 32-bit binary base format came to be named "32-bit single-precision" [8]. The IEEE 754 single- and double-precision formats are shown in Figure 5a,b, respectively. The floating-point representation automatically provides a wide dynamic range. In contrast, the fixed-point representation is limited and requires the user to track the magnitude of numbers and deal with the problems that arise during operations with numbers of different magnitudes [8,16]. Both the single- and double-precision formats represent a decimal number as a floating-point value through three parts: sign, exponent and mantissa.
The sign is equal to 1 if the number is negative and 0 otherwise. The exponent is the component of a finite floating-point representation that signifies the integer power to which the radix is raised in determining the value of that floating-point representation. The exponent e is used when the significand is regarded as an integer digit and fraction field, and the exponent q is used when the significand is regarded as an integer [16].
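To make the three fields concrete, the short Python sketch below splits a 32-bit single-precision word into sign, biased exponent and fraction, and rebuilds its value. The function name `decode_binary32` is hypothetical; the field widths (1, 8 and 23 bits) and the exponent bias of 127 come from the IEEE 754 single-precision format.

```python
def decode_binary32(word):
    """Split a 32-bit word into IEEE 754 fields and rebuild its value."""
    sign = (word >> 31) & 0x1          # 1 bit
    exponent = (word >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = word & 0x7FFFFF         # 23 bits, leading digit not stored

    if exponent == 0xFF:               # infinities and NaNs
        magnitude = float("inf") if fraction == 0 else float("nan")
    elif exponent == 0:                # zero and subnormal numbers
        magnitude = fraction * 2.0 ** -149
    else:                              # normal numbers: implicit leading 1
        magnitude = (1.0 + fraction / 2.0 ** 23) * 2.0 ** (exponent - 127)

    value = -magnitude if sign else magnitude
    return sign, exponent, fraction, value


if __name__ == "__main__":
    print(decode_binary32(0x43700000))  # the word that encodes 240.0
    print(decode_binary32(0xC1200000))  # the word that encodes -10.0
```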
Lastly, the mantissa contains the significant digits except for the leading digit. The board's device is enhanced with hardened floating-point operators in the digital signal processing (DSP) blocks. In this work, the DSP slices were used through IP instances of the Multi-Cycle Custom Instruction for floating-point embedded systems. Figure 6 shows the circuit designed to execute a floating-point operation.
This work adopted the IEEE 754 single-precision format. Thus, to perform a floating-point operation, the IP module Multi-Cycle Custom Instruction for Floating Point was instantiated. The module receives the operands and an operation code that identifies the type of arithmetic computation, which can be an addition, multiplication, subtraction or division. These values then go through the process illustrated in Figure 6, and the module outputs the result of the operation. In this case, the inputs follow the IEEE 754 standard and, consequently, the output is returned in the same format.
The Multi-Cycle Custom Instruction supports the basic operations (add, subtract, multiply, divide) and also square root, comparisons, negation and other functions [8,17]. Each of these operations has a different processing time, as listed in Table 2. Thus, to perform more than one operation during the same clock cycle, several multi-cycle modules can be instantiated and organized to run in parallel. This architecture can be customized using the Dynamic Partial Reconfiguration (DPR) feature, a reconfiguration that can be performed for all or for a subset of the IPs [18,19].

Hardware Development
The Quartus Prime Pro software, version 19.4, was the development environment used to write the code for the real-time simulator implementation. Its compiler transforms the project algorithm into a hardware configuration through fast and straightforward operations. Thus, it allows reconfigurable computing in fixed hardware that adapts to the algorithm.
The methodology used for the FPGA design flow consists of six stages: design specification, design development, functional simulation, synthesis, floorplanning, and place and route. This flow is shown in Figure 7.
The specification stage is the reference used for the design development. It defines the functional characteristics of the design (the simulation of a DC motor), the required performance (real-time execution) and the device used.
During the development stage, the circuit is defined and described in an HDL. In this work, the DC motor real-time simulator was divided into three modules: the DC motor, the controller, and the converter with filter, as illustrated in Figure 8. These blocks are connected by a datapath and instantiated in the top-level design. The internal structure of each module is composed of its respective operating equations, which were described in Section 2.
The module architecture must be planned considering the trade-off between time, power and area. Given the physical area constraint of a chip, architecture designers need to determine parameters such as the size and the amount of logic and routing resources, so that the designed architectures support high system integration, high performance and high resource utilization [6,20]. In this work, the system described is a real-time simulator, so it is fundamental to have several operations in parallel. Consequently, there will be higher power consumption and a larger area occupied by the designed circuit.
This work applied a technique similar to an expression tree to optimize the computation process, as shown in Figure 9. As it is a hardware project and the logic used to represent the model describes its physical circuit, it is necessary to use an instance of the Multi-Cycle Custom Instruction DSP block for each floating-point arithmetic operation. Once this information is processed, the same optimization can be performed at level 1, in which the division and multiplication operations occur in parallel. Thus, all independent operations can be processed simultaneously, as illustrated by levels 2-3. The three modules highlighted in Figure 9 follow this methodology. Because the latency differs between arithmetic operations, a delay must be added to the system to ensure that all results are ready before they are used.
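The effect of evaluating independent operations level by level can be sketched in Python by comparing the critical path of an expression tree with its fully sequential cost. The per-operation latencies below are assumptions (Table 2 is not reproduced in this text), chosen to be consistent with the 5-cycle and 30-cycle figures quoted in the Results section; the example expression and the names `LATENCY`, `critical_path` and `sequential_cycles` are hypothetical.

```python
# Assumed latencies in clock cycles (not taken from Table 2).
LATENCY = {"add": 5, "sub": 5, "mul": 4, "div": 16}


def critical_path(node):
    """Cycles needed when independent subtrees run in parallel."""
    if isinstance(node, str):          # leaf: an operand that is already available
        return 0
    op, left, right = node
    return LATENCY[op] + max(critical_path(left), critical_path(right))


def sequential_cycles(node):
    """Cycles needed when every operation waits for the previous one."""
    if isinstance(node, str):
        return 0
    op, left, right = node
    return LATENCY[op] + sequential_cycles(left) + sequential_cycles(right)


# Hypothetical expression: (((a * b) * (c - d)) - e) / f
expr = ("div", ("sub", ("mul", ("mul", "a", "b"), ("sub", "c", "d")), "e"), "f")

if __name__ == "__main__":
    print(critical_path(expr), sequential_cycles(expr))
```

The critical path also gives the delay that must be inserted so that faster branches wait for the slowest one at each level.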
In the development stage, a relevant subtask consists of inspecting the design in the RTL viewer to ensure that the interconnection between the gates and the project modules is correct. Thus, with the right setup, the hardware description can be carried out according to the preferences and needs of the application.
In the subsequent stage, the functional simulation is performed through testbench (TB) verification, which demonstrates that a model meets the requirements of the specification. In this work, two types of testbench were developed, one for the blocks of the microarchitecture stage and the other for the complete system. The TB applies stimuli to the model (individual blocks or the top-level system) and compares its response with the ideal results. If there is a large discrepancy between them, the modeling needs to be reviewed (debugging process). These tests are also described in HDL and verify the functional correctness of the model.
Then, in stage 4, the timing analysis is performed with the TimeQuest Timing Analyzer to verify circuit performance and detect possible timing violations. The TimeQuest analyzer determines the timing relationships and checks arrival times against the required times to verify timing [21-23]. If the report shows no violation of the requirements imposed by the designer, the project is almost finished.
In the last stages of the design flow, floorplanning and place and route are performed. Then, the project is embedded in the FPGA device.
After the design flow was concluded, the simulator was initialized to obtain the simulation data via Signal Tap. Implementing the simulator in real time is an extensive process, and the proposed algorithm is described in Figure 10. Verifying a block-based design requires planning to ensure visibility of the logic inside partitions and communication with the Signal Tap logic analyzer [24,25]. The Signal Tap logic analyzer captures and displays the real-time signal behavior in an FPGA design [26]. This platform connects to the FPGA, which already has the program running, and can access the physical addresses of the entities that hold the required information [27].
In this application, the number of samples acquired is set according to the clock, as defined in the GUI. This information travels through the cable connection from the FPGA device to the computer, where it is interpreted by Signal Tap. The information is represented in the floating-point format.
Finally, as the last step in developing the simulator, a program to convert IEEE 754 values to real decimal numbers was written in Python. The results of this converter were verified according to [28]. Thus, with all the information converted to real numbers, it was possible to plot the data for the analysis of results.
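A conversion of this kind can be sketched with Python's standard `struct` module. The paper's own converter is not shown, so this is only an illustration; in particular, the assumption that the captured samples are available as a list of 8-digit hexadecimal strings, and the names `hex_to_float` and `convert_samples`, are hypothetical.

```python
import struct


def hex_to_float(hex_word):
    """Interpret an 8-digit hexadecimal string as an IEEE 754 single."""
    raw = struct.pack(">I", int(hex_word, 16))   # 32-bit big-endian word
    return struct.unpack(">f", raw)[0]           # same bytes read as a float


def convert_samples(hex_words):
    """Convert a sequence of captured words to Python floats."""
    return [hex_to_float(word) for word in hex_words]


if __name__ == "__main__":
    captured = ["42FA0000", "43700000", "C1200000"]  # made-up sample words
    print(convert_samples(captured))
```

Reinterpreting the bytes with `struct` avoids reimplementing the field arithmetic and handles special values such as zero, infinities and subnormal numbers in the same way as the host floating-point hardware.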
Such a procedure for the development of a real-time simulator can be replicated for tests and studies proposed in other works in the power systems area and in mathematical modeling techniques. For example, in [4], an RT simulation of an asymmetrical phase-domain synchronous machine on an FPGA was proposed; the development of automated HIL tests with RTDS for verifying protective relay performance is described in [4]; techniques for implementing elliptic curve point multiplication in hardware are presented in [29]; and an electric field evaluation using a finite element method and proxy models to design stator slots in a PMSG is reported in [30].

Results and Discussion
The algorithm was configured to run arithmetic operations in parallel within the same clock cycles. Thus, in (12), for example, the first multiplication was performed in parallel with the subtraction, that is, both operations were processed during the same five clock cycles. Then, three other operations were performed sequentially (multiplication, subtraction and division), so the result was computed in 30 clock cycles.
Similarly, this parallelization was applied to all modules. The response times of the arithmetic operations in each computation are synchronized through a delay, which is defined to ensure that all values are final by the time the longest operation has been processed. Therefore, when using the 125 MHz clock with the dynamic system with about 45 operations per cycle for this model, the time spent to process one time-step calculation was approximately 360 ns.
Two test cases were performed to analyze the simulator's performance: (1) a DC motor startup dragging a mechanical load, followed by the load withdrawal; and (2) a failure in the converter switching during the DC motor speed control operation. The DC motor nominal parameters are ω_r = 125 rad/s, V_a = 240 V, I_a = 10 A and T_m = 19 Nm. The controller parameters are presented in Table 3.

DC Motor Startup with Mechanical Load Followed by Load Withdrawal
The plots in the left column of Figure 11 show, from top to bottom, the PID control signal, the converter and filter output voltages, the armature current and the angular speed. It is noticeable that the reference voltage imposed by the controller is converted from a logic signal, through the converter switching, into a physical signal that is then filtered by the low-pass filter. In this case, the motor starts up moving the nominal mechanical load connected to its shaft and, at 2.3 s, the load withdrawal occurs. It can be noted that the angular speed, armature voltage and current settle at their nominal conditions, since the motor is dragging its nominal load. After a transient time interval due to the load withdrawal, the angular speed goes to 126 rad/s and the reference voltage imposed by the PID slightly decreases, since the motor changes to no-load operation.
During the transient, a peak occurs in the PID output voltage, raising it to approximately 260 V. This variation, in turn, directly affects the switching dynamics of the converter, so a slight increase occurs in the output voltage of the low-pass filter. The armature current decreases to almost 0 A, since the motor torque now only has to overcome the friction force between the rotor and the airgap. Finally, the speed plot shows the speed increase at the instant the load is removed, after which the control system stabilizes it, imposing a voltage that makes ω_r track ω_r*. The following aspects corresponding to the angular speed transient behavior were obtained for this scenario:
Figure 11. Test cases performed in an RT simulator.

Failure in the Converter Switching
This scenario consists of a switching failure on the switch pair S2 and S3 of the converter (Figure 3). During the interval from 2.2 to 2.5 s, these switches fail, remaining at a constant high logic level. Due to this failure, only switches S1 and S4 are switched during this time interval, while S2 and S3 are kept closed. For this case, the right column of Figure 11 shows, from top to bottom, the PID control signal, the converter and filter output voltages, the armature current and the angular speed. During the failure, there is a large increase in the reference voltage from the PID, since the control system does not see the angular speed error decreasing and, hence, increases the control signal to force this error to decrease. This voltage increase leads to overmodulation of the converter, decreasing the filter output voltage and the armature current amplitude. Thus, the angular speed follows the behavior of the armature current, going through a drop during the fault and a soft peak when the fault is removed, after which the operation returns to normal.

Real-Time Simulation Performance
The algorithms embedded in the FPGA to simulate the DC motor model required less than 1% of the logic cells, 2.8% of the slice registers, and 4.3% of the DSPs. This low resource utilization is due to the use of IP instances for the floating-point operations, which were implemented with pipelined synthesis in specific blocks.
Considering the development methodology and techniques used to model this simulator, the time-step for the simulation of other systems can be estimated, such as the one described in [31]. That system is composed of a wind turbine based on a Permanent Magnet Synchronous Generator (PMSG), whose control system applies a power maximization algorithm to the wind turbine. The estimate would be approximately 81 operations in parallel, which would result in a 648 ns time-step (81 × 8 ns) for a physical clock of 125 MHz. It is important to emphasize that this time-step is shorter than that commonly achieved by commercial RT simulators.
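The estimate above can be reproduced with a short calculation. The Python sketch below assumes that each counted operation occupies one clock period; this is inferred from the quoted figures (81 operations at 125 MHz giving 648 ns) and is not stated explicitly in the text. The function names and the `cycles_per_operation` parameter are hypothetical.

```python
def estimate_time_step_ns(num_operations: int,
                          clock_hz: float,
                          cycles_per_operation: int = 1) -> float:
    """Estimate the simulation time-step in nanoseconds.

    Assumption: every counted operation takes `cycles_per_operation`
    clock periods (one by default).
    """
    if num_operations < 0 or clock_hz <= 0 or cycles_per_operation < 1:
        raise ValueError("invalid operation count, clock, or cycles per operation")
    clock_period_ns = 1e9 / clock_hz  # 125 MHz gives 8 ns
    return num_operations * cycles_per_operation * clock_period_ns


def is_real_time(execution_time_ns: float, time_step_ns: float) -> bool:
    """Real-time condition: one step must execute in less than the time-step."""
    return execution_time_ns < time_step_ns


if __name__ == "__main__":
    # PMSG wind turbine estimate: 81 operations, 125 MHz clock.
    print(estimate_time_step_ns(81, 125e6))  # 648.0
```

Changing `clock_hz` or `num_operations` gives a first-order figure for other devices or models, under the same one-period-per-operation assumption.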
Lastly, Table 4 presents a comparison of the performance of FPGAs. Four FPGAs of different brands and prices were selected and are identified by an FPGA ID: ID 1 refers to [32], ID 2 to [33], ID 3 to [34] and ID 4 to [35]. For each device, the clock and the number of operations required for the DC motor simulation were considered, and an approximate time-step was obtained from them. Among the models analyzed, some FPGAs used in HYPERSIM were selected. It is noteworthy that, among the models presented, the one with the best cost-benefit ratio and processing capacity for larger systems is the FPGA with ID 1, which is the board used in this work.

Conclusions
This work proposes a methodology to develop an electrical system real-time simulator using a hardware description language and to embed it in a field-programmable gate array.
Step-by-step instructions to embed and simulate a system model in real time are provided. The work introduces the IEEE 754 floating-point standard for the arithmetic operations and addresses in detail the internal structure of the field-programmable gate array architecture and its fundamental design flow. An application example is tested through the simulation and analysis of a DC motor speed control. For the simulated cases, the results are compatible with those expected from the theory of electric machines and power electronics. Furthermore, the results show that the performance of the proposed methodology is compatible with real-time simulation requirements. An estimate of the methodology's performance for the simulation of a more complex system, a permanent magnet synchronous generator-based wind turbine, corroborates this conclusion. Finally, this work concludes that the proposed methodology offers low cost due to the liberty in the architecture choice and no dependency on commercial ready-made hardware-software packages.
Author Contributions: The authors have made equivalent contributions. All authors have read and agreed to the published version of the manuscript.
Funding: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001, and by the National Council for Scientific and Technological Development (CNPq).