This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (

Time derivative estimation of signals plays a very important role in several fields, such as signal processing and control engineering, just to name a few of them. For that purpose, a non-asymptotic algebraic procedure for the approximate estimation of the system states is used in this work. The method is based on results from differential algebra and furnishes some general formulae for the time derivatives of a measurable signal in which two algebraic derivative estimators run simultaneously, but in an overlapping fashion. The algebraic derivative algorithm presented in this paper is computed online and in real-time, offering high robustness properties with regard to corrupting noises, versatility and ease of implementation. Besides, in this work, we introduce a novel architecture to accelerate this algebraic derivative estimator using reconfigurable logic. The core of the algorithm is implemented in an FPGA, improving the speed of the system and achieving real-time performance. Finally, this work proposes a low-cost platform for the integration of hardware in the loop in MATLAB.

The derivative estimation of a measured signal has considerable importance in signal processing, numerical analysis, control engineering or failure diagnostics, among others [

Measured signals are inevitably corrupted by additive noise (hardware noise of the equipment, background noise, and so on), so filtering is a must.

A number of different approaches have been proposed. A common approach is based on least-squares polynomial fitting or interpolation for off-line applications [^{n}

A novel method on derivative estimation based on extensions of techniques for nonlinear closed-loop parametric estimation was introduced by Fliess and Sira-Ramirez in [

Over the last decade, FPGA-based SoCs have become an alternative in growing demand for building computational systems. A single chip can now hold embedded processors plus reconfigurable logic used to accelerate specific parts of an application. FPGA vendors, such as Xilinx, Altera or Actel, offer hard processors, typically ARM multicores, including an AMBA bus interface, in their logic fabric.

Until now, accelerating applications with reconfigurable hardware has required some expertise in hardware design to obtain the maximum benefit from the target platform. The research community has been working on high-level synthesis tools for more than a decade in order to close the enormous productivity gap in FPGA design, and recent releases of several of these tools (such as Catapult-C from Mentor Graphics, Vivado HLS from Xilinx or Synphony from Synopsys) are beginning to show promising results.

In this work, we introduce a novel architecture to accelerate the algebraic derivative estimator using reconfigurable logic. The core of the algorithm that will be described later will be implemented in an FPGA in order to improve the speed of the system and achieve real-time performance.

The purpose of the hardware implementation is twofold:

On the one hand, it is the acceleration of the computations, in order to obtain real-time hardware performance of the whole system.

On the other hand, the aim is to provide a low-cost hardware platform for a hardware-in-the-loop implementation of a system and to make it accessible to users with no previous hardware design experience. Here, the term platform has a broad meaning, in the sense of a heterogeneous solution combining a general-purpose node plus a hardware accelerator. However, a concrete implementation has been built to demonstrate the validity of the approach, combining a MATLAB testing framework with a hardware prototyping board to host the accelerated estimator.

Using high-level synthesis (HLS) tools, we developed an algebraic derivative estimator starting from a software description of the algorithm. The MATLAB model does not provide real-time support, so, in this work, we provide an infrastructure that supports the estimation of algebraic time derivatives in real time. This estimator, entirely implemented in hardware, can be used as a plug-in in the MATLAB environment, providing benefits for signal analysis in real time. The plug-in is implemented using a “MATLAB executable” (MEX) file and integrated into the MATLAB system through Gigabit Ethernet, but any communication technology supported by the operating system could be used.

The MATLAB environment allows the integration of C code, but in this case, because the modules are to be implemented in hardware, only synthesizable code is integrated into the MATLAB environment. This synthesizable code is described in C, and it is converted into an object file. Once the functionality of the component described in C is validated, it is synthesized into a register transfer level (RTL) description for simulation. Finally, this synthesized code is implemented in hardware. In every stage of the validation process, the component is integrated into the system modeled in MATLAB through the MEX file plug-in.

The hardware-in-the-loop testing system introduced in this work is implemented on a low-cost platform formed by a PC and an FPGA. This solution combines the benefits of accelerating the algorithm in hardware with the versatility of the MATLAB simulation environment for signal processing. In this platform, the FPGA supports the hard real-time functionality in the simulation and measurement processes that a standard PC could not provide.

There are, however, other alternatives for the hardware acceleration of a certain part of a MATLAB computation, as will be described in the related work. Some of them use MATLAB M code as the high-level description of the algorithm, which is later synthesized into a hardware implementation. Others take a library-based approach, where the accelerator makes use of a set of highly parameterizable components and vendor-specific hardware translators. The first set of solutions, which includes the one proposed in this work, allows for more generic algorithms and, hence, provides more opportunities for later reuse. The second set provides a higher degree of tuning of the different components of the system and results in more optimized accelerators. However, most of them are closed solutions only available for a certain set of development boards, mainly because of the proprietary interface generated between the MATLAB environment and the hardware accelerator. This closed nature also affects how the accelerator can be interfaced with other hardware resources, such as, for example, data coming from a sensor or a third-party interface.

On the other hand, the combination of C-to-hardware synthesis and the automatic generation of the proper adapters, later described in the Design Flow section, can be easily extended to different types of implementations, such as a fully integrated Hw-Sw solution running on a reconfigurable system-on-chip.

The use of a C-based HLS flow has then certain advantages:

C is the most common system-level language, and many algorithms are already available in it, increasing reuse opportunities.

The HLS synthesis process provides a higher degree of control of the synthesized hardware interface, thus allowing for the interaction with other hardware components available in the system.

While it may not provide the optimum hardware implementation, it can be mapped onto a large set of hardware platforms.

The main contributions of this work can be summarized as follows:

First, it presents an algorithm for algebraic derivative estimation totally implemented in hardware from a pure C description.

Next, the algorithm is implemented using double precision floating point arithmetic.

Finally, it proposes a low-cost platform for the integration of hardware in the loop in MATLAB.

In an observable system, the state estimation problem is intimately related to the problem of computing the successive time derivatives of the output and input signals in a sufficiently large number (see [

Some steps of the derivation presented in [_{r}_{r}^{−str}, originates from the time shift to _{r}^{−str} ≠ 0, the following result is obtained:
^{(}^{i}^{− 1)}(_{r}^{−}^{ν}_{j}_{j}_{r}_{r}_{r}, t_{r}_{r}_{r}, t_{r}

The validity of the formulae for the calculation of the time derivatives, ^{(}^{i}^{)}(_{r}_{r}_{r}

From _{r}_{r}_{1} and _{2}, which are defined as follows:
_{1} and, similarly, _{2} for the second identifier. The first identifier is re-initialized when _{1} = 0 and the second identifier when _{2} = 0. The proposed technique resets one of the estimators while the original remains active and

This policy is called a switched overlapping estimator technique.
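The switching policy can be illustrated with a minimal C sketch. It assumes that both identifiers share the same reset period, that the second identifier is shifted by half a period, and that the output is taken from the identifier that was reset longest ago. These choices, and the function names, are hypothetical and only meant to show the overlapping idea; the estimator formulae themselves are not included.

```c
/* Local time of an identifier that is re-initialized every t_reset seconds.
 * 'shift' offsets the reset instants of this identifier (assumption).
 * Valid for t >= 0 and shift >= 0. */
static double identifier_local_time(double t, double t_reset, double shift)
{
    double s = t + shift;
    double periods = (double)(long)(s / t_reset);  /* completed periods */
    return s - periods * t_reset;
}

/* Returns 1 or 2: the identifier whose estimate is used at time t.
 * The identifier with the larger local time was reset longest ago, so its
 * reset transient is assumed to have died out. */
int active_identifier(double t, double t_reset)
{
    double t1 = identifier_local_time(t, t_reset, 0.0);
    double t2 = identifier_local_time(t, t_reset, 0.5 * t_reset);
    return (t1 >= t2) ? 1 : 2;
}
```

With `t_reset = 4`, the sketch selects identifier 1 at `t = 3` (it has been running for 3 s, while identifier 2 was reset 1 s ago) and identifier 2 at `t = 4.5`, shortly after identifier 1 has been re-initialized.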

In our application, only the measured signal, _{r}_{r}, i.e._{r}_{r}_{e}, ÿ_{e}_{e}

The hardware-in-the-loop (HIL) approach has proven to be very effective in the development of complex engineering systems, where an initial mathematical model of a part of the system can be gradually refined into a hardware implementation, while at the same time keeping the advantages of a high-level modeling environment.

The estimator presented in this paper matches this kind of problem, and therefore, one of the objectives of this work was to provide a convenient hardware platform, as well as a high-level design-flow, such that no special hardware design skills would be required to complete the design.

The main difference with respect to other proposals lies in the combination of a low-cost FPGA-based prototyping generic platform, combined with a general-purpose PC, and the use of high-level synthesis tools, as opposed to the proprietary library-based solutions mostly used in HIL platforms.

The general-purpose PC holds the core of the developing environment, and, hence, runs the MATLAB environment, while the hardware accelerators will be hosted by the FPGA. Additionally, in parallel with the physical interconnection of both computational nodes, the logical link between each part of the system will be carried out through MEX files. Those files are provided as a way to extend MATLAB's functionality, and their primary use is the acceleration of performance-critical routines through the execution of native compiled C/C++ code. Additionally, they provide a way to interface to the modeling environment, which matches the case of the mixed architecture proposed (

As will be later described in the design flow Section 4, MEX files will be used for two different purposes in the prototyping platform:

For the implementation of the C version of the derivative estimator computations.

As an interface to the hardware implementation of the routine functionality. This interface will automatically be generated, depending on the underlying communication technology (PCI, USB, Gb Ethernet) and the signature of the routine (arguments and return values). It can also distinguish between the execution of a hardware simulation running on a third-party tool or the real hardware implementation on the FPGA.

The implementation of custom hardware accelerators is not a straightforward task, and there are several questions to take into account, such as the functional equivalence of the hardware component with respect to the mathematical version or the interfacing between the hardware and software environment. In order to simplify the complexity of these tasks and to obtain a correct design on the first try, the following subsections describe the suggested design flow.

The initial step would be the definition of the model and the validation environment. This typically implies the writing of one or several MATLAB M files. Here, the only consideration to take into account is that the functionality to be implemented in hardware should have a clear interface and be modeled in a separate file, such that it can be later replaced transparently.

The main purpose of the MATLAB tool is to provide a powerful and interactive high-level modeling environment, and for that reason, most of the code is run through an interpreter. However, many applications demand higher-performance computations, possibly running native compiled C/C++ code. This is done through the use of an extension and an associated compiler, called MEX (MATLAB executable). The use of MEX compiled code may improve performance by up to two orders of magnitude when compared to the equivalent M model; however, that gain comes at the cost of portability, since the resulting binary files can only be executed on the native architecture for which they were compiled.

It is not strange then to find many applications where native MATLAB code is translated into high-performance C models. Here, the translation is typically done manually, although several contributions have been proposed to automate this tedious and error-prone task.

One alternative for the FPGA-based acceleration of a certain part of the system is the use of the SIMULINK environment and FPGA vendor plugins, such as the DSP library provided by System Generator from Xilinx, so that the models can later be compiled into hardware. The main advantage of this solution is the high degree of control it offers, at the expense of extra design effort and fewer reuse possibilities.

On the other hand, the C-based approach taken in this work focuses more on making mixed-mode implementations accessible to designers with no special hardware design skills than on generating the best possible hardware implementation of a function. One of the advantages of the approach is that the resulting model can be used both as a software accelerator and as the source from which a hardware implementation is automatically derived using high-level C-to-hardware synthesis. In this context, it is particularly helpful that current high-level synthesis tools support double precision floating point arithmetic, which is the one used by default in MATLAB. This avoids the problem of converting floating-point models to fixed point.

As already described, the interface between C models and the MATLAB run-time is performed through the MEX interface. When the MEX file is only used as a wrapper of a corresponding C function, it can be automatically generated from the signature of that function.
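A wrapper of this kind can be sketched as follows. The routine `estimator_block`, its signature and the MATLAB-side name `estimator_mex` are hypothetical, and the routine body is a placeholder (a plain first difference), not the algebraic estimator. The gateway is guarded by `MATLAB_MEX_FILE`, which the `mex` build defines, so the same file also compiles as ordinary C.

```c
#include <stddef.h>

/* Placeholder for the C routine being wrapped. It computes a first
 * difference only to give the wrapper something to call; it is NOT the
 * algebraic estimator. */
void estimator_block(const double *y, size_t n, double ts, double *dy)
{
    if (n == 0)
        return;
    dy[0] = 0.0;
    for (size_t i = 1; i < n; ++i)
        dy[i] = (y[i] - y[i - 1]) / ts;
}

#ifdef MATLAB_MEX_FILE   /* defined when building with the mex command */
#include "mex.h"

/* MATLAB call (hypothetical name): dy = estimator_mex(y, Ts) */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 2 || nlhs > 1)
        mexErrMsgIdAndTxt("estimator:usage", "Usage: dy = estimator_mex(y, Ts)");
    if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgIdAndTxt("estimator:type", "y must be a real double array");

    size_t n = mxGetNumberOfElements(prhs[0]);
    double ts = mxGetScalar(prhs[1]);

    /* Convert MATLAB arrays to plain C pointers and call the routine. */
    plhs[0] = mxCreateDoubleMatrix(1, n, mxREAL);
    estimator_block(mxGetPr(prhs[0]), n, ts, mxGetPr(plhs[0]));
}
#endif
```

The gateway only checks the arguments and converts types, which is why it can be derived mechanically from the signature of the wrapped function.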

The current state-of-the-art in synthesis tool technology provides support for a big subset of the C language, excluding those aspects that cannot easily be mapped to a hardware implementation, such as dynamic memory management. Furthermore, the coding style and mapping rules have been simplified, broadening the community of users and not just targeting engineers with a hardware design background. It is only required to understand some basic concepts related to how compilers work.

Most of these tools accept a plain C description that can be shaped into a hardware implementation through the definition of certain directives, such as the type of protocol used to receive the arguments, loop manipulation or the identification of blocks that may benefit from a parallel implementation. The whole process of directive definition, synthesis and results analysis takes no more than a few minutes, providing a means to quickly explore the design space, in contrast with the several weeks that the same task would require following classic hardware design flows.

Therefore, the synthesis step in the design flow takes the C code available from the previous stage and generates a register-transfer-level hardware description model, where all operations are described at the clock level. Additionally, a set of directives can be defined to control the performance and cost of the resulting design.
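As a hedged illustration of how directives shape a plain C function, the sketch below uses Vivado HLS pragmas to request a FIFO protocol for the input argument and a pipelined loop. The function `trapz_block` (a trapezoidal integral over one block of samples) and the block length are hypothetical and are not taken from the estimator code; an ordinary C compiler simply ignores the pragmas.

```c
#define BLOCK_LEN 8   /* hypothetical block size */

/* Trapezoidal integral of one block of samples taken every ts seconds. */
double trapz_block(const double y[BLOCK_LEN], double ts)
{
#pragma HLS INTERFACE ap_fifo port=y   /* samples arrive through a FIFO */
    double acc = 0.0;
    double prev = y[0];   /* each sample is read exactly once, in order */

    for (int i = 1; i < BLOCK_LEN; ++i) {
#pragma HLS PIPELINE                   /* overlap successive iterations */
        double cur = y[i];
        acc += 0.5 * ts * (prev + cur);
        prev = cur;
    }
    return acc;
}
```

Keeping the previous sample in a local variable means the array is accessed strictly sequentially, which is what a FIFO interface requires. Changing or removing the directives alters the cost and performance of the generated hardware without touching the functional code.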

There is a big step from a purely functional model to a hardware description: it is necessary to validate the result of the synthesis before proceeding to the next step. This will require the use of a hardware simulator to run the model and an appropriate test-bench. The solution to this problem is depicted in

All the steps in the process, starting from the C code synthesis and ending in the mixed-mode simulation, are automated through a series of scripts, including the generation of the socket-based MEX file.

The final step in the design flow consists of the real hardware implementation of the model obtained from the C synthesis process. In addition to the corresponding hardware core, it is also necessary to integrate the logic that provides an interface between the FPGA-based prototyping board and the PC running MATLAB. There are a number of different solutions, which depend entirely on the concrete communication technology provided by the board.

The example depicted in

One possible solution is the case described, where the socket interface in the simulation model is replaced by an Ethernet MAC plus the corresponding FIFO buffers, which can be directly connected to the hardware component generated from the synthesis. This makes the hardware implementation process quite straightforward and completely independent of the concrete functionality implemented by the core. On the PC side, no modification is required other than setting the target address for the socket. Again, this process is completely automated.
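The decoupling role of the FIFO buffers can be modeled in software with a small ring buffer. This is only a behavioral sketch: the depth, the element type and the function names are assumptions, and the real buffers are hardware blocks placed between the MAC and the synthesized core.

```c
#include <stdint.h>

#define FIFO_DEPTH 16u   /* hypothetical depth; must be a power of two */

typedef struct {
    double   data[FIFO_DEPTH];
    uint32_t head;   /* total number of elements written */
    uint32_t tail;   /* total number of elements read */
} fifo_t;

void fifo_init(fifo_t *f)
{
    f->head = 0;
    f->tail = 0;
}

/* Returns 1 on success, 0 if the FIFO is full (the producer must wait). */
int fifo_push(fifo_t *f, double value)
{
    if (f->head - f->tail == FIFO_DEPTH)
        return 0;
    f->data[f->head % FIFO_DEPTH] = value;
    f->head++;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty (the consumer must wait). */
int fifo_pop(fifo_t *f, double *value)
{
    if (f->head == f->tail)
        return 0;
    *value = f->data[f->tail % FIFO_DEPTH];
    f->tail++;
    return 1;
}
```

Because the core only sees push and pop operations, the component on the other side of the FIFO can be a socket in simulation or an Ethernet MAC in hardware without any change to the core.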

The resulting performance of the mixed-mode implementation will depend on several factors, but the main ones will be the degree of parallelism inherent to the algorithm and the bandwidth and latency of the communication interface between the hardware and the software.
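A first-order model of this trade-off is sketched below. It assumes that computation and communication do not overlap and that the transfer cost is a serialization time plus a fixed latency per message; both the model and the function names are illustrative assumptions, not measurements.

```c
/* Estimated wall-clock time of one accelerated run, in seconds.
 * Assumes computation and communication are not overlapped. */
double mixed_mode_time(double t_compute_s, double payload_bytes,
                       double bandwidth_bps, double n_messages,
                       double latency_s)
{
    double t_transfer = payload_bytes * 8.0 / bandwidth_bps;
    return t_compute_s + t_transfer + n_messages * latency_s;
}

/* Speedup of the accelerated run with respect to a reference run. */
double speedup(double t_reference_s, double t_accelerated_s)
{
    return t_reference_s / t_accelerated_s;
}
```

For example, 1.5 s of computation plus 0.5 s of transfer gives a total of 2 s, that is, a speedup of 25 over a 50 s reference run. The model makes explicit that, once the computation is fast, the communication term limits any further gain.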

To assess the validity of the overlapped dynamic estimator, several MATLAB simulations have been performed, where the derivatives of the input signal are previously computed and later checked against the values obtained from the estimator.
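Such a check can be sketched as a routine that compares the estimated samples with the precomputed reference. The particular statistics returned here (mean, minimum and maximum of the signed error) and all names are assumptions made for illustration.

```c
#include <stddef.h>

typedef struct {
    double mean;   /* mean of the signed error */
    double min;    /* most negative error */
    double max;    /* most positive error */
} error_stats_t;

/* Error statistics between estimated and reference samples (n must be > 0). */
error_stats_t error_stats(const double *estimate, const double *reference,
                          size_t n)
{
    error_stats_t s;
    double sum = 0.0;

    s.min = estimate[0] - reference[0];
    s.max = s.min;
    for (size_t i = 0; i < n; ++i) {
        double e = estimate[i] - reference[i];
        sum += e;
        if (e < s.min) s.min = e;
        if (e > s.max) s.max = e;
    }
    s.mean = sum / (double)n;
    return s;
}
```

In practice, the samples that fall inside the transient right after a reset would be excluded before calling the routine, so that the statistics describe the steady-state behavior of the estimator.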

^{sin}^{(}^{wt}^{)} + 0.0001 *

Once the mathematical model had been proven to be correct, the next step for the physical implementation of the estimator was the manual translation into a C-language description. This translation is quite straightforward, except that C operators do not support matrix types, and therefore, matrix operations are replaced by function calls.
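The following sketch shows what this replacement looks like for one fourth-order Runge Kutta step of a linear system dx/dt = A x + b u. The system dimension, the row-major storage and the function names are assumptions; the actual state equations of the estimator are not reproduced here.

```c
#include <stddef.h>

#define N 2   /* hypothetical state dimension */

/* out = A * x, with A stored row-major; replaces the MATLAB expression A*x. */
static void mat_vec(const double *a, const double *x, double *out)
{
    for (size_t i = 0; i < N; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < N; ++j)
            acc += a[i * N + j] * x[j];
        out[i] = acc;
    }
}

/* dx = A * x + b * u */
static void state_deriv(const double *a, const double *b, const double *x,
                        double u, double *dx)
{
    mat_vec(a, x, dx);
    for (size_t i = 0; i < N; ++i)
        dx[i] += b[i] * u;
}

/* One classical RK4 step of size h; x is updated in place and the input u
 * is held constant over the step. */
void rk4_step(const double *a, const double *b, double *x, double u, double h)
{
    double k1[N], k2[N], k3[N], k4[N], tmp[N];
    size_t i;

    state_deriv(a, b, x, u, k1);
    for (i = 0; i < N; ++i) tmp[i] = x[i] + 0.5 * h * k1[i];
    state_deriv(a, b, tmp, u, k2);
    for (i = 0; i < N; ++i) tmp[i] = x[i] + 0.5 * h * k2[i];
    state_deriv(a, b, tmp, u, k3);
    for (i = 0; i < N; ++i) tmp[i] = x[i] + h * k3[i];
    state_deriv(a, b, tmp, u, k4);
    for (i = 0; i < N; ++i)
        x[i] += h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}
```

Fixed-size arrays and loops with constant bounds, as used here, also keep the code within the synthesizable subset of C, since no dynamic memory is involved.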

After the translation of the code, a MEX wrapper for the adaptation of MATLAB types and C arguments was generated, and the whole simulation was run again to prove its correctness.

The C code was synthesized using the default settings of the Vivado HLS tool. ^{5} iterations of the initial example would be 1.52 s. This time does not include the communication overhead, which, as analyzed later, depends on the communication technology and may become the bottleneck of the accelerator.

The validation of the correctness of the synthesized component was performed according to the procedure described in Section 4.3, using the QuestaSim simulator and replacing the previous MEX wrapper calling the C estimator code with another MEX file that connects MATLAB and the simulator using UNIX sockets.

The hardware platform for the validation of the estimator described in this work combines a general-purpose PC (i5 processor, 4 GB of RAM) and the Xilinx ML605 FPGA prototyping board. This board provides a high-capacity FPGA and several different communication technologies, and therefore, it is a great platform for experimentation. However, it is possible to find inexpensive boards for no more than a few hundred dollars, such as the Zedboard [

Among the different hardware interfaces available on the board, the one finally used was 100 Mb/s Ethernet. Communication between the MATLAB environment and the board has been performed through raw Ethernet messages using a point-to-point connection. Here, the MEX interface has simply been used to transfer data blocks to and from the board using a raw socket.
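The transfer of data blocks can be sketched as packing samples into a frame payload and unpacking them on the other side. The frame layout used here (a 32-bit sample count followed by the samples, in host byte order) is a hypothetical format chosen for illustration; opening the raw socket and sending the frame are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 128 doubles plus the 4-byte count take 1028 bytes, which fits in the
 * 1500-byte payload of a standard Ethernet frame. */
#define MAX_SAMPLES 128u

/* Packs 'count' samples into 'frame'; returns the payload length in bytes,
 * or 0 if the block does not fit. */
size_t pack_block(const double *samples, uint32_t count,
                  uint8_t *frame, size_t capacity)
{
    size_t needed = sizeof count + (size_t)count * sizeof(double);

    if (count > MAX_SAMPLES || needed > capacity)
        return 0;
    memcpy(frame, &count, sizeof count);
    memcpy(frame + sizeof count, samples, (size_t)count * sizeof(double));
    return needed;
}

/* Unpacks a payload into 'samples'; returns the number of samples, or 0 if
 * the payload is malformed or larger than 'max_count'. */
uint32_t unpack_block(const uint8_t *frame, size_t length,
                      double *samples, uint32_t max_count)
{
    uint32_t count;

    if (length < sizeof count)
        return 0;
    memcpy(&count, frame, sizeof count);
    if (count > max_count ||
        length < sizeof count + (size_t)count * sizeof(double))
        return 0;
    memcpy(samples, frame + sizeof count, (size_t)count * sizeof(double));
    return count;
}
```

Sending doubles in their binary form avoids any text conversion, in line with the use of double precision arithmetic on both sides of the link. A real implementation would also have to fix the byte order expected by the hardware.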

The hardware cost of the system has already been described in Section 2.

To measure the improvement in performance, a full MATLAB version and the mixed Hw-Sw architecture have been run, resulting in around 50 s for the first case and a bit more than two seconds for the second. In both cases, the parameters used correspond to those shown in the first case in ^{5} iterations of the Runge Kutta-based estimators. Of the two seconds, 0.5 s were due to the communication overhead introduced by the Ethernet message-passing mechanism, while the remaining 1.5 s correspond purely to the FPGA computation time. This was achieved with a 125-MHz clock. On average, the FPGA is able to compute one complete algorithm iteration in 7.6

As described in the introduction, one of the purposes of the work was to provide a low-cost solution accessible to non-hardware designers. In that sense, the use of a heterogeneous solution, where MATLAB provides the testing environment while the estimation algorithm is synthesized into a general-purpose prototyping platform, has clear benefits. On the other hand, that is not the best use case for demonstrating the peak performance of an integrated Hw-Sw sensor system. In that case, many of the systems-on-chip available nowadays would be a better option. However, the use of the developed derivative estimator is not exclusive to a MATLAB environment. This derivative estimator can also be integrated with applications running on an embedded processor that require derivative calculation. The hardware derivative estimator is interfaced through FIFOs, which are later adapted for a concrete communication channel. Those adapters can easily be replaced to fit any other communication channel technology, such as a system bus, as is described in [

The development of hardware accelerators of specific algorithms is not an easy task, requiring hard work from the designer in order to obtain an optimized component. The optimization process is highly dependent on the target technology and, in general terms, is not easily portable from one technology to another.

Some works found in the literature developed their own compiler in order to automate the hardware generation of accelerated components, such as [

In [

The benefits of FPGAs make them good candidates to achieve high control performance in industrial control applications. In this kind of system, tools that generate HDL code, such as the SIMULINK HDL Coder, are used to implement the generated HDL components in FPGAs. In [

In [

Recent research in the design of time derivative signal estimations based on algebraic methods has been carried out [

This paper implements a novel architecture to accelerate the algebraic derivative estimator using reconfigurable logic. The derivative estimator is based on the use of an overlapping implementation of the algebraic derivative estimation method to compute the time derivatives of the input signals. Two shifted estimators are used, to guarantee that one of them is convergent, while the other estimator is starting to diverge and is thus being properly reset. The outcome of this technique is a considerable acceleration of the convergence of the computation transients that occur just after the estimation resetting. Additionally, the algebraic derivative algorithm presents the following advantages: (i) the algorithm is computed online and in real-time; (ii) high robustness properties with regard to corrupting noises, without the need to know their statistical properties; (iii) versatility and easy implementation; and (iv) it can be robustly applied in a wide range of engineering applications, like signal processing and automatic control.

Complementary to the algorithm, a generic prototyping architecture has been proposed. This architecture might not be the best solution for every case, but as shown in the Experimental Results section, it provides very reasonable results with little effort and cost. The quality of the results will depend both on the concrete algorithm to implement and on the communication interface between the hardware and software platforms. In the latter case, there is still room for improvement through the use of Gb Ethernet or PCI Express, depending on the delay requirements of the model.

One of the advantages of the HIL approach is the improvement in performance with respect to the native MATLAB models, as is the case of the experiment described in the paper, where the mixed-mode solution reduces computation time by one order of magnitude. However, that is not the main purpose of the work, since those speed gains can equally be obtained through C-based accelerators. The real advantage is that any kind of hardware can be integrated into the mixed Hw-Sw platform, even for real-time operation. For example, the estimator described in this paper has been tested using data computed from a MATLAB function as the input. However, there is no reason why the input cannot be obtained from the digital hardware in the prototyping board or from a real-time A/D converter. In fact, the ultimate purpose of this work is to provide a mixed Hw-Sw prototyping platform where the estimator can be integrated with real sensors into the same chip, with the purpose of providing enhanced sensors that deliver better-quality data.

This work is supported by the Spanish Government, Science and Innovation Department under project DREAMS (TEC2011-28666-C04-03).

The authors declare no conflict of interest.

_{1}error criteria

Time setting for each identifier.

MATLAB Hardware prototyping platform.

Overlapped estimator for the first derivative.

Overlapped estimator for the first, second and third derivatives.

Runge Kutta method C code.

Error analysis for the derivatives.

| ||||
---|---|---|---|---|
Ts = 0.0001, t_reset = 4 s, error < 0.005 at 0.41 s | ||||
first | −8.7077 × 10^{−5} | 1.3058 × 10^{−6} | −5.0202 × 10^{−3} | 2.3299 × 10^{−3} |
second | −9.9813 × 10^{−4} | 5.3547 × 10^{−4} | −2.4609 × 10^{−1} | 3.4598 × 10^{−2} |
third | −6.5670 × 10^{−3} | 1.6780 × 10^{−1} | −7.2022 | 4.1359 × 10^{−1} |
Ts = 0.0001, t_reset = 2 s, error < 0.005 at 0.41 s | ||||
first | 2.8323 × 10^{−5} | 2.0411 × 10^{−6} | −5.0202 × 10^{−3} | 4.3080 × 10^{−3} |
second | 4.1176 × 10^{−4} | 1.6769 × 10^{−3} | −2.4609 × 10^{−1} | 1.7306 × 10^{−1} |
third | 3.6795 × 10^{−3} | 6.5018 × 10^{−1} | −7.2029 | 4.1523 |
Ts = 0.001, t_reset = 4 s, error < 0.05 at 0.32 s | ||||
first | −4.7759 × 10^{−4} | 8.01473 × 10^{−5} | −5.06375 × 10^{−2} | 1.3887 × 10^{−2} |
second | −7.0201 × 10^{−3} | 6.4461 × 10^{−2} | −3.15929 | 2.7498 × 10^{−1} |
third | −4.2699 × 10^{−2} | 4.6275 × 10^{1} | −1.1760 × 10^{2} | 3.3094 |

Synthesis results.

| ||||
---|---|---|---|---|
Estimator | 39/768 | 2745/301,440 | 3046/150,720 | 46 |
Runge Kutta | 53/768 | 3942/301,440 | 4400/150,720 | 929 |
Derivative | 53/768 | 4058/301,440 | 4620/150,720 | 950 |