Optimal Digital Implementation of Fractional-Order Models in a Microcontroller

The growing number of operations in implementations of the non-local fractional differentiation operator is cumbersome for real applications with strict performance and memory storage requirements. This demands use of one of the available approximation methods. In this paper, the analysis of the classic integer- (IO) and fractional-order (FO) models of the brushless DC (BLDC) micromotor mounted on a steel rotating arms, and next, the discretization and efficient implementation of the models in a microcontroller (MCU) is performed. Two different methods for the FO model are examined, including the approximation of the fractional-order operator sν (ν∈R) using the Oustaloup Recursive filter and the numerical evaluation of the fractional differintegral operator based on the Grünwald–Letnikov definition and Short Memory Principle. The models are verified against the results of several experiments conducted on an ARM Cortex-M7-based STM32F746ZG unit. Additionally, some software optimization techniques for the Cortex-M microcontroller family are discussed. The described steps are universal and can also be easily adapted to any other microcontroller. The values for integral absolute error (IAE) and integral square error (ISE) performance indices, calculated on the basis of simulations performed in MATLAB, are used to evaluate accuracy.


Introduction
Optimal solutions for implementing models of plants are of great interest to industry, since they enable the extension of the computer-aided simulations performed in computation software such as MATLAB/Simulink. In numerous control systems, it is essential that the process of tuning the controller on the basis of measurements of system output involves as few costly plant identification iterations as possible. A common solution involves synthesizing the plant model from the measured characteristics and implementing it in a dedicated software environment or microprocessor-based hardware platform. Significant difficulties arise when the models are described by fractional-order calculus (FOC) [1][2][3]. Much research on fractional-order control systems uses computational software for analysis and simulations. A noticeably smaller proportion addresses the problem of digital implementation of FOC equations on real devices, not only theoretically, but also practically [4][5][6][7][8]. In contrast to the well-known bounded numerical approximations of a classic integer-order derivative, such as backward or central differences, the problem arises of a constantly increasing number of discrete convolution operations over time. In order to reduce the negative impact of this issue, numerous approximation methods have been proposed [9][10][11][12], divided between the time-domain and frequency-domain. In the time domain, limited memory-based approaches are the most popular, including the Short Memory Principle (SMP) algorithm introduced by Igor Podlubny [3]. In the frequency domain, a selected range of Bode characteristics G(ω) can be approximated using the well-known Oustaloup Recursive filter algorithm (ORA) [10] or modifications thereof. Hardware implementation of fractional order models in control engineering is of great interest for the purposes of offline controller tuning and testing in already developed industrial control systems, which must not be affected in any way. An accurate, equivalent mathematical model realized as a dedicated hardware platform is usually desired. It can be very useful in such cases to use a microcontroller (MCU) as the target plant, with the output signal calculated on the basis of a transfer function of its model. Alternatively, one can consider designing an equivalent, time-continuous fractance circuit. In our research, we focus on the former approach, touching on the problem of discretization of the fractional-order model and combining approximation methods with universal optimization programming techniques to improve performance, reduce computation time, and limit the size of the occupied microcontroller memory with a negligible impact on accuracy. As a rule of a thumb, optimal implementation allows a higher order N of approximation formulas, producing more complex but accurate equations, which can be computed during the same constant sampling period. The presented example of microcontroller implementation is an essential part of the testing hardware platform, which is designed for the purpose of developing a sophisticated variable fractional-order PID (VFOPID) controller to be used in a closed-loop control system with multiple brushless DC (BLDC) motors. The paper is arranged as follows: In Section 2, the proposed testing platform and plant models are described. A description of the MCUs selected for the experiments is also provided. In Sections 3 and 5 approximation and discretization techniques, useful for the implementation of the models on the target platform, are discussed. Two approaches are considered: approximation with an ORA and numerical evaluation of the fractional differential equation using a truncated Grünwald-Letnikov (GL) definition. Some remarks related to implementation are given in Section 4. Conclusions are given in the final Section 6.

Plant Models
The closed-loop control system of an unmanned aerial vehicle (UAV) quadcopter arm, presented in Figures 1 and 2, consists of a hardware platform with two micro BLDC motors, an encoder and a controller for modeling and designing an accurate control law for a dedicated fractional-order PID (FOPID) controller [3]. For small angles we treated the plant as a black box and provided classic, first and second integer-order Küpfmüller models, further enhanced by a model described by the fractional-order transfer function (FOTF). Matching the response of simple fractional order model approximation with the original data exceeded first and second-order Küpfmüller models. However, several assumptions and specific implementation techniques had to be considered, which will be described in the sections that follow. The transfer functions of the models, prepared in MATLAB R2017b computation software using PID Tuner applet, Optimization Toolbox and FOMCON [13,14] are: 1. First-order plus dead time (FOPDT) where K P denotes the gain of the model, T P is a time constant and T D is the delay of the plant. 2. Second-order plus dead time (SOPDT) 3. Non-integer-order plus dead time (NIOPDT) G N IOPDT (s) = K P a n s v n + a n−1 s v n−1 + ... where a n . . . a 0 denote constants and ν n . . . ν 0 values of fractional orders.  For the purpose of experiments on a real hardware platform, the ARM Cortex-M7 core-based 32-bit microcontroller from the STM32 High Performance series, model STM32F746ZG [15] was used. The following configuration was being set up during the main program initialization routine: • maximum value of the main clock frequency f CPU = 216 MHz, • analog-to-digital converter (ADC) synchronized with the internal timer interrupt routine (ISR) to sample the input signal on an ADC pin at f ADC = f s = 1 kHz, • number of ADC domain clock cycles required for a single ADC conversion c ADC = 15 providing the best possible accuracy of 12-bit resolution. Time of conversion t conv ≈ 10 µs, • single-precision hardware floating-point unit (FPU) and compiler warnings on automatic double-precision promotion enabled (software-simulated support for double-precision arithmetic), • 60 Hz PWM output signal with adjustable duty cycle for driving the 11VDC-supplied micro-BLDC driver.

Discretization
As mentioned in the previous chapter, integer-order models can be easily implemented on any microcontroller, in the form of digital finite (FIR) or infinite impulse response (IIR) filters [16,17]. A significantly better performance, due to the lower number of operations (feedforward and feedback taps), can be achieved with models implemented as IIR filters, which is important specifically for real-time calculations performed on a microcontroller at the cost of potential breakdown of stability. Discretization of the transfer functions (1) and (2) was performed using two well-known methods, the zero-order hold and bilinear transform (Tustin's), at a sampling frequency of f s = 1 kHz. The zero-order hold was selected for the best matching with the original characteristics.

Discrete first-order plus dead time (DFOPDT)
where b i and a j are numerator and denominator coefficients, respectively, and T D is the number of delay input samples at a given sample rate f s . The exact values of the coefficients are presented in Table 1.

Discrete non-integer-order plus dead time (DNIOPDT)
Discretization of the non integer-order transfer function (3) was performed in three consecutive steps. First, approximation of the transfer function in the frequency domain was obtained by applying Oustaloup's Recursive filter algorithm (ORA) [10,18], approximating the complex variable s of the fractional order 0 < ν < 1, using the following formula: where [ω b , ω h ] denotes the frequency range of the approximation, N is the order and . As a result, function (6) generates (2N + 1) poles and zeros in total. If ν > 1 then s ν is first replaced with s (n+u) = s n s u , where n is an integer number and u is a fractional part, approximated by the algorithm. Different values for the approximation order N were tested over the selected frequency range ω ∈ 10 −4 , 10 3 rad s .
Step response and Bode characteristics for N ∈ [1,5] are presented in Figures 3 and 4. It is noticeable that all values of the order N provide satisfactory approximations of the initial fractional order transfer function. Nevertheless, we proceeded with approximation orders N ≥ 3. Since the approximation polynomials had over 14 zeros and poles, in the second step, a reduction was performed using the balancing reduction technique [19], available in MATLAB as balred method. Minimization of the cost functions for a new reduced-order plant model of a fixed balred order M = 3, M ∈ N revealed that of several ORA filters, approximation of the order N = 3 ensured the best match between both characteristics.

Implementation Difficulties
It is important to stress two general points related to implementation. When discrete transfer functions (see Table 1) are obtained in MATLAB, one should be aware of the default Short Fixed Decimal display format, which rounds the numbers to four decimal places. Double precision representation of coefficients b i , a j can appear in the Variables explorer or after the activation of the long format display mode using the MATLAB routine: format long. This is necessary to avoid the model from losing stability caused by the implementation of truncated values for the coefficients. It is first necessary to determine the desired precision of the floating-/fixed-point number representation in the microcontroller software and the presence of the hardware floating-point unit, as these factors have a great impact the performance of the algorithm. The transfer functions of the proposed IIR filters were transformed into difference equations and implemented on an STM32F746ZG microcontroller in C programming language. To preserve the asymptotic stability of the designed models, in the case of fractional-order approximation the calculations had to be performed using software simulated double-precision arithmetic. This had a significant impact on performance, increasing the required number of CPU cycles from c avg,sp = 2650 to c avg,dp = 8500 (320%). The alternative approach involved the evaluation of the FOTF in the time domain, as will be described in the next section.  (4) and (5), and of the continuous model (7).

Time-Domain Approach Using the Grünwald-Letnikov Differintegral Operator and SMP
To compare the efficiency and accuracy of the Oustaloup approximation, we considered a fractional-order differential equation evaluated in the time-domain on the basis of the implementation of the truncated Grünwald-Letnikov differintegral operator [20]. This technique is known as the Short Memory Principle and restricts the boundaries of the operations to the most recent N l samples. The principle is applied usually to numerical evaluation but has been also proposed for Riemann-Liouville and Caputo definitions [21]. Several different maximum memory lengths were where N 0 denotes the total number of samples from the start of the simulation t 0 = 0 s, and was used as a reference value. The time responses of the plant were obtained using a modified formula [22]: where w ν i j denotes the Newton binomial weights in the GL definition: .. (9) and N k is the number of previously processed samples:  Table 2 below shows the corresponding performance indices, including those obtained for ORA. As can be seen, reducing the number of past samples below N 4 = N 0 20 (green curve) generates considerable error, which can increase even more for plants characterized by longer transient states. Therefore, SMP lengths of N 5 , N 6 were not considered for further analysis. Reduced-order approximations obtained using the ORA algorithm provided better results than nearly all SMP-based approximations. However, the value of the maximum absolute percentage error (MaxAPE) was usually higher, due to the deviation between the step response characteristics at the beginning of the transient state, near zero. This error dropped rapidly for k → ∞. Only for the memory lengths of SMP N ≥ N 1 were the step response characteristics (red curve) more accurate and similar to the initial step response of the fractional-order transfer function (3). The number of CPU cycles required to evaluate the output signal was measured using the Data Watchpoint and Trace unit of the microcontroller [23], by computing the difference between the values in CYCCNT register, read in two separate sections of the program. Several different software optimization techniques were applied to the algorithm in each iteration.    -up table). During each analog-digital conversion, a new input sample was added to the end of the input array and the values in the input and output arrays were shifted left when the limit N l was reached. Moreover, in the developed functions, only pointers to structures and arrays were accepted as parameters, to reduce the amount of memory occupied by the stack.

Replacing Arrays with Ring Buffers
Instead of shifting the values in the input/output arrays, a structure called a ring (circular) buffer was used. This involves defining a moving writing pointer (e.g., inWrIdx) for each of the arrays. When the buffer limit is reached (inWrIdx = N l ), the value is reset to point to the beginning of the buffer. The input value indicated by the pointer is always the most recent, whereas inWrIdx+1 (or 0 if inWrIdx = N l − 1) points to the oldest sample. For models with delay, an additional delay buffer with two pointers for writing and reading is initialized. Further optimization can be achieved when the array lengths are powers of two. At the cost of higher memory consumption, the conditional operator for checking the limit N l is replaced with a much faster bitwise multiplication of the pointer by N l .

Enabling Optimization Flags
For the purposes of debugging and results verification, the program was initially built without any optimization by the GCC compiler (-O0 flag). Using the values x = [1,3] with the flag -Ox may reduce the code length and the size of the binary [24]. The higher value of x, the more optimizations are performed during the last stage of compilation. It should be noted that, according to the GCC manual, -O3 may affect computation results and generate a binary larger than -O2, due to e.g., loops unrolling. Therefore, -O2 is usually recommended for release building profiles.

Enabling Hardware FPU Unit, Using CMSIS DSP Library
Since the release of the ARM Cortex-M4 core, STM32 microcontrollers have been equipped with IEEE 754 compliant hardware floating-point units. Depending on the model of the microcontroller, single-or double-precision units are available [23,25], supporting hardware accelerated operations on float32_t or float64_t types, respectively. The unit is disabled by default and had to be configured first.
One may also find it helpful to enable double promotion warnings (-Wdouble-promotion in GCC), to eliminate automatic casting of numbers to higher precision. Moreover, for calculations on float32_t or fixed-point q31_t numbers, which were highly optimized by taking advantage of dedicated intrinsic and SIMD operations, the CMSIS DSP library for ARM cores was considered. This library contains implementations of several common DSP algorithms, from among which arm_conv_partial_f32 and arm_scale_f32 functions were used for discrete convolution and vector scaling operations, respectively. The overall performance vastly improved. However, the truncated precision led to significant accumulated error (∆e k = |y DP (k)−y SP (k)| |y DP (k)| 100% = 28.3% for the last computed output sample), disqualifying the model H N IOPDT in this form from practical application.

Other Approaches
Further optimizations are a topic of the ongoing research involving adaptive memory methods, parallel implementation of numerical algorithms and calculations using fixed-point arithmetic. Finally, assembly inlines placed in critical sections of the algorithms and platform-specific enhancements are being considered. However, these approaches are strictly platform-dependent and must be adopted for each architecture individually. The results of subsequent software optimizations are presented in Figure 8. The algorithm processing past N 4 = 250 samples, compiled with a -O2 flag, satisfied the initial timing requirement, calculating the output in a time shorter than the sampling period t s = 1 ms.

Conclusions
This study set out to address the problem of fractional-order model implementation. The non-local fractional differential GL operator involves a constantly growing number of calculations, which can be either bounded or replaced by an integer-order operator using one of the well-known approximation methods. A model of a UAV arm with a BLDC motor was implemented, using two approaches in the frequency-and time-domains: the Oustaloup approximation and numerical evaluation of the differential equation using the Grünwald-Letnikov definition with the Short Memory Principle. The performance of the algorithms was measured and compared. For orders N ORA > 2 of the Oustaloup approximation, similar step response characteristics were obtained. Moreover, in this case, the required buffer size was limited to only four feedforward and four feedback samples, vastly improving the calculation time and memory consumption. Higher accuracy could be obtained by different levels of reduction in the numbers of poles and zeros. Another approach, based on implementation of the Grünwald-Letnikov definition, required the introduction of the Short Memory Principle. In this case, programming optimization techniques allowed the computation time to be reduced by 15% or even 78% if CMSIS DSP and FPU hardware were used. This last result, however, required redesigning of the model. The methods described in this paper can be easily adapted and applied to other fractional-order models or control algorithms. Further work is underway, focusing on parallel implementation and optimization of fractional order numerical algorithms and designing a variable-, fractional-order PID controller with algorithms for determining the function of variable order.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: