# Optimization for Software Implementation of Fractional Calculus Numerical Methods in an Embedded System

## Abstract

[…] Arm® Cortex®-M architectures. Reductions in computation times of up to 75% and 87% were achieved compared to the initial implementation, depending on the type of Arm® core.

## 1. Introduction

[…] Arm® CMSIS-DSP library with intrinsic and Single Instruction Multiple Data (SIMD) functions, as well as other hardware extensions. The research aimed to obtain the highest possible performance while preserving the ease of middle-level C programming, ensuring software portability, and omitting CPU-specific assembly code snippets. Several iterations of the tests were conducted using two 32-bit RISC Arm® Cortex®-M microcontrollers manufactured by STMicroelectronics. First, the implementations of the fractional-order backward difference and derivative using constant single-precision floating-point binomial coefficients $a_j^{\nu}$ were analyzed for buffer sizes of $L_1 = 32$ and $L_2 = 256$ values. The memory limitations of the microcontrollers were also investigated. Next, the performance of the initial algorithms for the fractional backward difference/sum and differentiator/integrator of variable orders was measured. The latter is particularly useful for the realization of adaptive fractional-order $\mathrm{PI}^{\mu(t)}\mathrm{D}^{\nu(t)}$ controllers with variable orders of the I and D terms. The algorithms were then optimized using the described techniques. In the final step, fixed-point arithmetic was applied, with the numbers converted to $Qm.n$ notation [34], in which $m$ bits are used for the integer part and $n$ bits for the fractional part.
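As a point of reference for the coefficients $a_j^{\nu}$ and the buffer of $L$ samples mentioned above, the following is a minimal sketch of a Grünwald–Letnikov backward difference in single-precision C. It assumes the commonly used recurrence $a_0^{\nu} = 1$, $a_j^{\nu} = a_{j-1}^{\nu}\left(1 - \frac{1+\nu}{j}\right)$ and a buffer in which index 0 holds the newest sample. The function names and the buffer layout are hypothetical and are not taken from the author's implementation.

```c
#include <stddef.h>

/* Hypothetical helper: fills a[0..len-1] with the coefficients a_j for a
 * constant order nu, using the assumed recurrence
 * a_0 = 1, a_j = a_{j-1} * (1 - (1 + nu) / j). */
void gl_coefficients(float nu, float *a, size_t len)
{
    if (len == 0) {
        return;
    }
    a[0] = 1.0f;
    for (size_t j = 1; j < len; ++j) {
        a[j] = a[j - 1] * (1.0f - (1.0f + nu) / (float)j);
    }
}

/* Backward difference as a dot product of the coefficient vector and the
 * sample buffer. Assumed layout: x[0] is the newest sample and x[j] is the
 * sample taken j steps earlier. */
float gl_backward_difference(const float *a, const float *x, size_t len)
{
    float acc = 0.0f;
    for (size_t j = 0; j < len; ++j) {
        acc += a[j] * x[j];
    }
    return acc;
}

/* Derivative approximation: the difference scaled by h^(-nu). The factor
 * inv_h_pow_nu is passed in precomputed so that the power function is not
 * evaluated on every call. */
float gl_derivative(const float *a, const float *x, size_t len,
                    float inv_h_pow_nu)
{
    return gl_backward_difference(a, x, len) * inv_h_pow_nu;
}
```

For $\nu = 1$ the recurrence yields the coefficients 1, -1, 0, 0, ..., so the sketch reduces to the ordinary first-order backward difference, which is a convenient sanity check.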

## 2. Mathematical Preliminaries

**Definition 1.**

**Definition 2.**

**Definition 3.**

## 3. Description of the Hardware Testing Platform

[…] STM32™ microcontrollers, models: STM32L152RCT6 [37] and STM32F746ZG [38], designed on the basis of popular 32-bit cores: Arm® Cortex®-M3 and Cortex®-M7, respectively. The main differences between these devices lie in the availability of a hardware floating-point unit (FPU) and a higher maximum CPU clock frequency in the case of the STM32F746ZG. The microcontrollers are distributed in STM32™ Discovery and Nucleo-144 kits [39,40]. Their key features are listed in Table 1.

[…] Cortex®-M microcontrollers usually do not reach the same computational power, often due to a much lower CPU clock frequency (e.g., the TMS320C6678 DSP processor operates at 1.4 GHz). However, they have been equipped with numerous extensions for accelerating calculations, including Single Instruction, Multiple Data (SIMD) operations, optimized multiply-accumulate (MAC) and DSP instructions, direct memory access (DMA), and hardware floating-point units (FPU). A significant advantage is the availability of basic peripherals, memories, communication interfaces, and power regulators. This offers a low-cost alternative to multi-core systems in which each core is dedicated to specific tasks (e.g., a primary DSP core for signal processing and a secondary core for the operating system, communication with external peripherals, and power management).

[…] Cortex®-M7 core has been increased by a 6-stage dual-issue pipeline capable of processing two instructions per clock cycle. At the fourth stage of the pipeline (Issue), processed instructions are split and further executed by one of the separate dedicated blocks: an arithmetic logic unit with a SIMD extension, a MAC pipeline, a single-precision floating-point pipeline, or a branch prediction block.

[…]™ ultra-low-power technology. The primary feature of the microcontroller is the availability of several low-power modes dedicated to battery-powered applications (in Standby mode, current consumption is reduced to only 0.29 µA). Maximum performance is therefore limited to only 33 DMIPS at a clock frequency of 32 MHz. Due to the lack of a hardware floating-point unit, all operations on real numbers are emulated in software, which strongly affects the computation time.

## 4. Implementation of the Grünwald–Letnikov Fractional-Order Operator

#### 4.1. Memory Limitations

#### 4.2. Compiler Settings

[…] Arm® Embedded GCC compiler, distributed as part of the GNU Arm® Embedded Toolchain v9.2. Several available optimization levels were tested, starting with the default -O0 flag (no optimizations). In that mode, instructions are translated by the compiler line by line, and breakpoints can be placed and hit anywhere in the executable code. This level is most suitable for software development, as it provides the most accurate debugging experience and allows variables to be read and modified during a debug session. The second level was -O2, which is the highest standard-compliant optimization level that does not introduce a trade-off between the size of the program and its execution speed. This option is commonly enabled in the release build profiles of numerous open-source projects, including the Linux kernel. The last level tested was -O3, in which more code optimizations are applied, usually at the cost of an increased size of the output binary as a result of function inlining and loop unrolling. Therefore, the program may not become faster in all cases. The Arm® GCC also supports the more aggressive -Ofast optimization level, which enables -ffast-math and thereby replaces math operations with faster variants that are not valid for all standard-compliant programs. However, due to the generation of non-standard-compliant code and potential software vulnerabilities, this setting was not taken into consideration. A detailed description of all compiler optimization options can be found in the GCC user manual [41].

#### 4.3. Measuring the Performance

[…] Arm® Cortex® microcontrollers is equipped with a peripheral called the Data Watchpoint and Trace (DWT) unit [42]. It contains up to six different counters and four hardware comparators, which can serve as a source for event triggering (Embedded Trace Macrocell, PC sampler, and data address sampler triggers) and for configuring hardware watchpoints. The number of elapsed core cycles can also be counted by reading the value stored in the DWT Cycle Count (DWT_CYCCNT) register. Combining it with the CPU clock frequency $F_{CPU}$, one can determine the duration of a specific set of operations as $t_{op} = \frac{c_2 - c_1}{F_{CPU}}$, where $c_1$ and $c_2$ denote the number of cycles read from the DWT_CYCCNT register before and after the analyzed section, respectively. The CYCCNT counter is 32 bits wide: it counts upwards to $2^{32} - 1$ and then wraps around to 0. The configuration procedure for the Data Watchpoint and Trace unit is as follows:

- Set the TRCENA bit (bit 24) in the Debug Exception and Monitor Control Register (DEMCR) to 1 to enable the use of the trace and debug blocks.
- Set the CYCCNTENA bit (bit 0) in the DWT Control Register (DWT_CTRL) to 1 to enable the CYCCNT counter.
- Initialize the value of the DWT_CYCCNT register to 0.
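The configuration steps and the formula for $t_{op}$ can be sketched in C as shown below. This is an illustration under stated assumptions, not the author's code: the helper names are hypothetical, and the registers are passed in as pointers so that the logic can also be exercised off-target. On a Cortex-M device the three registers are located at 0xE000EDFC (DEMCR), 0xE0001000 (DWT_CTRL), and 0xE0001004 (DWT_CYCCNT).

```c
#include <stdint.h>

/* Bit positions as given in the configuration procedure. */
#define DEMCR_TRCENA        (UINT32_C(1) << 24)
#define DWT_CTRL_CYCCNTENA  (UINT32_C(1) << 0)

/* Hypothetical helper: enables the cycle counter following the three steps
 * in the order in which they are listed. */
void dwt_cycle_counter_init(volatile uint32_t *demcr,
                            volatile uint32_t *dwt_ctrl,
                            volatile uint32_t *dwt_cyccnt)
{
    *demcr |= DEMCR_TRCENA;           /* enable the trace and debug blocks */
    *dwt_ctrl |= DWT_CTRL_CYCCNTENA;  /* start the CYCCNT counter */
    *dwt_cyccnt = 0U;                 /* reset the cycle count */
}

/* Cycles elapsed between two readings of DWT_CYCCNT. Unsigned subtraction
 * is performed modulo 2^32, so the result stays correct if the counter
 * wrapped around once between c1 and c2. */
uint32_t dwt_elapsed_cycles(uint32_t c1, uint32_t c2)
{
    return c2 - c1;
}

/* t_op = (c2 - c1) / F_CPU, in seconds. */
double dwt_cycles_to_seconds(uint32_t cycles, double f_cpu_hz)
{
    return (double)cycles / f_cpu_hz;
}
```

On the target, the pointers would be formed from the addresses above, for example `(volatile uint32_t *)0xE0001004` for DWT_CYCCNT. At 216 MHz the counter wraps after roughly 19.9 s, so a single wrap-safe subtraction is sufficient for short measured sections.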

#### 4.4. Implementation of Fractional-Order Backward Difference

[…] Intel® Core™ i5-8250U, 16 GB of RAM and MS Windows® 10 Pro operating system.

## 5. Optimization

#### 5.1. SIMD and DSP Instructions in the CMSIS Library

[…] Cortex®-M architecture, this operation can be realized on 64- and 32-bit operands ($64b \leftarrow 64b + 32b \times 32b$). Since the release of the Cortex®-M4 core, Single Instruction Multiple Data (SIMD) extensions have also been supported. With SIMD, one can increase the processing capability by performing calculations simultaneously on multiple 4 × 8-bit or 2 × 16-bit operands. A complete list of the Cortex® extensions can be found in [43].
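Assuming that the operation referred to above is the multiply-accumulate (MAC), the $64b \leftarrow 64b + 32b \times 32b$ form can be expressed in portable C without assembly, in line with the portability goal stated in the Introduction. The sketch below uses a hypothetical function name; whether the compiler actually emits a single long multiply-accumulate instruction for the loop body depends on the target and the optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two 32-bit integer vectors with a
 * 64-bit accumulator. Each iteration has the form
 * 64b <- 64b + 32b x 32b, i.e., one multiply-accumulate. */
int64_t mac_dot_i32(const int32_t *a, const int32_t *b, size_t len)
{
    int64_t acc = 0;
    for (size_t i = 0; i < len; ++i) {
        /* Widen before multiplying so the product is computed in 64 bits. */
        acc += (int64_t)a[i] * (int64_t)b[i];
    }
    return acc;
}
```

Widening both operands before the multiplication matters: multiplying two `int32_t` values directly would compute the product in 32 bits and could overflow before it reaches the accumulator.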

[…]® [44]. It contains over 60 different functions, optimized for various cores, endianness, and data types. Fixed-point (Q7, Q15, Q31) as well as single- and double-precision floating-point arithmetic (float32_t, float64_t) are supported.

[…] Cortex®-M4 and M7 microcontrollers, an explicit definition of the __FPU_PRESENT macro is required in order to enable support of the hardware FPU instructions.

#### 5.2. Enabling the Hardware Floating-Point Unit

[…] Arm® GCC compiler). The ABI determines the type of registers used to pass floating-point arguments to the linked functions. The flag -mfloat-abi=hard corresponds to dedicated floating-point registers, while -mfloat-abi=soft (cross-platform compatible) corresponds to integer registers. The FPU architecture version can be specified with either -mfpu=fpv5-sp-d16 for single-precision support or -mfpu=fpv5-d16 for double-precision support.
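Besides the compiler flags, the FPU on Cortex-M cores has to be granted access at run time through the Coprocessor Access Control Register (CPACR) listed in the Abbreviations. The following is a minimal sketch of that step, assuming the standard Cortex-M layout in which bits 20 to 23 control coprocessors CP10 and CP11 and the register is located at 0xE000ED88. The helper name is hypothetical, and the register is passed as a pointer so that the bit manipulation can be checked off-target.

```c
#include <stdint.h>

/* Full access for CP10 and CP11 (two bits each, bits 20..23). */
#define CPACR_CP10_CP11_FULL  (UINT32_C(0xF) << 20)

/* Hypothetical helper: grants full access to the FPU coprocessors.
 * On the target, cpacr would point to 0xE000ED88, and the write should be
 * followed by data and instruction synchronization barriers before the
 * first floating-point instruction is executed. */
void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
}
```

Vendor startup code often performs this step already when the __FPU_PRESENT and __FPU_USED macros are set, so an explicit call may be redundant in a given project.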

#### 5.3. Other Optimizations

#### 5.4. Implementation

- The appropriate CMSIS-DSP library file was linked: arm_cortexM3l_math.lib for STM32L152RCT6 (little-endian) and arm_cortexM7lfsp_math.lib for STM32F746ZG (little-endian, single-precision FPU). The required macros were defined.

[…] Cortex®-M7. The number of CPU cycles was reduced by over 75% for the buffer length $L_2$ with the program compiled with the same -O0 flag. The -O2 and -O3 levels generated even better machine code. For Cortex®-M3, the improvement was smaller but still noticeable: the execution time was reduced by up to 19% and 20% for the buffers with 32 and 256 samples, respectively. Details are presented in Figure 3.

## 6. Fixed-Point Arithmetic

[…]™ [34] is used to represent real numbers, devoting a constant number of $m$ bits to the integer part and a constant number of $n$ bits to the fractional part. One of the $m$ integer bits is reserved for the sign, and the position of the radix point is fixed. Numbers are stored in integer registers, and all calculations are performed using the standard hardware arithmetic logic unit. The range of a $Qm.n$ number is defined as $[-2^{m-1}, 2^{m-1} - 2^{-n}]$ and the resolution equals $2^{-n}$. Fixed-point arithmetic has been used most often for low-cost or older microcontrollers without hardware floating-point units but is also implemented in many high-end DSP applications to increase the overall performance of the software. The drawbacks of this approach include potential issues with saturation, precision loss, or the selection of an insufficient range of numbers. Thus, a more complex implementation capable of handling normalization (bit-shifting) and bounds checking is required. Additional scaling of real numbers may also be needed.
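The range and resolution formulas can be checked numerically with a few lines of C. The sketch below assumes the convention implied by the range formula, in which the sign bit is counted among the $m$ integer bits; the function names are hypothetical.

```c
#include <stdint.h>

/* Smallest representable value of a Qm.n number: -2^(m-1). */
double qmn_min(int m)
{
    return -(double)(INT64_C(1) << (m - 1));
}

/* Resolution (value of one least significant bit): 2^(-n). */
double qmn_resolution(int n)
{
    return 1.0 / (double)(INT64_C(1) << n);
}

/* Largest representable value: 2^(m-1) - 2^(-n). */
double qmn_max(int m, int n)
{
    return (double)(INT64_C(1) << (m - 1)) - qmn_resolution(n);
}
```

For $m = 11$ and $n = 21$ these functions give a range of $[-1024, 1024 - 2^{-21}]$ and a resolution of about $4.768 \times 10^{-7}$; for $m = 1$ and $n = 31$ they give the $[-1, 1 - 2^{-31}]$ range of a 32-bit fractional format.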

[…] $2^{-21}$] and a resolution of $2^{-21}$ ($\approx 4.768 \times 10^{-7}$). This allowed Equation (2) to be implemented without scaling for the maximum number of coefficients (256) and provided a satisfactory resolution. It needs to be stressed, however, that the above requirements will have to be adapted for different applications. As the CMSIS-DSP library supports only the Q1.31, Q1.15, and Q1.7 formats, a custom implementation of the fixed-point arithmetic was written by the author. The procedure was as follows:

- The vector of the predefined floating-point input samples, the initial fractional order $\nu_0 = 0.7$, and the sampling time $h$ were converted to Q11.21 format by multiplying the values by $2^{21}$ and rounding to the nearest integer.
- The recursive function for calculating $a_j^{[\nu(t)]}$ and the fractional differintegral algorithm were modified to handle fixed-point arithmetic in Q11.21 notation.
- In the main loop, the order $\nu_{Q11.21}$ was incremented by one at each step (i.e., by $2^{-21}$ in real terms), and the vectors of the $a_{j,Q11.21}^{[\nu(t)]}$ coefficients, as well as the variable fractional-order backward difference and derivative responses, were recalculated.
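The conversion and arithmetic steps of this procedure can be sketched as follows. This is an illustration only: the type and function names are hypothetical, the coefficient recurrence $a_j = a_{j-1}\left(1 - \frac{1+\nu}{j}\right)$ is an assumed form of the recursive function, and no saturation or bounds checking is included.

```c
#include <stddef.h>
#include <stdint.h>

#define Q_FRAC_BITS 21
#define Q_ONE ((int32_t)1 << Q_FRAC_BITS)   /* 1.0 in Q11.21 */

typedef int32_t q11_21_t;

/* Conversion: multiply by 2^21 and round to the nearest integer. */
q11_21_t q_from_double(double v)
{
    double scaled = v * (double)Q_ONE;
    return (q11_21_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

double q_to_double(q11_21_t q)
{
    return (double)q / (double)Q_ONE;
}

/* Multiplication: the 64-bit product has 42 fractional bits, so it is
 * shifted right by 21 to return to Q11.21. An arithmetic right shift of
 * negative values is assumed (this is the behavior of GCC). */
q11_21_t q_mul(q11_21_t a, q11_21_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (q11_21_t)(wide >> Q_FRAC_BITS);
}

/* Coefficients for order nu, all in Q11.21. Dividing a Q11.21 value by the
 * plain integer j keeps the result in Q11.21. */
void gl_coefficients_q(q11_21_t nu, q11_21_t *a, size_t len)
{
    if (len == 0) {
        return;
    }
    a[0] = Q_ONE;
    for (size_t j = 1; j < len; ++j) {
        q11_21_t factor = Q_ONE - (Q_ONE + nu) / (int32_t)j;
        a[j] = q_mul(a[j - 1], factor);
    }
}
```

With this representation, incrementing the order by one, as in the last step of the procedure, corresponds to `nu += 1`, which changes the real-valued order by $2^{-21}$.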

[…] Cortex®-M3 case (see Figure 4). Calculation time was reduced by over 84% for both buffer sizes with the -O2 or -O3 optimization levels applied. Moreover, Cortex®-M3 was found to be faster (assuming the same CPU clock frequency) than Cortex®-M7 at executing the same algorithm.

## 7. Conclusions

## Supplementary Materials

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
ABI | Application Binary Interface |
CPACR | Coprocessor Access Control Register |
DEMCR | Debug Exception and Monitor Control Register |
DFT | Discrete Fourier Transform |
DWT | Data Watchpoint and Trace unit |
DWT_CTRL | DWT Control Register |
DWT_CYCCNT | DWT Cycle Count Register |
FIR | Finite Impulse Response |
FPU | Floating-Point Unit |
GL | Grünwald–Letnikov |
IIR | Infinite Impulse Response |
MAC | Multiply-Accumulate |
PID | Proportional-Integral-Derivative Controller |
SIMD | Single Instruction Multiple Data |
(V)FOBD/S | (Variable) Fractional-Order Backward Difference/Sum |
(V)FOD/I | (Variable) Fractional-Order Differintegral |
(V)FOPID | (Variable) Fractional-Order Proportional-Integral-Derivative Controller |

## References

1. Oldham, K.B.; Spanier, J. The Fractional Calculus - Theory and Applications of Differentiation and Integration to Arbitrary Order. In Mathematics in Science and Engineering; Academic Press, Inc.: San Diego, CA, USA, 1974; Volume 111, ISBN 978-0-12-525550-9.
2. Miller, K.S.; Ross, B. An Introduction to the Fractional Calculus and Fractional Differential Equations, 1st ed.; John Wiley & Sons: New York, NY, USA, 1993; ISBN 978-04-7158-884-9.
3. Podlubny, I. Fractional Differential Equations - An Introduction to Fractional Derivatives, Fractional Differential Equations, to Methods of their Solution and some of their Applications. In Mathematics in Science and Engineering; Academic Press, Inc.: San Diego, CA, USA, 1999; Volume 198, ISBN 978-01-2558-840-9.
4. Parsa Moghaddam, B.; Dabiri, A.; Tenreiro Machado, J.A. Application of variable-order fractional calculus in solid mechanics. In Handbook of Fractional Calculus with Applications. Applications in Engineering, Life and Social Sciences, Part A; Bǎleanu, D., Mendes Lopes, A., Tenreiro Machado, J.A., Eds.; De Gruyter: Berlin, Germany, 2019; Volume 7, pp. 207–224. ISBN 978-3-11-057091-5.
5. Sierociuk, D.; Skovranek, T.; Macias, M.; Podlubny, I.; Petras, I.; Dzielinski, A.; Ziubinski, P. Diffusion process modeling by using fractional-order models. Appl. Math. Comput. **2015**, 257, 2–11.
6. MacDonald, C.L.; Bhattacharya, N.; Sprouse, B.P.; Silva, G.A. Efficient computation of the Grünwald–Letnikov fractional diffusion derivative using adaptive time step memory. J. Comput. Phys. **2015**, 297, 221–236.
7. Wang, S.; He, S.; Yousefpour, A.; Jahanshahi, H.; Repnik, R.; Perc, M. Chaos and complexity in a fractional-order financial system with time delays. Chaos Solitons Fractals **2020**, 131, 109521.
8. Tejado, I.; Pérez, E.; Valério, D. Fractional Derivatives for Economic Growth Modelling of the Group of Twenty: Application to Prediction. Mathematics **2020**, 8, 50.
9. Sopasakis, P.; Sarimveis, H. Controlled Drug Administration by a Fractional PID. IFAC Proc. Vol. **2014**, 47, 8421–8426.
10. Valentim, C.A.; Oliveira, N.A.; Rabi, J.A.; David, S.A. Can fractional calculus help improve tumor growth models? J. Comput. Appl. Math. **2020**, 379, 112964.
11. Aliyu, A.I.; Alshomrani, A.S.; Li, Y.; Inc, M.; Baleanu, D. Existence theory and numerical simulation of HIV-I cure model with new fractional derivative possessing a non-singular kernel. Adv. Differ. Equ. **2019**, 2019, 408.
12. Al-Shamasneh, A.R.; Jalab, H.A.; Shivakumara, P.; Ibrahim, R.W.; Obaidellah, U.H. Kidney segmentation in MR images using active contour model driven by fractional-based energy minimization. Signal Image Video Process. **2020**, 1–8.
13. Lv, T.; Tong, L.; Zhang, J.; Chen, Y. A real-time physiological signal acquisition and analyzing method based on fractional calculus and stream computing. Soft Comput. **2020**, 1–7.
14. Huang, L.L.; Park, J.H.; Wu, G.C.; Mo, Z.W. Variable-order fractional discrete-time recurrent neural networks. J. Comput. Appl. Math. **2020**, 370, 112633.
15. Patnaik, S.; Hollkamp, J.P.; Semperlotti, F. Applications of variable-order fractional operators: A review. Proc. R. Soc. A Math. Phys. Eng. Sci. **2020**, 476, 20190498.
16. Freeborn, T.J.; Maundy, B.; Elwakil, A.S. Fractional-order models of supercapacitors, batteries and fuel cells: A survey. Mater. Renew. Sustain. Energy **2015**, 4, 9:1–9:7.
17. Lewandowski, M.; Orzyłowski, M. Fractional-order models: The case study of the supercapacitor capacitance measurement. Bull. Pol. Acad. Sci. Tech. Sci. **2017**, 65, 449–457.
18. Zhang, Q.; Li, Y.; Shang, Y.; Duan, B.; Cui, N.; Zhang, C. A Fractional-Order Kinetic Battery Model of Lithium-Ion Batteries Considering a Nonlinear Capacity. Electronics **2019**, 8, 394.
19. Majka, L.; Klimas, M. Diagnostic approach in assessment of a ferroresonant circuit. Electr. Eng. **2019**, 101, 149–164.
20. Tepljakov, A.; Alagoz, B.B.; Yeroglu, C.; Gonzalez, E.; HosseinNia, S.H.; Petlenkov, E. FOPID Controllers and Their Industrial Applications: A Survey of Recent Results. IFAC-PapersOnLine **2018**, 51, 25–30.
21. Ostalczyk, P.; Brzezinski, D.; Duch, P.; Łaski, M.; Sankowski, D. The variable, fractional-order discrete-time PD controller in the IISv1.3 robot arm control. Cent. Eur. J. Phys. **2013**, 11, 750–759.
22. El-Khazali, R. Fractional-order PIλDμ controller design. Comput. Math. Appl. **2013**, 66, 639–646.
23. Petráš, I.; Vinagre, B.M. Practical application of digital fractional-order controller to temperature control. Acta Montan. Slovaca **2002**, 7, 131–137. Available online: https://actamont.tuke.sk/pdf/2002/n2/11petras.pdf (accessed on 11 March 2020).
24. Brzeziński, D.W. Fractional Order Derivative and Integral Computation with a Small Number of Discrete Input Values Using Grünwald–Letnikov Formula. Int. J. Comput. Methods **2019**, 17, 1940006.
25. Scherer, R.; Kalla, S.L.; Tang, Y.; Huang, J. The Grünwald–Letnikov method for fractional differential equations. Comput. Math. Appl. **2011**, 62, 902–917.
26. Ostalczyk, P. On simplified forms of the fractional-order backward difference and related fractional-order linear discrete-time system description. Bull. Pol. Acad. Sci. Tech. Sci. **2015**, 63, 423–433.
27. Oustaloup, A. La commande CRONE: Commande Robuste D’Ordre non Entier; Hermes Science Publications: Paris, France, 1991; ISBN 978-28-6601-289-2.
28. Oprzȩdkiewicz, K.; Podsiadło, M.; Dziedzic, K. Integer order vs fractional order temperature models in the forced air heating system. Przegla̧d Elektrotechniczny **2019**, 95, 35–40.
29. Baranowski, J.; Bauer, W.; Zagórowska, M.; Pia̧tek, P. On Digital Realizations of Non-integer Order Filters. Circuits Syst. Signal Process. **2016**, 35, 2083–2107.
30. Monje, C.A.; Chen, Y.; Vinagre, B.M.; Xue, D.; Feliu, V. Fractional-order Systems and Controls. Fundamentals and Applications; Advances in Industrial Control; Springer: London, UK, 2010; ISBN 978-1-84996-334-3.
31. Dastjerdi, A.A.; Vinagre, B.M.; Chen, Y.; HosseinNia, S.H. Linear fractional order controllers; A survey in the frequency domain. Annu. Rev. Control **2019**, 47, 51–70.
32. Caponetto, R.; Machado, J.T.; Murgano, E.; Xibilia, M.G. Model Order Reduction: A Comparison between Integer and Non-Integer Order Systems Approaches. Entropy **2019**, 21, 876.
33. Tepljakov, A.; Petlenkov, E.; Belikov, J. Implementation and real-time simulation of a fractional-order controller using a MATLAB based prototyping platform. In Proceedings of the 13th Biennial Baltic Electronics Conference, Tallinn, Estonia, 3–5 October 2012; pp. 145–148.
34. Pyeatt, L.D.; Ughetta, W. Non-Integral Mathematics. In Modern Assembly Language Programming with the ARM Processor; Pyeatt, L.D., Ughetta, W., Eds.; Elsevier: Amsterdam, The Netherlands, 2016; Chapter 8; pp. 239–292. ISBN 978-01-2819-221-4.
35. Ostalczyk, P. Discrete Fractional Calculus: Applications in Control and Image Processing; World Scientific Publishing Co., Inc.: Singapore, 2016; ISBN 978-98-1472-566-8.
36. Mozyrska, D.; Ostalczyk, P. Variable-, fractional-order Grünwald-Letnikov backward difference selected properties. In Proceedings of the 39th International Conference on Telecommunications and Signal Processing (TSP 2016), Vienna, Austria, 27–29 June 2016; pp. 634–637.
37. STMicroelectronics. STM32L15xCC STM32L15xRC STM32L15xUC STM32L15xVC Ultra-low-power 32-bit MCU ARM-based Cortex-M3, 256KB Flash, 32KB SRAM, 8KB EEPROM, LCD, USB, ADC, DAC. Datasheet - Production Data. DocID022799 Rev 13. 2017. Available online: https://www.st.com/resource/en/datasheet/stm32l152rc.pdf (accessed on 11 March 2020).
38. STMicroelectronics. STM32F745xx STM32F746xx ARM-based Cortex-M7 32b MCU+FPU, 462DMIPS up to 1MB Flash/320+16+4KB RAM, USB OTG HS/FS, ethernet, 18TIMs, 3ADCs, 25 com itf, cam & LCD. Datasheet - Production Data. DocID027590 Rev 4. 2016. Available online: https://www.st.com/resource/en/datasheet/stm32f746zg.pdf (accessed on 11 March 2020).
39. STMicroelectronics. UM1079 User Manual. Discovery kits with STM32L152RCT6 and STM32L152RBT6 MCUs. 2017. Available online: http://www.st.com/resource/en/user_manual/dm00093903.pdf (accessed on 11 March 2020).
40. STMicroelectronics. UM1974 User Manual STM32 Nucleo-144 Boards. 2017. Available online: http://www.st.com/content/ccc/resource/technical/document/user_manual/group0/26/49/90/2e/33/0d/4a/da/DM00244518/files/DM00244518.pdf/jcr:content/translations/en.DM00244518.pdf (accessed on 11 March 2020).
41. Arm Ltd. Using Common Compiler Options. Selecting optimization options. In Arm® Compiler Version 6.12 User Guide; Arm Ltd.: Cambridge, UK, 2019; pp. 35–37. Available online: https://developer.arm.com/docs/100748/0612 (accessed on 11 March 2020).
42. Arm Ltd. Data Watchpoint and Trace Unit. In Arm® Cortex®-M7 Processor Technical Reference Manual, r1p2 ed.; Arm Ltd.: Cambridge, UK, 2018; pp. 139–143. Available online: https://developer.arm.com/docs/ddi0489/d (accessed on 11 March 2020).
43. Arm Ltd. CMSIS-Core (Cortex-M) Intrinsic Functions for SIMD Instructions [only Cortex-M4 and Cortex-M7]; Arm Ltd.: Cambridge, UK, 2019; Available online: https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__SIMD__gr.html (accessed on 11 March 2020).
44. Arm Ltd. CMSIS-DSP Software Library; Arm Ltd.: Cambridge, UK, 2019; Available online: https://www.keil.com/pack/doc/CMSIS/DSP/html/index.html (accessed on 11 March 2020).
45. STMicroelectronics. AN4841 Application Note. Digital Signal Processing for STM32 Microcontrollers Using CMSIS. Rev 2. 2018. Available online: https://www.st.com/content/ccc/resource/technical/document/application_note/group0/c1/ee/18/7a/f9/45/45/3b/DM00273990/files/DM00273990.pdf/jcr:content/translations/en.DM00273990.pdf (accessed on 11 March 2020).
46. ARM Ltd. Arm Cortex-M7 Processor Technical Reference Manual, r1p2 ed.; ARM Ltd.: Cambridge, UK, 2018; Available online: https://static.docs.arm.com/ddi0489/f/DDI0489F_cortex_m7_trm.pdf (accessed on 11 March 2020).
47. Noronha, D.H.; Leong, P.H.; Wilton, S.J. Kibo: An Open-Source Fixed-Point Tool-kit for Training and Inference in FPGA-Based Deep Learning Networks. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2018), Vancouver, BC, Canada, 21–25 May 2018; pp. 178–185.

**Figure 1.** (**a**) Performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers as the number of executed CPU cycles, realizing the fractional-order differintegral (Equation (6)) ($\nu_{const}(t) = 0.7$) for different optimization levels O0, O2, O3, and buffer lengths $L_1$, $L_2$. Obtained improvement for both microcontrollers (worst case vs. best case, buffer length $L_1$): 22% and 63%, respectively; (**b**) sizes of the output binaries (columns) and compilation times (polylines) for different optimization levels of the program. Buffer length $L_2 = 256$.

**Figure 2.** (**a**) Performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers realizing the variable fractional-order differintegral (Equation (6)) for different optimization levels O0, O2, O3 and buffer lengths $L_1$, $L_2$. Obtained improvement for both microcontrollers (buffer length $L_1$): 19% and 70%, respectively; (**b**) sizes of the output binaries (columns) and compilation times (polylines) for different optimization levels of the program. Buffer length $L_2 = 256$.

**Figure 3.** (**a**) Performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers realizing the modified implementation of the variable fractional-order differintegral (Equation (6)) for different optimization levels O0, O2, O3 and buffer lengths $L_1$, $L_2$. Obtained improvement for both microcontrollers (buffer length $L_1$): 4% and 19%, respectively; (**b**) sizes of the output binaries (columns) and compilation times (polylines) for different optimization levels of the program. Buffer length $L_2 = 256$.

**Figure 4.** (**a**) Performance of STM32L152RCT6 (blue, gray) and STM32F746ZG (red, yellow) microcontrollers realizing the fixed-point implementation of the variable fractional-order differintegral (Equation (6)) for different optimization levels O0, O2, O3 and buffer lengths $L_1$, $L_2$. Obtained improvement for both microcontrollers (buffer length $L_1$): 67% and 65%, respectively; (**b**) sizes of the output binaries (columns) and compilation times (polylines) for different optimization levels of the program. Buffer length $L_2 = 256$.

**Table 1.**

Parameter Name | STM32L152RCT6 (Arm® Cortex®-M3) | STM32F746ZG (Arm® Cortex®-M7) |
---|---|---|
CPU clock frequency ($F_{CPU}$) | up to 32 MHz | up to 216 MHz |
Memory (Flash, SRAM) | 256 KB Flash + 32 KB SRAM + 8 KB EEPROM | 1024 KB Flash + 320 KB SRAM |
Converters (ADC, DAC) | 12-bit 1 MSPS ADC, 12-bit DAC | 3× 12-bit 2.4 MSPS ADC, 2× 12-bit DAC |
Power supply ($V_{DD}$) | 1.65–3.6 V | 1.8–3.6 V |
Other features | ultra-low-power technology, LCD driver, touch sensor channels | floating-point unit, real-time accelerator, DSP instructions, LCD and camera interface |

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Matusiak, M.
Optimization for Software Implementation of Fractional Calculus Numerical Methods in an Embedded System. *Entropy* **2020**, *22*, 566.
https://doi.org/10.3390/e22050566
