# ILP-Based and Heuristic Scheduling Techniques for Variable-Cycle Approximate Functional Units in High-Level Synthesis

## Abstract

## 1. Introduction

- Scheduling for approximate computing circuits with accuracy-controllable approximate multipliers is formulated as an ILP. The formulation covers both exact and approximate computations and decides, for each arithmetic operation, whether it is executed exactly or approximately, so that the error at the output is minimized under resource and time constraints.
- A list-scheduling algorithm is proposed that solves the same scheduling problem heuristically in polynomial time and therefore runs faster than the ILP method.
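
One way to make the problem in the two items above concrete is a time-indexed ILP. The block below is a generic sketch and not necessarily the formulation of Section 3.2. The binary variables $x^{ex}_{i,t}$ and $x^{ap}_{i,t}$ (multiplication $i$ starts at cycle $t$ in exact or approximate mode), $x_{i,t}$ (any other operation $i$ starts at cycle $t$), the start time $s_i$, the latency $d_i$, and the additive error model $\sum e_i a_i$ are assumptions introduced here for illustration. $T_{res}$, $T_{apx}$, $T_{op}$, $\mathit{Const}_{mul}$, and $\mathit{Const}_{time}$ follow the notation used for the list-scheduling algorithm.

$$
\begin{aligned}
\min\ & \sum_{i\in \mathrm{M}} e_i\,a_i, \qquad a_i=\sum_{t} x^{ap}_{i,t} \\
\text{s.t.}\ & \sum_{t}\bigl(x^{ex}_{i,t}+x^{ap}_{i,t}\bigr)=1 \quad \forall i\in \mathrm{M}, \qquad \sum_{t} x_{i,t}=1 \quad \forall i\in \mathrm{A} \\
& s_i=\sum_{t} t\,\bigl(x^{ex}_{i,t}+x^{ap}_{i,t}\bigr), \quad d_i=T_{res}-(T_{res}-T_{apx})\,a_i \quad \forall i\in \mathrm{M} \\
& s_i=\sum_{t} t\,x_{i,t}, \quad d_i=T_{op} \quad \forall i\in \mathrm{A} \\
& s_j \ge s_i + d_i \quad \forall (i,j)\in \mathrm{E} \\
& s_i + d_i - 1 \le \mathit{Const}_{time} \quad \forall i\in \mathrm{V} \\
& \sum_{i\in \mathrm{M}}\Bigl(\sum_{t'=t-T_{res}+1}^{t} x^{ex}_{i,t'} + \sum_{t'=t-T_{apx}+1}^{t} x^{ap}_{i,t'}\Bigr) \le \mathit{Const}_{mul} \quad \forall t
\end{aligned}
$$

The last constraint counts a multiplication against the multiplier budget in every cycle it occupies, which is how the exact/approximate choice interacts with the resource constraint in this sketch.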

## 2. Related Work

## 3. ILP-Based Scheduling for Variable-Cycle Approximate Functional Units in High-Level Synthesis

#### 3.1. A Scheduling Example

#### 3.2. ILP Formulation

#### 3.3. Chaining

## 4. Heuristic Scheduling Algorithms Based on List Scheduling

#### 4.1. List-Scheduling Algorithm

Algorithm 1 List-Scheduling Algorithm | |

1 | ListScheduling(G(V,E)) begin |

2 | for $i\in \mathrm{V}$ do |

3 | ${t\_\mathit{alap}}_{i}$ ← ALAP_schedule |

4 | end for |

5 | for $i\in \mathrm{V}$ do |

6 | ${e}_{i}\leftarrow$ obtain ${\mathit{res}}_{i}$, ${\mathit{err}}_{i}$ from $\mathrm{G}(\mathrm{V},\mathrm{E})$ and any input |

7 | end for |

8 | for n in 1..|M|+1 do |

9 | for $i\in \mathrm{V}$ do |

10 | if $i\in \mathrm{A}\cup \mathrm{Apx}$ then |

11 | ${p}_{i}{=t\_\mathit{alap}}_{i}$ |

12 | else ${p}_{i}{=t\_\mathit{alap}}_{i}-{(T}_{\mathit{res}}-{T}_{\mathit{apx}})$ end if |

13 | end for |

14 | $t$ = 0,$\pi =\{\varnothing \},\tau =\{\varnothing \}$ |

15 | while $\pi \ne \mathrm{V}$ do |

16 | ${N}_{\mathit{mul}}{=\mathit{Const}}_{\mathit{mul}},t++$ |

17 | for $i\in \mathrm{V}$ do |

18 | for $i\in \tau $ do |

19 | if $i\in \mathrm{M}$ then |

20 | ${\mathit{tr}}_{i}{=\mathit{tr}}_{i}-{1,N}_{\mathit{mul}}{=N}_{\mathit{mul}}-1$ |

21 | if $t{r}_{i}=0$ then |

22 | $\pi =\pi \cup i$, $\tau =\tau \cap {\neg i,\mathit{tf}}_{i}=t$ |

23 | $\sigma \leftarrow$ successor nodes of operation $i$ in $\mathrm{V}$ |

24 | end if |

25 | end if |

26 | if $i\in \mathrm{A}$ then |

27 | ${\mathit{tr}}_{i}{=\mathit{tr}}_{i}-1$ |

28 | if ${\mathit{tr}}_{i}=0$ then |

29 | $\pi =\pi \cup i$, $\tau =\tau \cap \neg i$, ${\mathit{tf}}_{i}=t$ |

30 | $\sigma \leftarrow$ successor nodes of operation $i$ in $\mathrm{V}$ |

31 | end if |

32 | end if |

33 | end for |

34 | if ${N}_{\mathit{mul}}\ge 1\cap \left\{i\right|i\in \left(\sigma \cap M\right)\cap \mathrm{min}({p}_{i}\left)\right\}$ then |

35 | $t{s}_{i}=t$, ${N}_{\mathit{mul}}{=N}_{\mathit{mul}}-1,\sigma =\sigma \cap \neg i$ |

36 | if $i\in \mathrm{Apx}\cap {T}_{\mathit{apx}}=1$ then |

37 | $\pi =\pi \cup i$, ${\mathit{tf}}_{i}=t$ |

38 | $\sigma \leftarrow$ successor nodes of operation $i$ in $\mathrm{V}$ |

39 | elif $i\in \mathrm{Apx}\cap {T}_{\mathit{apx}}>1$ then |

40 | $\tau =\tau \cup i$, ${\mathit{tr}}_{i}{=T}_{\mathit{apx}}-1$ |

41 | else $\tau =\tau \cup i$, ${\mathit{tr}}_{i}{=T}_{\mathit{res}}-1$ end if |

42 | end if |

43 | if $\left\{i\right|i\in \left(\sigma \cap \mathrm{A}\right)\cap {\mathrm{min}(p}_{i}\left)\right\}$ then |

44 | ${\mathit{ts}}_{i}=t,\sigma =\sigma \cap \neg i$ |

45 | if $i\in \mathrm{A}\cap {T}_{\mathit{op}}=1$ then |

46 | $\pi =\pi \cup i$, ${\mathit{tf}}_{i}=t$ |

47 | |

48 | else $\tau =\tau \cup i$, ${\mathit{tr}}_{i}{=T}_{\mathit{op}}-1,$ end if |

49 | end if |

50 | end for |

51 | end while |

52 | if $\forall {i,\mathit{tf}}_{i}\le {\mathit{Const}}_{\mathit{time}}$ then |

53 | ${\mathit{best}\_\mathit{ts}}_{i}{=\mathit{ts}}_{i}$,${\mathit{best}\_\mathit{tf}}_{i}{=\mathit{tf}}_{i},\mathit{best}\_\mathrm{Apx}=\mathrm{Apx}$ |

54 | $\{{\mathrm{M}}^{\prime}={\mathrm{M}}^{\prime}\cap \neg i\}\cap \left\{i\right|i\in {\mathrm{M}}^{\prime}\cap {\mathrm{max}(e}_{i}\left)\right\}$ |

55 | else $\{\mathrm{Apx}=\mathrm{Apx}\cup i\}\cap \left\{i\right|i\in {\mathrm{M}}^{\prime}\cap {\mathrm{max}(e}_{i}\left)\right\}$ |

56 | $\{{\mathrm{M}}^{\prime}={\mathrm{M}}^{\prime}\cap \neg i\}\cap \left\{i\right|i\in {\mathrm{M}}^{\prime}\cap {\mathrm{max}(e}_{i}\left)\right\}$ |

57 | end if |

58 | $\{\mathrm{Apx}=\mathrm{Apx}\cap \neg i\}\cap \left\{i\right|i\in {\mathrm{M}}^{\prime}\cap {\mathrm{max}(e}_{i}\left)\right\}$ |

59 | end for |

60 | end |
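
The overall flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names `list_schedule` and `mixed_schedule`, the dictionary-based DFG encoding, the tie-breaking by operation name, and the toy DFG are all hypothetical. Multipliers are treated as non-pipelined, other operations are unconstrained, and the outer loop is simplified to "try each multiplication as exact in decreasing order of $e_i$ and keep the change only if the time constraint still holds".

```python
from collections import defaultdict


def list_schedule(kind, deps, apx, n_mul, deadline, t_res=2, t_apx=1, t_op=1):
    """One list-scheduling pass; returns (start, finish) dicts in 1-based cycles."""
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in deps:
        preds[v].append(u)
        succs[u].append(v)

    def latency(i, all_apx=False):
        if kind[i] != "mul":
            return t_op
        return t_apx if (all_apx or i in apx) else t_res

    # ALAP start times with every multiplication approximated (t_alap_i).
    alap = {}

    def alap_start(i):
        if i not in alap:
            latest_finish = min((alap_start(s) - 1 for s in succs[i]), default=deadline)
            alap[i] = latest_finish - latency(i, all_apx=True) + 1
        return alap[i]

    # Priority p_i: exact multiplications get (t_res - t_apx) cycles more urgency.
    prio = {i: alap_start(i) - (latency(i) - latency(i, all_apx=True)) for i in kind}

    start, finish, t = {}, {}, 0
    while len(start) < len(kind):
        t += 1
        # Multi-cycle multiplications that are still running occupy a multiplier.
        busy = sum(1 for i in start if kind[i] == "mul" and finish[i] >= t)
        ready = sorted(
            (i for i in kind if i not in start
             and all(p in finish and finish[p] < t for p in preds[i])),
            key=lambda i: (prio[i], i),
        )
        for i in ready:
            if kind[i] == "mul":
                if busy >= n_mul:
                    continue  # no free multiplier in this cycle
                busy += 1
            start[i], finish[i] = t, t + latency(i) - 1
    return start, finish


def mixed_schedule(kind, deps, err, n_mul, deadline):
    """Start all-approximate, then try exact mode in decreasing order of error."""
    apx = {i for i in kind if kind[i] == "mul"}
    start, finish = list_schedule(kind, deps, apx, n_mul, deadline)
    if max(finish.values()) > deadline:
        return None  # even the all-approximate schedule misses the deadline
    best = (start, finish, set(apx))
    for m in sorted(apx, key=lambda i: (-err[i], i)):
        trial = apx - {m}
        start, finish = list_schedule(kind, deps, trial, n_mul, deadline)
        if max(finish.values()) <= deadline:
            apx, best = trial, (start, finish, set(trial))
    return best


# Hypothetical toy DFG: a2 = (m1 * .. + m2 * ..) + m3 * ..
KIND = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
DEPS = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]
ERR = {"m1": 5.0, "m2": 3.0, "m3": 1.0}

start, finish, apx = mixed_schedule(KIND, DEPS, ERR, n_mul=2, deadline=4)
print(sorted(apx), max(finish.values()))  # -> ['m3'] 4
```

In this toy case, two multipliers and a four-cycle time constraint allow the two multiplications with the largest error contributions to run exactly, while the remaining one stays approximate.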

#### 4.2. Proposed List-Scheduling Example

#### 4.3. Chaining

Algorithm 2 Add in List-Scheduling Algorithm ① | |

1 | if $i\in \mathrm{D}$ then |

2 | ${\mathit{tr}}_{i}{=\mathit{tr}}_{i}-1,$ |

3 | if ${\mathit{tr}}_{i}=0$ then |

4 | $\pi =\pi \cup i$, $\tau =\tau \cap \neg i$, ${\mathit{tf}}_{i}=t$ |

5 | |

6 | end if |

7 | end if |

Algorithm 3 Add in List-Scheduling Algorithm ② | |

1 | if $\left\{i\right|i\in \left(\sigma \cap \mathrm{D}\right)\cap {\mathrm{min}(p}_{i}\left)\right\}$ then |

2 | ${\mathit{ts}}_{i}=t,\sigma =\sigma \cap \neg i$ |

3 | if $i\in \mathrm{D}\cap {T}_{add}=1$ then |

4 | $\pi =\pi \cup i$, ${\mathit{tf}}_{i}=t$ |

5 | |

6 | else $\tau =\tau \cup i$, ${\mathit{tr}}_{i}{=T}_{add}-1,$ end if |

7 | end if |

Algorithm 4 Chaining in List-Scheduling Algorithm | |

1 | if $\mathit{Condition}\_\mathit{chaining}\_\mathit{mul}=1$ then |

2 | for $i\in \mathrm{V}$ do |

3 | if $\{i\in \mathrm{M},\pi \cap {\mathit{tf}}_{i}=t\cap i\text{ followed by }j\in \mathrm{D}\}$ then |

4 | ${\mathit{ts}}_{j}=t,\sigma =\sigma \cap \neg i$ |

5 | if $j\in \mathrm{D}\cap {T}_{add}=1$ then |

6 | $\pi =\pi \cup j$, ${\mathit{tf}}_{j}=t$ |

7 | $\sigma \leftarrow$ successor nodes of op $j$ in $\mathrm{V}$ |

8 | else $\tau =\tau \cup j$, ${\mathit{tr}}_{j}{=T}_{add}-1,$ end if |

9 | end if |

10 | if ${\mathrm{N}}_{mul}\ge 1\cap \{i\in \mathrm{D},\pi \cap {\mathit{tf}}_{i}=t\cap i\text{ followed by }j\in \mathrm{M}\}$ then |

11 | ${\mathit{ts}}_{i}=t$, ${\mathrm{N}}_{mul}{=\mathrm{N}}_{mul}-1,\sigma =\sigma \cap \neg i$ |

12 | if $i\in \mathrm{M}\cap {T}_{res}=1$ then |

13 | $\pi =\pi \cup i$, ${\mathit{tf}}_{i}=t$ |

14 | $\sigma \leftarrow$ successor nodes of op $i$ in $\mathrm{V}$ |

15 | else $\tau =\tau \cup i$, ${\mathit{tr}}_{i}{=T}_{res}-1,$ end if |

16 | end if |

17 | end for |

18 | end if |

19 | if $\mathit{Condition}\_\mathit{chaining}\_\mathit{add}=1$ then |

20 | for $i\in \mathrm{V}$ do |

21 | if $\{i\in \mathrm{D},\pi \cap {\mathit{tf}}_{i}=t\cap i\text{ followed by }j\in \mathrm{D}\}$ then |

22 | ${\mathit{ts}}_{i}=t$,$\sigma =\sigma \cap \neg i$ |

23 | if $i\in \mathrm{D}\cap {T}_{add}=1$ then |

24 | $\pi =\pi \cup i$, ${\mathit{tf}}_{i}=t$ |

25 | |

26 | else $\tau =\tau \cup i$, ${\mathit{tr}}_{i}{=T}_{\mathit{add}}-1,$ end if |

27 | end if |

28 | end for |

29 | end if |
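
The effect of chaining on start times can be illustrated with a small ASAP computation. This is a minimal sketch under stated assumptions rather than the procedure of Algorithm 4: resource constraints are ignored, the function name `asap_with_chaining` and the `chain_pairs` encoding are hypothetical, and the `max_chain` cap on how many operations may share one cycle is an assumption added here (the listing above only uses the two Boolean chaining conditions).

```python
def asap_with_chaining(kind, deps, latency, chain_pairs=frozenset(), max_chain=2):
    """ASAP start/finish cycles (1-based) with optional operation chaining.

    chain_pairs holds (predecessor kind, successor kind) pairs that may be
    chained, i.e., the successor starts in the cycle its predecessor finishes.
    """
    preds = {i: [] for i in kind}
    for u, v in deps:
        preds[v].append(u)
    start, finish, depth = {}, {}, {}

    def visit(j):
        if j in start:
            return
        s, d = 1, 1  # earliest start cycle and chain depth in that cycle
        for i in preds[j]:
            visit(i)
            chained = (kind[i], kind[j]) in chain_pairs and depth[i] < max_chain
            s_i = finish[i] if chained else finish[i] + 1
            d_i = depth[i] + 1 if chained else 1
            if s_i > s or (s_i == s and d_i > d):
                s, d = s_i, d_i
        lat = latency[kind[j]]
        start[j], finish[j] = s, s + lat - 1
        # A multi-cycle operation finishes in a later cycle, so its chain restarts.
        depth[j] = d if lat == 1 else 1

    for j in kind:
        visit(j)
    return start, finish


# Hypothetical chain: exact multiplication -> addition -> addition.
kind = {"m": "mul", "a": "add", "b": "add"}
deps = [("m", "a"), ("a", "b")]
latency = {"mul": 2, "add": 1}

_, plain = asap_with_chaining(kind, deps, latency)
_, chained = asap_with_chaining(kind, deps, latency, chain_pairs={("mul", "add")})
print(max(plain.values()), max(chained.values()))  # -> 4 3
```

Chaining the exact multiplication with the following addition removes one cycle from this path, which matches the intent of the "Exact Mult to Add" case.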

## 5. Experiment

#### 5.1. Experimental Setup

- All-exact (AE): every multiplication is performed without approximation and takes two cycles.
- All-approximated (AA): every multiplication is approximated and takes one cycle.
- Mixed: each multiplication is scheduled as either exact (two cycles) or approximate (one cycle).
- Mixed-chain: same as Mixed, but the scheduler also considers chaining.
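
The AE and AA configurations bound the schedule length that a mixed design can reach. A minimal sketch, assuming unlimited multipliers and one-cycle non-multiplication operations (neither is stated here), computes the critical-path length of a DFG for both cases. The function name `critical_path` and the toy DFG are hypothetical.

```python
def critical_path(kind, deps, mul_cycles, op_cycles=1):
    """Length in cycles of the longest path when resources are unlimited."""
    preds = {i: [] for i in kind}
    for u, v in deps:
        preds[v].append(u)
    finish = {}

    def fin(i):
        if i not in finish:
            lat = mul_cycles if kind[i] == "mul" else op_cycles
            finish[i] = max((fin(p) for p in preds[i]), default=0) + lat
        return finish[i]

    return max(fin(i) for i in kind)


# Hypothetical toy DFG with three multiplications and two additions.
kind = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}
deps = [("m1", "a1"), ("m2", "a1"), ("a1", "a2"), ("m3", "a2")]

ae = critical_path(kind, deps, mul_cycles=2)  # all-exact
aa = critical_path(kind, deps, mul_cycles=1)  # all-approximated
print(ae, aa)  # -> 4 3
```

A time constraint below the AA length cannot be met by any of the configurations, and one at or above the AE length leaves room for every multiplication to be exact if enough multipliers are available. The Mixed configurations are of interest between these two bounds or when multipliers are scarce.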

#### 5.2. Experimental Results

## 6. Conclusions


**Figure 1.** An example of scheduling for approximate computing circuits with variable-cycle approximate multipliers. (**a**) Approximate multiplications. (**b**) Exact multiplications. (**c**) Variable-cycle multiplications.

**Figure 2.** Example of chaining in this work. (**a**) Add to Add (no Chaining). (**b**) Add to Add (Chaining). (**c**) Exact Mult to Add (no Chaining). (**d**) Exact Mult to Add (Chaining).

**Figure 3.** Proposed list-scheduling example. (**a**) A given DFG. (**b**) Approximate all multiplications (result of first list scheduling). (**c**) Exact multiplication with the largest error (result of second list scheduling). (**d**) Exact multiplication with the second largest error (result of third list scheduling). (**e**) Exact multiplication with the smallest error (result of fourth list scheduling). (**f**) Final output that satisfies resource and time constraints.

Notation | Definition |
---|---|
G(V,E) | Data Flow Graph (DFG) |
V | Set of operations |
E | Set of data dependencies between operations |
M | Set of multiplications ($\mathrm{M}\subseteq \mathrm{V}$) |
A | Set of operations other than multiplication ($\mathrm{A}\subseteq \mathrm{V}$) |
$\mathrm{Apx}$ | Set of approximate multiplications ($\mathrm{Apx}\subseteq \mathrm{M}$) |
${\mathrm{M}}^{\prime}$ | Set of multiplications that have never been performed as exact multiplications (${\mathrm{M}}^{\prime}\subseteq \mathrm{M}$) |
$\sigma $ | Set of operations that can be executed |
$\tau $ | Set of operations that are being executed (multi-cycle operations currently running) |
$\pi $ | Set of operations that have completed execution |
${T}_{\mathit{apx}}$ | Number of cycles required for approximate multiplication |
${T}_{\mathit{res}}$ | Number of cycles required for exact multiplication |
${T}_{\mathit{op}}$ | Number of cycles required for operations other than multiplication |
$i$ | Index of an operation (the i-th operation) |
$t$ | Current clock cycle |
${\mathit{ts}}_{i}$ | Execution start time of the i-th operation |
${\mathit{tf}}_{i}$ | Execution finish time of the i-th operation |
${\mathit{tr}}_{i}$ | Remaining time until the end of the execution of the i-th operation |
${\mathit{Const}}_{\mathit{mul}}$ | Number of available multipliers (resource constraint) |
${\mathit{Const}}_{\mathit{time}}$ | Number of execution cycles for the entire circuit (time constraint) |
${N}_{\mathit{mul}}$ | Remaining number of multipliers available in the current clock cycle |
${t\_\mathit{alap}}_{i}$ | Execution time of the i-th operation in the ALAP schedule (result of ALAP) |
${\mathit{res}}_{i}$ | Exact value of the i-th operation |
${\mathit{err}}_{i}$ | Error of the i-th operation |
${e}_{i}$ | Magnitude of the error at the final output when the i-th operation is approximated |
${p}_{i}$ | Priority based on ALAP results |
${T}_{add}$ | Number of cycles required for addition |
$j$ | Index of an addition (the j-th addition) |
$\mathit{Condition}\_\mathit{chaining}\_\mathit{mul}$ | Flag indicating whether an exact multiplication and an addition can be chained |
$\mathit{Condition}\_\mathit{chaining}\_\mathit{add}$ | Flag indicating whether additions can be chained |
$\mathrm{D}$ | Set of additions |

Benchmarks | Nodes (Mult) | Designs | Wins | Losses | Draws | ILP Exceeding 1 h |
---|---|---|---|---|---|---|
HAL | 11 (6) | 14 | 0 | 0 | 14 | 0 |
FIR filter | 21 (11) | 19 | 0 | 0 | 19 | 0 |
Auto Regression Filter | 28 (16) | 36 | 0 | 0 | 36 | 0 |
Motion Vectors Decoder | 32 (14) | 37 | 0 | 15 | 22 | 0 |
Elliptic Wave Filter | 34 (8) | 16 | 0 | 3 | 13 | 0 |
Cosine | 42 (14) | 52 | 0 | 1 | 51 | 0 |
Feedback Points | 53 (17) | 43 | 0 | 10 | 33 | 1 |
Matrix Multiplication | 109 (40) | 129 | 1 | 20 | 108 | 4 |
Smooth Triangle | 197 (69) | 257 | 35 | 63 | 159 | 55 |
Matrix Inversion | 333 (140) | 516 | 235 | 104 | 177 | 257 |

Benchmarks | ILP Max | ILP Min | ILP Mean | List Scheduling Max | List Scheduling Min | List Scheduling Mean |
---|---|---|---|---|---|---|
HAL | 0.130 | 0.010 | 0.083 | 0.006 | 0.005 | 0.006 |
FIR filter | 0.860 | 0.080 | 0.214 | 0.020 | 0.013 | 0.017 |
Auto Regression Filter | 8.730 | 0.080 | 0.899 | 0.049 | 0.023 | 0.037 |
Motion Vectors Decoder | 35.380 | 0.080 | 1.269 | 0.050 | 0.024 | 0.036 |
Elliptic Wave Filter | 0.220 | 0.050 | 0.127 | 0.033 | 0.029 | 0.031 |
Cosine | 2387 | 0.050 | 46.490 | 0.087 | 0.039 | 0.058 |
Feedback Points | >3600 | 0.080 | 85.617 | 0.138 | 0.066 | 0.100 |
Matrix Multiplication | >3600 | 0.200 | 149.748 | 2.044 | 0.527 | 1.092 |
Smooth Triangle | >3600 | 0.300 | 1155 | 15.804 | 2.865 | 7.204 |
Matrix Inversion | >3600 | 1.020 | 2116 | 170.288 | 16.217 | 70.913 |

Metric | AE (12 Cycle) | Mixed-ILP (12 Cycle) | Mixed-List (12 Cycle) |
---|---|---|---|
LUT | 2686 | 2126 | 2228 |
FF | 612 | 582 | 518 |
DSP | 16 | 12 | 12 |
PSNR | $\infty $ | 94.57 | 94.44 |
Power (µW) | 29,189 | 25,924 | 27,777 |
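
The $\infty$ entry for AE is what the usual PSNR definition gives when the output is identical to the exact result. The definition below is the standard one and is assumed here, since the peak value and the reference signal used for the measurement are not stated. The symbols $\mathit{MAX}$ (peak output value), $y_k$ (exact output sample), $\hat{y}_k$ (output sample of the evaluated design), and $N$ (number of samples) are introduced for illustration.

$$
\mathrm{PSNR}=10\log_{10}\frac{\mathit{MAX}^{2}}{\mathit{MSE}},\qquad \mathit{MSE}=\frac{1}{N}\sum_{k=1}^{N}\left(y_k-\hat{y}_k\right)^{2}
$$

For the all-exact design $\hat{y}_k=y_k$ for every sample, so $\mathit{MSE}=0$ and the PSNR diverges to $\infty$. Any approximated multiplication that changes an output sample makes $\mathit{MSE}>0$ and yields a finite value.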

Benchmarks | Nodes (Mult) | Designs | Wins | Losses | Draws | ILP Exceeding 1 h |
---|---|---|---|---|---|---|
HAL | 11 (6) | 13 | 0 | 7 | 6 | 0 |
FIR filter | 21 (11) | 22 | 0 | 3 | 19 | 0 |
Auto Regression Filter | 28 (16) | 36 | 0 | 3 | 33 | 0 |
Motion Vectors Decoder | 32 (14) | 34 | 0 | 31 | 3 | 0 |
Elliptic Wave Filter | 34 (8) | 10 | 0 | 10 | 0 | 0 |
Cosine | 42 (14) | 47 | 0 | 35 | 12 | 0 |
Feedback Points | 53 (17) | 44 | 0 | 2 | 42 | 2 |
Matrix Multiplication | 109 (40) | 124 | 3 | 111 | 10 | 4 |
Smooth Triangle | 197 (69) | 248 | 35 | 91 | 122 | 61 |
Matrix Inversion | 333 (140) | 518 | 235 | 105 | 178 | 321 |

Benchmarks | ILP Max | ILP Min | ILP Mean | List Scheduling Max | List Scheduling Min | List Scheduling Mean |
---|---|---|---|---|---|---|
HAL | 0.140 | 0.050 | 0.083 | 0.014 | 0.008 | 0.010 |
FIR filter | 17.530 | 0.090 | 1.141 | 0.044 | 0.020 | 0.032 |
Auto Regression Filter | 60.840 | 0.090 | 3.220 | 0.115 | 0.039 | 0.079 |
Motion Vectors Decoder | 43.250 | 0.130 | 2.071 | 0.117 | 0.039 | 0.073 |
Elliptic Wave Filter | 0.530 | 0.160 | 0.049 | 0.058 | 0.042 | 0.380 |
Cosine | 1675 | 0.090 | 38.162 | 0.192 | 0.078 | 0.115 |
Feedback Points | >3600 | 0.110 | 161.273 | 0.291 | 0.116 | 0.188 |
Matrix Multiplication | >3600 | 0.630 | 448.173 | 4.266 | 0.791 | 2.162 |
Smooth Triangle | >3600 | 0.360 | 1273 | 30.939 | 4.555 | 13.714 |
Matrix Inversion | >3600 | 8.750 | 2418 | 327.881 | 24.718 | 132.795 |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ohata, K.; Nishikawa, H.; Kong, X.; Tomiyama, H.
ILP-Based and Heuristic Scheduling Techniques for Variable-Cycle Approximate Functional Units in High-Level Synthesis. *Computers* **2022**, *11*, 146.
https://doi.org/10.3390/computers11100146
