Steerable-Discrete-Cosine-Transform (SDCT): Hardware Implementation and Performance Analysis

## Abstract

## 1. Introduction

## 2. Background

## 3. Architectural Implementation

#### 3.1. Datapath

#### 3.2. Control Unit

- START: write input buffer
- WAIT: read input & write output buffer
- WB (Write Buffer): write input & read output buffer
- RWB (Read and Write Buffer): read input & write output & read output buffer
- RB (Read Buffer): read output buffer
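The five control states above differ only in which buffer operations they enable, so they can be sketched as combinations of four flags. This is a minimal sketch: the Python names (`BufferOp`, `STATE_OPS`, `states_using`) are illustrative assumptions rather than identifiers from the design, and the transitions between states are not modeled.

```python
from enum import Flag, auto


class BufferOp(Flag):
    """Buffer operations named in the state list (names are illustrative)."""
    WRITE_INPUT = auto()
    READ_INPUT = auto()
    WRITE_OUTPUT = auto()
    READ_OUTPUT = auto()


# Operations enabled in each control state, copied from the list above.
STATE_OPS = {
    "START": BufferOp.WRITE_INPUT,
    "WAIT": BufferOp.READ_INPUT | BufferOp.WRITE_OUTPUT,
    "WB": BufferOp.WRITE_INPUT | BufferOp.READ_OUTPUT,
    "RWB": BufferOp.READ_INPUT | BufferOp.WRITE_OUTPUT | BufferOp.READ_OUTPUT,
    "RB": BufferOp.READ_OUTPUT,
}


def states_using(op):
    """Return the names of the states in which the given operation is enabled."""
    return [name for name, ops in STATE_OPS.items() if op in ops]


if __name__ == "__main__":
    for op in BufferOp:
        print(op.name, states_using(op))
```

Encoding each state as a set of flags makes it easy to see, for example, that the output buffer is read in WB, RWB, and RB, while the input buffer is written only in START and WB.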

#### 3.3. Reduced SDCT Architectures

## 4. Results

- two-dimensional DCT
- SDCT
- reduced SDCT-16
- reduced SDCT-8

#### 4.1. Reduced SDCT Compression Savings

#### 4.2. Comparison with Previous Works

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 2.** Example of Discrete Cosine Transform (DCT) and Steerable Discrete Cosine Transform (SDCT) kernels.
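As a rough illustration of what the figure compares, the sketch below builds a pair of 2D DCT kernels and a steered pair obtained by rotating them by an angle. It assumes the commonly used SDCT formulation, in which the kernels with swapped indices (k, l) and (l, k) are rotated within the plane they span. The block size, indices, angle, rotation sign convention, and function names are arbitrary choices made for illustration and are not taken from this article.

```python
import math


def dct_vector(n, k):
    """k-th orthonormal DCT-II basis vector of length n."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]


def dct_kernel(n, k, l):
    """Separable 2D DCT kernel: outer product of the k-th and l-th vectors."""
    u, v = dct_vector(n, k), dct_vector(n, l)
    return [[ui * vj for vj in v] for ui in u]


def steered_pair(n, k, l, theta):
    """Rotate kernels (k, l) and (l, k) by theta in the plane they span."""
    a, b = dct_kernel(n, k, l), dct_kernel(n, l, k)
    c, s = math.cos(theta), math.sin(theta)
    first = [[c * a[i][j] + s * b[i][j] for j in range(n)] for i in range(n)]
    second = [[-s * a[i][j] + c * b[i][j] for j in range(n)] for i in range(n)]
    return first, second


def inner(x, y):
    """Frobenius inner product of two equally sized kernels."""
    return sum(p * q for rx, ry in zip(x, y) for p, q in zip(rx, ry))


if __name__ == "__main__":
    first, second = steered_pair(8, 1, 2, math.pi / 6)
    print("norm^2 of first steered kernel:", inner(first, first))
    print("inner product of the steered pair:", inner(first, second))
```

Because the rotation is orthogonal, the steered pair stays orthonormal for any angle, and an angle of zero returns the original DCT kernels.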

Memory Operation | States | Label |
---|---|---|
write input buffer | A | START |
read input & write output buffer | B | WAIT |
write input & read output buffer | C, F, I, L | WB |
read input & write output & read output buffer | D, G, H, M | RWB |
read output buffer | $E, E_1, E_2, E_3$ | RB |

Memory Operation | Number of Cycles | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Write | 16 | X | 16 | X | 8 | X | 4 | X | 4 | 12 | X | X |
Read & Write | X | 16 | X | 16 | X | 8 | X | 4 | X | X | 16 | X |
Read | X | X | 16 | X | 8 | 8 | 4 | 4 | 4 | X | X | 16 |
State | A | B | C | B | C | D | C | D | C | A | B | E |

16 | 16 | 8 | 4 | 16 |
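The State row of the schedule can be cross-checked against the memory operations active in each column. The sketch below assumes that "Write" means writing the input buffer, "Read & Write" means reading the input while writing the output buffer, and "Read" means reading the output buffer; that reading is an interpretation of the state descriptions, and the variable names are illustrative.

```python
X = None  # operation not active in that column

# Cycle counts per column, copied from the schedule table above.
SCHEDULE = {
    "write": [16, X, 16, X, 8, X, 4, X, 4, 12, X, X],
    "read_write": [X, 16, X, 16, X, 8, X, 4, X, X, 16, X],
    "read": [X, X, 16, X, 8, 8, 4, 4, 4, X, X, 16],
}
STATE_ROW = list("ABCBCDCDCABE")

# Assumed mapping from the set of active operations to a state label.
LABEL_FOR_OPS = {
    frozenset({"write"}): "A",               # START
    frozenset({"read_write"}): "B",          # WAIT
    frozenset({"write", "read"}): "C",       # WB
    frozenset({"read_write", "read"}): "D",  # RWB
    frozenset({"read"}): "E",                # RB
}


def derive_states(schedule):
    """Derive one state label per column from the operations active in it."""
    n_columns = len(next(iter(schedule.values())))
    labels = []
    for col in range(n_columns):
        active = frozenset(op for op, row in schedule.items() if row[col] is not None)
        labels.append(LABEL_FOR_OPS[active])
    return labels


if __name__ == "__main__":
    print("derived:", "".join(derive_states(SCHEDULE)))
    print("table:  ", "".join(STATE_ROW))
```

Under these assumptions the derived labels reproduce the State row column by column, which supports reading each state as a fixed combination of buffer operations.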

Architecture | Internal Power | Switching Power | Total Dynamic Power | Leakage Power |
---|---|---|---|---|
basic DCT | 36.55 mW | 17.72 mW | 54.47 mW | 33 µW |
clock gated DCT | 21 mW | 12.52 mW | 33.52 mW | 30 µW |
basic SDCT | 290.47 mW | 60.33 mW | 350.88 mW | 106 µW |
clock gated SDCT | 88.71 mW | 59.85 mW | 148.67 mW | 94 µW |
clock gated SDCT-16 | 27.86 mW | 28.97 mW | 56.85 mW | 27 µW |
clock gated SDCT-8 | 6.56 mW | 7.20 mW | 14.17 mW | 7 µW |
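The effect of clock gating can be read off the total dynamic power column. The short sketch below computes the relative saving for the two architectures that are reported both with and without gating; the dictionary layout and function name are illustrative, and only values from the table are used.

```python
# Total dynamic power in mW, copied from the power table above.
TOTAL_DYNAMIC_MW = {
    "DCT": {"basic": 54.47, "clock_gated": 33.52},
    "SDCT": {"basic": 350.88, "clock_gated": 148.67},
}


def gating_saving(basic_mw, gated_mw):
    """Fraction of total dynamic power removed by clock gating."""
    return 1.0 - gated_mw / basic_mw


savings = {
    name: gating_saving(p["basic"], p["clock_gated"])
    for name, p in TOTAL_DYNAMIC_MW.items()
}

if __name__ == "__main__":
    for name, saving in savings.items():
        print(f"{name}: {saving:.1%} lower total dynamic power with clock gating")
```

With these figures the saving is roughly 38% for the DCT and roughly 58% for the SDCT.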

Cell | 1× Total Area | 2× Total Area | 4× Total Area | 8× Total Area |
---|---|---|---|---|
SDCT | 4,337,744 µm² | 3,042,226 µm² | 1,608,759 µm² | 1,301,522 µm² |
2D-DCT | 438,866 µm² | 601,970 µm² | 455,150 µm² | 474,167 µm² |
IM | 1,401,523 µm² | 820,032 µm² | 495,856 µm² | 335,932 µm² |
OM | 2,377,837 µm² | 1,418,162 µm² | 482,048 µm² | 319,037 µm² |
FIFO | 86,542 µm² | 110,594 µm² | 113,008 µm² | 110,604 µm² |
ROM | 5,895 µm² | 22,228 µm² | 13,227 µm² | 33,223 µm² |

Architecture | DCT | SDCT | SDCT-16 | SDCT-8 |
---|---|---|---|---|
Technology (nm) | 65 | 65 | 65 | 65 |
Frequency (MHz) | 188 | 188 | 188 | 188 |
Power (mW) | 33.52 | 148.67 | 56.85 | 14.17 |
Throughput (Gsps) | 2.992 | 2.992 | 1.496 | 0.748 |
Area (mm²) | 0.321 | 1.427 | 0.444 | 0.110 |
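To make the cost of each variant easier to compare, the figures above can be normalized to the full SDCT. The sketch below does that for power and area; the data layout and the function name `relative_to` are illustrative, and only values from the table are used.

```python
# Power (mW) and area (mm^2), copied from the architecture table above.
ARCHITECTURES = {
    "DCT": {"power_mw": 33.52, "area_mm2": 0.321},
    "SDCT": {"power_mw": 148.67, "area_mm2": 1.427},
    "SDCT-16": {"power_mw": 56.85, "area_mm2": 0.444},
    "SDCT-8": {"power_mw": 14.17, "area_mm2": 0.110},
}


def relative_to(baseline, table):
    """Express every metric of every architecture as a ratio to the baseline."""
    base = table[baseline]
    return {
        name: {metric: value / base[metric] for metric, value in metrics.items()}
        for name, metrics in table.items()
    }


ratios = relative_to("SDCT", ARCHITECTURES)

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name}: power x{r['power_mw']:.3f}, area x{r['area_mm2']:.3f}")
```

Normalized this way, SDCT-16 needs roughly 38% of the power and 31% of the area of the full SDCT, and SDCT-8 roughly 10% and 8%, at one half and one quarter of its throughput, respectively.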

Sequence | SDCT [8] | SDCT-16 | SDCT-8 |
---|---|---|---|
Kimono | −0.795 | −0.144 | −0.020 |
ParkScene | −0.617 | −0.500 | −0.128 |
Cactus | −0.485 | −0.392 | −0.209 |
BQTerrace | −0.265 | −0.267 | −0.193 |
BasketballDrive | −0.199 | −0.174 | −0.112 |
Average | −0.472 | −0.295 | −0.132 |
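The Average row is consistent with a plain arithmetic mean over the five sequences, which the following sketch reproduces. The variable and function names are illustrative, and only values from the table are used.

```python
# Per-sequence values for (SDCT [8], SDCT-16, SDCT-8), copied from the table above.
ROWS = {
    "Kimono": (-0.795, -0.144, -0.020),
    "ParkScene": (-0.617, -0.500, -0.128),
    "Cactus": (-0.485, -0.392, -0.209),
    "BQTerrace": (-0.265, -0.267, -0.193),
    "BasketballDrive": (-0.199, -0.174, -0.112),
}


def column_means(rows, digits=3):
    """Arithmetic mean of each column, rounded to the table's precision."""
    columns = zip(*rows.values())
    return tuple(round(sum(col) / len(col), digits) for col in columns)


if __name__ == "__main__":
    print(column_means(ROWS))
```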

Design | | Technology [nm] | Frequency [MHz] | Throughput [Gsps] | Power [mW] | EPS [pJ] |
---|---|---|---|---|---|---|
Zhao et al. [16] | | 45 | 333 | 0.634 | - | - |
Ahmed et al. [17] | | 90 | 150 | 0.246 | - | - |
Meher et al. [13] | Folded | 90 | 187 | 2.992 | 40.04 | 13.38 |
Meher et al. [13] | Full-parallel | 90 | 187 | 5.984 | 67.57 | 11.29 |
Masera et al. [18] | Architecture 1 | 90 | 250 | 3.212 | 51.72 | 16.10 |
SDCT | Folded | 65 | 188 | 2.992 | 148.67 | 49.69 |
SDCT | Folded-16 | 65 | 188 | 1.496 | 56.85 | 38 |
SDCT | Folded-8 | 65 | 188 | 0.748 | 14.17 | 18.94 |
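The EPS column matches power divided by throughput for every row that reports both, since 1 mW / 1 Gsps = 1 pJ per sample. The sketch below recomputes it under that assumption; the row labels and function name are illustrative, and the numbers are copied from the table.

```python
# (label, throughput in Gsps, power in mW, EPS in pJ as listed in the table)
DESIGNS = [
    ("Meher et al. [13], folded", 2.992, 40.04, 13.38),
    ("Meher et al. [13], full-parallel", 5.984, 67.57, 11.29),
    ("Masera et al. [18], architecture 1", 3.212, 51.72, 16.10),
    ("SDCT, folded", 2.992, 148.67, 49.69),
    ("SDCT, folded-16", 1.496, 56.85, 38.0),
    ("SDCT, folded-8", 0.748, 14.17, 18.94),
]


def energy_per_sample_pj(power_mw, throughput_gsps):
    """Energy per sample in pJ: mW / Gsps = 1e-3 W / 1e9 samples/s = 1e-12 J."""
    return power_mw / throughput_gsps


if __name__ == "__main__":
    for label, gsps, mw, listed in DESIGNS:
        computed = energy_per_sample_pj(mw, gsps)
        print(f"{label}: computed {computed:.2f} pJ, listed {listed:.2f} pJ")
```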

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

