# Steerable-Discrete-Cosine-Transform (SDCT): Hardware Implementation and Performance Analysis


## Abstract

## 1. Introduction

## 2. Background

## 3. Architectural Implementation

#### 3.1. Datapath

#### 3.2. Control Unit

- START: write input buffer
- WAIT: read input & write output buffer
- WB (Write Buffer): write input & read output buffer
- RWB (Read and Write Buffer): read input & write output & read output buffer
- RB (Read Buffer): read output buffer
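
The state list above can be read as a mapping from each control state to the buffer accesses it enables. The following is a minimal Python sketch of that mapping, not the authors' hardware description: the `BufferOp` flag names, the `STATE_OPS` table and the `states_enabling` helper are hypothetical names introduced here for illustration, and only the state names and their operations are taken from the list.

```python
from enum import Flag, auto


class BufferOp(Flag):
    """Buffer accesses a control state can enable (illustrative names)."""
    NONE = 0
    WRITE_INPUT = auto()
    READ_INPUT = auto()
    WRITE_OUTPUT = auto()
    READ_OUTPUT = auto()


# State name -> buffer operations, transcribed from the state list above.
STATE_OPS = {
    "START": BufferOp.WRITE_INPUT,
    "WAIT": BufferOp.READ_INPUT | BufferOp.WRITE_OUTPUT,
    "WB": BufferOp.WRITE_INPUT | BufferOp.READ_OUTPUT,
    "RWB": BufferOp.READ_INPUT | BufferOp.WRITE_OUTPUT | BufferOp.READ_OUTPUT,
    "RB": BufferOp.READ_OUTPUT,
}


def states_enabling(op):
    """Return the names of the states whose operation set includes `op`."""
    return [name for name, ops in STATE_OPS.items() if op in ops]


if __name__ == "__main__":
    for name, ops in STATE_OPS.items():
        print(f"{name:5s} -> {ops}")
    print("States reading the output buffer:",
          states_enabling(BufferOp.READ_OUTPUT))
```

Encoding the operations as combinable flags keeps each state a single value while still allowing queries such as "which states read the output buffer".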

#### 3.3. Reduced SDCT Architectures

## 4. Results

- two-dimensional DCT
- SDCT
- reduced SDCT-16
- reduced SDCT-8

#### 4.1. Reduced SDCT Compression Savings

#### 4.2. Comparison with Previous Works

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Sullivan, G.J.; Ohm, J.; Han, W.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. **2012**, 22, 1649–1668.
2. Naccari, M.; Gabriellini, A.; Mrak, M.; Blasi, S.G.; Zupancic, I.; Izquierdo, E. HEVC Coding Optimisation for Ultra High Definition Television Services. In Proceedings of the Picture Coding Symposium, Cairns, Australia, 31 May–3 June 2015; pp. 20–24.
3. Masera, M.; Fiorentin, L.R.; Masala, E.; Masera, G.; Martina, M. Analysis of HEVC Transform Throughput Requirements for Hardware Implementations. Elsevier Signal Process. Image Commun. **2017**, 57, 173–182.
4. Chen, Z.; Han, Q.; Cham, W. Low-Complexity Order-64 Integer Cosine Transform Design and its Application in HEVC. IEEE Trans. Circuits Syst. Video Technol. **2018**, 28, 2407–2412.
5. Sun, H.; Cheng, Z.; Gharehbaghi, A.M.; Kimura, S.; Fujita, M. Approximate DCT Design for Video Encoding Based on Novel Truncation Scheme. IEEE Trans. Circuits Syst. I Regul. Pap. **2019**, 66, 1517–1530.
6. Oliveira, R.S.; Cintra, R.J.; Bayer, F.M.; da Silveira, T.L.T.; Madanayake, A.; Leite, A. Low-complexity 8-point DCT approximation based on angle similarity for image and video coding. Multidimens. Syst. Signal Process. **2019**, 30, 1363–1394.
7. Fracastoro, G.; Fosson, S.M.; Magli, E. Steerable Discrete Cosine Transform. IEEE Trans. Image Process. **2017**, 26, 303–314.
8. Masera, M.; Fracastoro, G.; Martina, M.; Magli, E. A Novel Framework for Designing Directional Linear Transforms with Application to Video Compression. In Proceedings of the ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1812–1816.
9. Sole, L.; Peloso, R.; Capra, M.; Roch, M.R.; Masera, G.; Martina, M. VLSI Architectures for the Steerable-Discrete-Cosine-Transform (SDCT). Applications in Electronics Pervading Industry, Environment and Society (ApplePies) 2019, to appear. Available online: https://applepies.eu/ (accessed on 13 September 2019).
10. Sole, L. VLSI architectures for the Steerable Discrete Cosine Transform. Master’s Thesis, Politecnico di Torino, Turin, Italy, 2018.
11. Zeng, B.; Fu, J. Directional Discrete Cosine Transforms: A New Framework for Image Coding. IEEE Trans. Circuits Syst. Video Technol. **2008**, 18, 305–313.
12. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. **2013**, 30, 83–98.
13. Meher, P.K.; Park, S.Y.; Mohanty, B.K.; Lim, K.S.; Yeo, C. Efficient Integer DCT Architectures for HEVC. IEEE Trans. Circuits Syst. Video Technol. **2014**, 24, 168–178.
14. Daubechies, I.; Sweldens, W. Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. **1998**, 4, 247–269.
15. Bjontegaard, G. Calculation of Average PSNR Differences between RD-curves. In Proceedings of the ITU - Telecommunications Standardization Sector Study Group 16 Video Coding Experts Group (VCEG), 13th Meeting, Austin, TX, USA, 2–4 April 2001.
16. Zhao, W.; Onoye, T.; Song, T. High-performance multiplierless transform architecture for HEVC. In Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS), Beijing, China, 19–23 May 2013; pp. 1668–1671.
17. Ahmed, A.; Shahid, U.; Rehman, A. N point DCT VLSI architecture for emerging HEVC standard. VLSI Design **2012**, 2012.
18. Masera, M.; Martina, M.; Masera, G. Adaptive Approximated DCT Architectures for HEVC. IEEE Trans. Circuits Syst. Video Technol. **2017**, 27, 2714–2725.

**Figure 2.** Example of Discrete Cosine Transform (DCT) and Steerable Discrete Cosine Transform (SDCT) kernels.

Memory Operation | States | Control Unit State |
---|---|---|
write input buffer | A | START |
read input & write output buffer | B | WAIT |
write input & read output buffer | C, F, I, L | WB |
read input & write output & read output buffer | D, G, H, M | RWB |
read output buffer | $E, E_1, E_2, E_3$ | RB |

Memory Operation | Number of Cycles | | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Write | 16 | X | 16 | X | 8 | X | 4 | X | 4 | 12 | X | X |
Read & Write | X | 16 | X | 16 | X | 8 | X | 4 | X | X | 16 | X |
Read | X | X | 16 | X | 8 | 8 | 4 | 4 | 4 | X | X | 16 |
State | A | B | C | B | C | D | C | D | C | A | B | E |

16 | 16 | 8 | 4 | 16 |

Architecture | Internal Power | Switching Power | Total Dynamic Power | Leakage Power |
---|---|---|---|---|
basic DCT | 36.55 mW | 17.72 mW | 54.47 mW | 33 µW |
clock gated DCT | 21 mW | 12.52 mW | 33.52 mW | 30 µW |
basic SDCT | 290.47 mW | 60.33 mW | 350.88 mW | 106 µW |
clock gated SDCT | 88.71 mW | 59.85 mW | 148.67 mW | 94 µW |
clock gated SDCT-16 | 27.86 mW | 28.97 mW | 56.85 mW | 27 µW |
clock gated SDCT-8 | 6.56 mW | 7.20 mW | 14.17 mW | 7 µW |

Cell | 1× Total Area (µm²) | 2× Total Area (µm²) | 4× Total Area (µm²) | 8× Total Area (µm²) |
---|---|---|---|---|
SDCT | 4,337,744 | 3,042,226 | 1,608,759 | 1,301,522 |
2D-DCT | 438,866 | 601,970 | 455,150 | 474,167 |
IM | 1,401,523 | 820,032 | 495,856 | 335,932 |
OM | 2,377,837 | 1,418,162 | 482,048 | 319,037 |
FIFO | 86,542 | 110,594 | 113,008 | 110,604 |
ROM | 5,895 | 22,228 | 13,227 | 33,223 |

Architecture | DCT | SDCT | SDCT-16 | SDCT-8 |
---|---|---|---|---|
Technology (nm) | 65 | 65 | 65 | 65 |
Frequency (MHz) | 188 | 188 | 188 | 188 |
Power (mW) | 33.52 | 148.67 | 56.85 | 14.17 |
Throughput (Gsps) | 2.992 | 2.992 | 1.496 | 0.748 |
Area (mm²) | 0.321 | 1.427 | 0.444 | 0.110 |

Sequence | SDCT [8] | SDCT-16 | SDCT-8 |
---|---|---|---|
Kimono | −0.795 | −0.144 | −0.020 |
ParkScene | −0.617 | −0.500 | −0.128 |
Cactus | −0.485 | −0.392 | −0.209 |
BQTerrace | −0.265 | −0.267 | −0.193 |
BasketballDrive | −0.199 | −0.174 | −0.112 |
Average | −0.472 | −0.295 | −0.132 |
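
The "Average" row of the table above appears to be the plain arithmetic mean of the five per-sequence values in each column. The short Python sketch below recomputes it under that assumption. The `SAVINGS` dictionary and the `column_average` helper are names introduced here for illustration, and the numbers are copied from the table.

```python
# Per-sequence values copied from the table, in row order:
# Kimono, ParkScene, Cactus, BQTerrace, BasketballDrive.
SAVINGS = {
    "SDCT": [-0.795, -0.617, -0.485, -0.265, -0.199],
    "SDCT-16": [-0.144, -0.500, -0.392, -0.267, -0.174],
    "SDCT-8": [-0.020, -0.128, -0.209, -0.193, -0.112],
}


def column_average(values):
    """Arithmetic mean of one table column (assumed averaging rule)."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, values in SAVINGS.items():
        print(f"{name}: {column_average(values):.3f}")
```

Rounded to three decimals, the recomputed means agree with the values listed in the "Average" row.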

Design | Variant | Technology [nm] | Frequency [MHz] | Throughput [Gsps] | Power [mW] | EPS [pJ] |
---|---|---|---|---|---|---|
Zhao et al. [16] | | 45 | 333 | 0.634 | - | - |
Ahmed et al. [17] | | 90 | 150 | 0.246 | - | - |
Meher et al. [13] | Folded | 90 | 187 | 2.992 | 40.04 | 13.38 |
Meher et al. [13] | Full-parallel | 90 | 187 | 5.984 | 67.57 | 11.29 |
Masera et al. [18] | Architecture 1 | 90 | 250 | 3.212 | 51.72 | 16.10 |
SDCT | Folded | 65 | 188 | 2.992 | 148.67 | 49.69 |
SDCT | Folded-16 | 65 | 188 | 1.496 | 56.85 | 38 |
SDCT | Folded-8 | 65 | 188 | 0.748 | 14.17 | 18.94 |
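
The EPS column in the comparison table is numerically consistent with dividing power by throughput: 1 mW divided by 1 Gsps is 1 pJ per sample. The abbreviation is not expanded in this section, so reading EPS as an energy-per-sample figure is an assumption. The Python sketch below recomputes the column from the table's power and throughput values. The `ROWS` list and the `energy_per_sample_pj` function are illustrative names, not part of the original work.

```python
def energy_per_sample_pj(power_mw, throughput_gsps):
    """Power / throughput: mW / Gsps = 1e-3 W / 1e9 samples/s = pJ per sample."""
    return power_mw / throughput_gsps


# (design, power [mW], throughput [Gsps], EPS listed in the table [pJ])
ROWS = [
    ("Meher et al. [13], Folded", 40.04, 2.992, 13.38),
    ("Meher et al. [13], Full-parallel", 67.57, 5.984, 11.29),
    ("Masera et al. [18], Architecture 1", 51.72, 3.212, 16.10),
    ("SDCT, Folded", 148.67, 2.992, 49.69),
    ("SDCT, Folded-16", 56.85, 1.496, 38.0),
    ("SDCT, Folded-8", 14.17, 0.748, 18.94),
]

if __name__ == "__main__":
    for design, power, throughput, listed in ROWS:
        computed = energy_per_sample_pj(power, throughput)
        print(f"{design}: computed {computed:.2f} pJ, listed {listed:.2f} pJ")
```

Rows without a reported power figure (Zhao et al. [16] and Ahmed et al. [17]) are left out, since the ratio cannot be formed for them.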

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Peloso, R.; Capra, M.; Sole, L.; Ruo Roch, M.; Masera, G.; Martina, M.
Steerable-Discrete-Cosine-Transform (SDCT): Hardware Implementation and Performance Analysis. *Sensors* **2020**, *20*, 1405.
https://doi.org/10.3390/s20051405
