# VLSI Implementation of a Cost-Efficient Loeffler DCT Algorithm with Recursive CORDIC for DCT-Based Encoder


## Abstract

The proposed design occupied a core area of 75.1 k µm^{2}. Its operating frequency was 100 MHz and its power consumption was 4.17 mW. Compared with previous designs, this work reduced the gate count by at least 64.1% and the power consumption by at least 22.5%.

## 1. Introduction

## 2. The Image Compression Algorithm

#### 2.1. JPEG

#### 2.2. DCT

#### 2.3. 2-D DCT Using Row-Column 1-D DCT Architecture

#### 2.4. CORDIC Algorithm

#### 2.5. Loeffler DCT Algorithm with Recursive CORDIC

## 3. VLSI Architecture

## 4. Experimental Results of the Proposed Loeffler DCT Algorithm with Recursive CORDIC

_{u} is the original image in the uth layer, and K_{u} is the reconstructed image in the uth layer, for u ∈ {R, G, B}, corresponding to the colors red, green, and blue, respectively. Each image in Table 3 is 512 × 512 pixels, and each image in Table 4 is 768 × 512 pixels; every pixel is 24 bits. Taking the images of Table 3 as an example, the reconstructed image K_{u} is obtained in four steps: (1) the original image is divided into 4096 sub-blocks of 8 × 8 pixels; (2) each pixel value is shifted from [0, 255] to [−128, 127] to reduce the dynamic range required by the 2-D DCT; (3) the 2-D DCT and the quantization matrix are applied to each shifted block, which yields the 2-D frequency content of the image data, in which the high-frequency components are small or equal to zero; (4) the reconstructed image is obtained by applying the inverse operations of steps (1)–(3) in reverse order.
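The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a direct-form orthonormal 8-point DCT rather than the Loeffler flow graph with CORDIC, a single uniform quantization step `q` as a stand-in for the quantization matrix (which is not listed here), and a PSNR with peak value 255 computed over all color samples. All function names are hypothetical.

```python
import math

N = 8  # block size of the 8 x 8 sub-blocks

def _scale(k):
    # Orthonormal DCT-II scaling factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct_1d(v):
    """Direct-form 8-point DCT-II (assumption: not the fast Loeffler graph)."""
    return [_scale(k) * sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                            for n in range(N))
            for k in range(N)]

def idct_1d(c):
    """Inverse of dct_1d."""
    return [sum(_scale(k) * c[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for k in range(N))
            for n in range(N)]

def transpose(block):
    return [list(row) for row in zip(*block)]

def transform_2d(block, f):
    """Row-column method: 1-D transform on rows, then on columns."""
    rows = [f(r) for r in block]
    cols = [f(r) for r in transpose(rows)]
    return transpose(cols)

def encode_block(block, q):
    shifted = [[p - 128 for p in row] for row in block]      # step (2)
    coeffs = transform_2d(shifted, dct_1d)                   # step (3), DCT
    return [[round(c / q) for c in row] for row in coeffs]   # step (3), quantize

def decode_block(levels, q):
    coeffs = [[lv * q for lv in row] for row in levels]      # dequantize
    pixels = transform_2d(coeffs, idct_1d)                   # inverse 2-D DCT
    # Undo the level shift and clip to the 8-bit range.
    return [[min(255, max(0, round(p) + 128)) for p in row] for row in pixels]

def psnr(orig, recon):
    """PSNR in dB for 8-bit samples given as flat lists (all channels pooled)."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)

# Step (1): a 512 x 512 image splits into this many 8 x 8 sub-blocks.
blocks_per_image = (512 // N) * (512 // N)
```

A larger `q` zeroes more high-frequency coefficients and lowers the PSNR; the per-image values in the PSNR tables depend on the actual quantization matrix, which this sketch does not reproduce.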

The core area of the proposed design is 75.1 k µm^{2}. Compared with previous works, the proposed design achieves a gate count reduction of at least 64.1%. Its operating frequency is 100 MHz and its power consumption is 4.17 mW, which saves at least 22.5% power relative to the previous designs.

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Xu, K.; Qu, Y.; Yang, K. A tutorial on the internet of things: From a heterogeneous network integration perspective. IEEE Network **2016**, 30, 102–108.
2. Movassaghi, S.; Abolhasan, M.; Lipman, J.; Smith, D.; Jamalipour, A. Wireless body area networks: A survey. IEEE Commun. Surv. Tutor. **2014**, 16, 1658–1686.
3. Khan, I.; Belqasmi, F.; Glitho, R.; Crespi, N.; Morrow, M.; Polakos, P. Wireless sensor network virtualization: A survey. IEEE Commun. Surv. Tutor. **2016**, 18, 553–576.
4. Misra, S.; Reisslein, M.; Xue, G. A survey of multimedia streaming in wireless sensor networks. IEEE Commun. Surv. Tutor. **2008**, 10, 18–39.
5. Noel, A.B.; Abdaoui, A.; Elfouly, T.; Ahmed, M.H.; Badawy, A.; Shehata, M.S. Structural health monitoring using wireless sensor networks: A comprehensive survey. IEEE Commun. Surv. Tutor. **2017**, 19, 1403–1423.
6. Goldstein, P. Ericsson Backs Away from Expectation of 50B Connected Devices by 2020, Now Sees 26B. Available online: https://www.fiercewireless.com/wireless/ericsson-backs-away-from-expectation-50b-connected-devices-by-2020-now-sees-26b (accessed on 3 June 2015).
7. Kobo, H.I.; Abu-Mahfouz, A.M.; Hancke, G.P. A survey on software-defined wireless sensor networks: Challenges and design requirements. IEEE Access **2017**, 5, 1872–1899.
8. Chen, C.-A.; Wu, C.; Abu, P.A.R.; Chen, S.-L. VLSI implementation of an efficient lossless EEG compression design for wireless body area network. Appl. Sci. **2018**, 8, 1474.
9. Chiang, W.-Y.; Ku, C.-H.; Chen, C.-A.; Wang, L.-Y.; Abu, P.A.R.; Rao, P.-Z.; Liu, C.-K.; Liao, C.-H.; Chen, S.-L. A power-efficient multiband planar USB dongle antenna for wireless sensor networks. Sensors **2019**, 19, 2568.
10. Chen, S.-L.; Chi, T.-K.; Tuan, M.-C.; Chen, C.-A.; Wang, L.-H.; Chiang, W.-Y.; Lin, M.-Y.; Abu, P.A.R. A novel low-power synchronous preamble data line chip design for oscillator control interface. Electronics **2020**, 9, 1–16.
11. Zhou, L.; Chao, H.-C. Multimedia traffic security architecture for the internet of things. IEEE Network **2011**, 25, 35–40.
12. Mekonnen, T.; Porambage, P.; Harjula, E.; Ylianttila, M. Energy consumption analysis of high quality multi-tier wireless multimedia sensor network. IEEE Access **2017**, 5, 15848–15858.
13. Aurangzeb, K.; Alhussein, M.; O’Nils, M. Analysis of binary image coding methods for outdoor applications of wireless vision sensor networks. IEEE Access **2018**, 6, 16932–16941.
14. Chen, S.-L.; Liu, T.-Y.; Shen, C.-W.; Tuan, M.-C. VLSI implementation of a cost-efficient near-lossless CFA image compressor for wireless capsule endoscopy. IEEE Access **2016**, 4, 10235–10245.
15. Kougianos, E.; Mohanty, S.P.; Coelho, G.; Albalawi, U.; Sundaravadivel, P. Design of a high-performance system for secure image communication in the internet of things. IEEE Access **2016**, 4, 1222–1242.
16. Alletto, S.; Cucchiara, R.; Fiore, G.D.; Mainetti, L.; Mighali, V.; Patrono, L.; Serra, G. An indoor location-aware system for an IoT-based smart museum. IEEE Internet Things J. **2016**, 3, 244–253.
17. Schwager, M.; Julian, B.J.; Angermann, M.; Rus, D. Eyes in the sky: Decentralized control for the deployment of robotic camera networks. Proc. IEEE **2011**, 99, 1541–1561.
18. Pennebaker, W.; Mitchell, J. JPEG still Image Data Compression Standard; Van Nostrand Reinhold: New York, NY, USA, 1992.
19. Andrea, P.; Scavongelli, C.; Orcioni, S.; Conti, M. Performance analysis of JPEG 2000 over 802.15.4 wireless image sensor network. In Proceedings of the 8th Workshop on Intelligent Solutions in Embedded Systems, Heraklion, Greece, 8–9 July 2010; pp. 55–60.
20. Mohanty, S.P.; Kougianos, E.; Guturu, P. SBPG: Secure better portable graphics for trustworthy media communications in the IoT. IEEE Access **2018**, 6, 5939–5953.
21. Cohen, A.; Nissim, N.; Elovici, Y. MalJPEG: Machine learning based solution for the detection of malicious JPEG images. IEEE Access **2020**, 30, 19997–20011.
22. Harish, A.N.; Nissim, N.; Verma, V.; Khanna, N. Double JPEG compression detection for distinguishable blocks in images compressed with same quantization matrix. In Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), Espoo, Finland, 21–24 September 2020.
23. Zeng, J.; Tan, S.; Li, B.; Huang, J. Large-scale JPEG image steganalysis using hybrid deep-learning framework. IEEE Trans. Inf. Forensics Secur. **2018**, 13, 1200–1214.
24. Coelho, D.F.G.; Cintra, R.J.; Kulasekera, S.; Madanayake, A.; Dimitrov, V.S. Error-free computation of 8-point discrete cosine transform based on the Loeffler factorisation and algebraic integers. IET Signal Process. **2016**, 10, 633–640.
25. Pastuszak, G. Hardware architectures for the H.265/HEVC discrete cosine transform. IET Image Process. **2015**, 9, 468–477.
26. Kalali, E.; Mert, A.C.; Hamzaoglu, I. A computation and energy reduction technique for HEVC discrete cosine transform. IEEE Trans. Consum. Electron. **2016**, 62, 166–174.
27. Masera, M.; Martina, M.; Masera, G. Adaptive approximated DCT architectures for HEVC. IEEE Trans. Circuits Syst. Video Technol. **2017**, 27, 2714–2725.
28. Loeffler, C.; Lightenberg, A.; Moschytz, G.S. Practical fast 1-D DCT algorithms with 11-multiplications. In Proceedings of the 1989 International Conference on Acoustics, Speech, and Signal Processing, Glasgow, UK, 23–26 May 1989; pp. 988–991.
29. Sun, C.-C.; Ruan, S.-J.; Heyne, B.; Goetze, J. Low-power and high-quality Cordic-based Loeffler DCT for signal processing. IET Circuits Devices Syst. **2007**, 1, 453–461.
30. Lee, M.-W.; Yoon, J.-H.; Park, J. Reconfigurable CORDIC-based low-power DCT architecture based on data priority. IEEE Trans. VLSI Systems **2014**, 22, 1060–1068.
31. Meher, P.K.; Valls, J.; Juang, T.-B.; Sridharan, K.; Maharatna, K. 50 years of CORDIC: Algorithms, architectures, and applications. IEEE Trans. Circuits Syst. -I **2009**, 56, 1893–1907.
32. Volder, J.E. The CORDIC Trigonometric Computing Technique. IRE Trans. Electron. Comput. **1959**, EC-8, 330–334.
33. Aggarwal, S.; Meher, P.K.; Khare, K. Concept, design, and implementation of reconfigurable CORDIC. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. **2016**, 24, 1588–1592.
34. Chen, L.; Han, J.; Liu, W.; Lombardi, F. Algorithm and Design of a Fully Parallel Approximate Coordinate Rotation Digital Computer (CORDIC). IEEE Trans. on Multi-Scale Comput. Syst. **2017**, 3, 139–151.
35. Chung, R.-L.; Zhang, Y.-Q.; Chen, S.-L. Fully pipelined CORDIC-based inverse kinematics FPGA design for biped robots. Electron. Lett. **2015**, 51, 1241–1243.
36. Kim, B.; Ziavras, S.G. Low-power multiplierless DCT for image/video coders. In Proceedings of the 2009 IEEE 13th International Symposium on Consumer Electronics, Kyoto, Japan, 25–28 May 2009; pp. 133–136.
37. Wu, Z.; Sha, J.; Wang, Z.; Li, L. An improved scaled DCT architecture. IEEE Trans. Consum. Electron. **2009**, 55, 685–689.

**Figure 3.** The two-dimensional discrete cosine transform (2-D DCT) realized by the row–column method based on the 1-D DCT.

**Figure 4.** Flow graph of the Loeffler DCT with recursive coordinate rotation digital computer (CORDIC).

**Figure 6.** Very large scale integration (VLSI) architecture of the proposed 9-stage pipeline 1-D DCT circuit.

**Figure 11.** Eight images used for the PSNR comparison in Table 3.

**Figure 12.** Twenty-four images used for the PSNR comparison in Table 4.

**Table 1.** Scale factors used in the work of Sun et al. [29] and in this work.

Scale Factor | Quantization Value | Quantization Error | Add | Shift |
---|---|---|---|---|
$\frac{1}{2}$ | ${2}^{-1}$ | 0 | 0 | 1 |
$\frac{1}{2\sqrt{2}}$ | ${2}^{-2}+{2}^{-4}+{2}^{-5}+{2}^{-7}+{2}^{-9}$ | $1.68\times {10}^{-4}$ | 4 | 5 |
$\frac{1}{3.1694}$ | ${2}^{-2}+{2}^{-4}+{2}^{-9}+{2}^{-10}$ | $2.77\times {10}^{-4}$ | 3 | 4 |
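The "Add" and "Shift" columns follow from writing each scale factor as a sum of negative powers of two, so that a multiplication becomes a handful of right shifts and additions. The sketch below illustrates this for non-negative integer inputs. The shift amounts are copied from the "Quantization Value" column; the names `SCALE_SHIFTS`, `op_counts`, and `shift_add_scale` are hypothetical, and the truncating software shifts are an assumption that may not match the rounding used in the actual datapath.

```python
# Shift amounts taken from the "Quantization Value" column of Table 1.
SCALE_SHIFTS = {
    "1/2": [1],
    "1/(2*sqrt(2))": [2, 4, 5, 7, 9],
    "1/3.1694": [2, 4, 9, 10],
}

def op_counts(shifts):
    """Return (adds, shifts): k shifted terms need k shifts and k - 1 additions."""
    return len(shifts) - 1, len(shifts)

def shift_add_value(shifts):
    """Real-valued constant represented by the shift-add expansion."""
    return sum(2.0 ** -s for s in shifts)

def shift_add_scale(x, shifts):
    """Scale a non-negative integer x using only right shifts and additions."""
    return sum(x >> s for s in shifts)

# Example: scaling 1024 by the approximation of 1/(2*sqrt(2)).
scaled = shift_add_scale(1024, SCALE_SHIFTS["1/(2*sqrt(2))"])
```

For the input 1024 every shifted term is exact, so `scaled` equals 256 + 64 + 32 + 8 + 2 = 362, close to 1024 / (2√2) ≈ 362.04.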

Iteration (i) | Angle = 3π/8 | Angle = 3π/16 | Angle = π/16 |
---|---|---|---|
0 | σ = 1 | σ = 1 | σ = 1 |
1 | σ = 1 | σ = −1 | σ = −1 |
2 | σ = −1 | σ = 1 | σ = −1 |
3 | σ = 1 | σ = 1 | σ = 1 |
4 | σ = 1 | σ = −1 | σ = −1 |
5 | σ = −1 | σ = −1 | σ = 1 |
6 | σ = 1 | σ = −1 | σ = 1 |
7 | σ = 1 | σ = 1 | σ = 1 |
8 | σ = −1 | σ = −1 | σ = 1 |
9 | σ = −1 | σ = 1 | σ = −1 |
10 | σ = 1 | σ = 1 | σ = 1 |
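The rotation directions in this table are consistent with the conventional CORDIC rule, in which iteration i rotates by ±arctan(2^{−i}) toward the remaining angle. The following sketch regenerates the σ sequences under that assumption and applies them to a vector. It is a floating-point illustration only: the function names are hypothetical, and the recursive hardware reuse of the proposed design is not modeled.

```python
import math

def cordic_directions(angle, iterations=11):
    """Greedy CORDIC direction sequence: sigma_i = +1 if the residual angle is >= 0."""
    z = angle
    sigmas = []
    for i in range(iterations):
        sigma = 1 if z >= 0 else -1
        sigmas.append(sigma)
        z -= sigma * math.atan(2.0 ** -i)  # remove the micro-rotation angle
    return sigmas

def cordic_rotate(x, y, sigmas):
    """Rotate (x, y) with shift-add style micro-rotations, then undo the CORDIC gain."""
    for i, s in enumerate(sigmas):
        # Both updates use the values from before this iteration.
        x, y = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(len(sigmas)))
    return x / gain, y / gain

# Direction sequences for the three rotation angles of the table.
SIGMA_3PI_8 = cordic_directions(3 * math.pi / 8)
SIGMA_3PI_16 = cordic_directions(3 * math.pi / 16)
SIGMA_PI_16 = cordic_directions(math.pi / 16)
```

Because the three angles are fixed, the σ values can be stored as constants, and no angle accumulator is needed at run time. In a fixed-point datapath the gain compensation would be folded into constant scale factors rather than computed with a division as done here.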

**Table 3.** Peak signal-to-noise ratio (PSNR) (dB) comparison of previous DCT algorithms and this work using the first image dataset shown in Figure 11.

Image | Loeffler [28] | Sun [29] | Lee [30] | This Work |
---|---|---|---|---|
Airplane | 35.85 | 34.83 | 35.48 | 35.84 |
Splash | 37.72 | 37.02 | 37.42 | 37.70 |
Lena | 34.51 | 33.96 | 34.37 | 34.50 |
Mandrill | 27.61 | 27.13 | 27.40 | 27.60 |
Girl | 34.68 | 34.29 | 34.48 | 34.67 |
House | 33.74 | 32.76 | 33.31 | 33.72 |
Peppers | 33.25 | 32.82 | 33.07 | 33.24 |
Sailboat | 31.04 | 30.49 | 30.85 | 31.04 |
Average | 33.55 | 32.91 | 33.30 | 33.54 |

**Table 4.** PSNR (dB) comparison of previous DCT algorithms and this work using the second image dataset shown in Figure 12.

Image | Loeffler [28] | Sun [29] | Lee [30] | This Work |
---|---|---|---|---|
Kodak01 | 28.57 | 27.99 | 28.33 | 28.56 |
Kodak02 | 32.93 | 32.58 | 32.81 | 32.92 |
Kodak03 | 34.33 | 33.86 | 34.17 | 34.32 |
Kodak04 | 33.10 | 32.55 | 32.96 | 33.09 |
Kodak05 | 28.87 | 27.91 | 28.57 | 28.86 |
Kodak06 | 30.02 | 29.49 | 29.81 | 30.00 |
Kodak07 | 33.94 | 32.93 | 33.71 | 33.93 |
Kodak08 | 28.35 | 27.37 | 27.86 | 28.34 |
Kodak09 | 33.84 | 33.01 | 33.59 | 33.83 |
Kodak10 | 33.62 | 32.86 | 33.36 | 33.61 |
Kodak11 | 30.81 | 30.27 | 30.61 | 30.80 |
Kodak12 | 33.96 | 33.32 | 33.71 | 33.94 |
Kodak13 | 26.25 | 25.67 | 26.02 | 26.24 |
Kodak14 | 30.12 | 29.51 | 29.94 | 30.19 |
Kodak15 | 32.88 | 32.33 | 32.62 | 32.87 |
Kodak16 | 32.32 | 31.99 | 32.19 | 32.31 |
Kodak17 | 32.73 | 32.13 | 32.52 | 32.72 |
Kodak18 | 29.52 | 28.93 | 29.32 | 29.51 |
Kodak19 | 31.35 | 30.59 | 31.02 | 31.34 |
Kodak20 | 32.72 | 32.03 | 32.42 | 32.71 |
Kodak21 | 30.40 | 29.78 | 30.16 | 30.40 |
Kodak22 | 31.39 | 30.92 | 31.23 | 31.38 |
Kodak23 | 35.84 | 34.95 | 35.57 | 35.82 |
Kodak24 | 29.27 | 28.61 | 29.00 | 29.26 |
Average | 31.55 | 30.90 | 31.31 | 31.54 |

**Table 5.** Comparison of computing resources of previous DCT algorithms and this work showing the multiply, add, and shift operations.

DCT Type | Multiply | Add | Shift |
---|---|---|---|
Loeffler DCT [28] | 22 | 58 | 8 |
Sun [29] | 0 | 120 | 92 |
Lee [30] | 0 | 192 | 172 |
This work without hardware sharing machine | 0 | 108 | 96 |
This work with hardware sharing machine | 0 | 28 | 11 |

Image | Loeffler [28] | Sun [29] | Lee [30] |
---|---|---|---|
Lena (PSNR) | 34.51 dB | 33.96 dB | 34.37 dB |
Kodak03 (PSNR) | 34.33 dB | 33.86 dB | 34.17 dB |

Performance Metric | Sun et al. [29] | Lee et al. [30] | Kim et al. [36] | Wu et al. [37] | This Study |
---|---|---|---|---|---|
PSNR (dB) | 30.90 | 31.31 | 31.49 | 31.55 | 31.54 |
Compression Ratio | 9.86 | 9.86 | 9.86 | 9.86 | 9.86 |
Process (µm) | TSMC 0.13 | TSMC 0.13 | TSMC 0.13 | TSMC 0.13 | UMC 0.18 |
Operating Frequency (MHz) | 100 | 100 | 100 | 100 | 100 |
Gate Count (k) | 27.30 | 22.40 | 24.60 | 31.50 | 8.04 |
Power (mW) | 6.54 | 5.11 | 5.42 | 5.62 | 4.17 |
Core Area (µm^{2}) | 255 k | 209.2 k | 229.8 k | 294.2 k | 75.1 k |
Memory | 96 | 96 | 96 | 96 | 96 |
Normalized Gate Count | 3.40 | 2.79 | 3.06 | 3.92 | 1.00 |
FOM | 11.16 | 13.78 | 12.62 | 9.88 | 38.68 |
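The last two rows of this table can be reproduced from the rows above them. The relations used below are inferred from the tabulated numbers and are not taken from a definition in the text, so they should be read as an assumption: the normalized gate count is the gate count divided by that of this study, and the figure of merit (FOM) is PSNR × compression ratio / gate count (k). The dictionary and function names are hypothetical.

```python
# PSNR (dB), compression ratio, and gate count (k) copied from the table.
DESIGNS = {
    "Sun et al. [29]": {"psnr": 30.90, "cr": 9.86, "gates_k": 27.30},
    "Lee et al. [30]": {"psnr": 31.31, "cr": 9.86, "gates_k": 22.40},
    "Kim et al. [36]": {"psnr": 31.49, "cr": 9.86, "gates_k": 24.60},
    "Wu et al. [37]": {"psnr": 31.55, "cr": 9.86, "gates_k": 31.50},
    "This Study": {"psnr": 31.54, "cr": 9.86, "gates_k": 8.04},
}

def normalized_gate_count(name, reference="This Study"):
    """Gate count relative to the reference design (assumed definition)."""
    return DESIGNS[name]["gates_k"] / DESIGNS[reference]["gates_k"]

def figure_of_merit(name):
    """Assumed FOM: PSNR * compression ratio / gate count (k)."""
    d = DESIGNS[name]
    return d["psnr"] * d["cr"] / d["gates_k"]

def gate_reduction_percent(name, reference="This Study"):
    """Gate count saved by the reference design relative to the named design."""
    return 100.0 * (1.0 - DESIGNS[reference]["gates_k"] / DESIGNS[name]["gates_k"])

for design in DESIGNS:
    print(f"{design}: normalized gate count {normalized_gate_count(design):.2f}, "
          f"FOM {figure_of_merit(design):.2f}")
```

Under these assumed definitions, the smallest gate count reduction is the one relative to Lee et al. [30], 1 − 8.04/22.40 ≈ 64.1%, which agrees with the figure quoted in the text.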

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chung, R.-L.; Chen, C.-W.; Chen, C.-A.; Abu, P.A.R.; Chen, S.-L.
VLSI Implementation of a Cost-Efficient Loeffler DCT Algorithm with Recursive CORDIC for DCT-Based Encoder. *Electronics* **2021**, *10*, 862.
https://doi.org/10.3390/electronics10070862
