# Real-Time CLAHE Algorithm Implementation in SoC FPGA Device for 4K UHD Video Stream


## Abstract


## 1. Introduction

- determining the histogram for the entire image frame (the image is assumed to be in greyscale);
- calculating the cumulative histogram and normalising it (to the range of 0–255); and
- applying a look-up table (LUT) operation to the image, using the normalised cumulative histogram as the recoding table.
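The three steps can be illustrated with a short C sketch for an 8-bit greyscale frame stored as a flat array. The function name `ghe_8bit` is hypothetical. Rounding to the nearest level in the normalisation step is our assumption, because the description above does not fix a rounding rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Global histogram equalisation (GHE) of an 8-bit greyscale frame, in place.
   Hypothetical helper for illustration only. */
void ghe_8bit(uint8_t *img, size_t num_pixels)
{
    uint32_t hist[256] = {0};
    uint8_t lut[256];
    uint64_t cum = 0;

    assert(img != NULL && num_pixels > 0);

    /* Step 1: histogram of the whole frame. */
    for (size_t k = 0; k < num_pixels; ++k)
        hist[img[k]]++;

    /* Step 2: cumulative histogram, normalised to 0-255 (rounded to nearest). */
    for (int n = 0; n < 256; ++n) {
        cum += hist[n];
        lut[n] = (uint8_t)((cum * 255 + num_pixels / 2) / num_pixels);
    }

    /* Step 3: LUT operation, recoding every pixel through the normalised CDF. */
    for (size_t k = 0; k < num_pixels; ++k)
        img[k] = lut[img[k]];
}
```

For example, the four-pixel frame {10, 20, 30, 40} is mapped to {64, 128, 191, 255}: each pixel receives its normalised cumulative count.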

- We propose a hardware implementation of the CLAHE algorithm on an FPGA platform that enables real-time processing of a 4K (Ultra HD) video stream, which, to the best of our knowledge, has not been done before; and
- We use a vector stream format (4 pixels per clock, 4 ppc) to implement the CLAHE algorithm, which we consider an architectural novelty, as it required redesigning the algorithm's components.
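A quick calculation, sketched below in C, shows why one pixel per clock is not enough. It compares the active-pixel rate of a 3840 × 2160 stream at 60 fps with what a pipeline clocked at 150 MHz (the frequency reported in Table 1) can accept. The helper names are hypothetical. Counting blanking, the standard 4K60 timing (4400 × 2250 total) needs 594 Mpixel/s, which still fits 4 ppc at 150 MHz.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels per second that a stream of the given geometry requires. */
uint64_t required_pixel_rate(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Pixels per second a pipeline accepts at clock_hz with ppc pixels per clock. */
uint64_t pipeline_pixel_rate(uint64_t clock_hz, uint32_t ppc)
{
    return clock_hz * ppc;
}

/* Smallest number of pixels per clock that keeps up with the stream. */
uint32_t min_ppc(uint64_t required, uint64_t clock_hz)
{
    assert(clock_hz > 0);
    return (uint32_t)((required + clock_hz - 1) / clock_hz); /* ceiling division */
}
```

Here `min_ppc(required_pixel_rate(3840, 2160, 60), 150000000)` evaluates to 4: a single pixel per clock supplies only 150 Mpixel/s against the roughly 498 Mpixel/s required.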

## 2. CLAHE Algorithm

- the division of the image into rectangular, non-overlapping windows;
- the computation of the histogram for each window and its redistribution;
- the calculation of the LUT mapping function; and
- the interpolation of the resulting pixel values.
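If the frame dimensions are divisible by the grid size, the first step reduces to two integer divisions per pixel. The sketch below uses a hypothetical `TileGrid` structure and a row-major tile index. It is an illustration, not the authors' implementation. For a 3840 × 2160 frame with an 8 × 8 grid, each tile is 480 × 270 pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a regular grid of non-overlapping tiles. */
typedef struct {
    uint32_t width, height;    /* frame size in pixels */
    uint32_t tiles_x, tiles_y; /* number of tiles horizontally and vertically */
} TileGrid;

uint32_t tile_width(const TileGrid *g)  { return g->width / g->tiles_x; }
uint32_t tile_height(const TileGrid *g) { return g->height / g->tiles_y; }

/* Row-major index of the tile that contains pixel (x, y). */
uint32_t tile_index(const TileGrid *g, uint32_t x, uint32_t y)
{
    assert(g->width % g->tiles_x == 0 && g->height % g->tiles_y == 0);
    assert(x < g->width && y < g->height);
    uint32_t tx = x / tile_width(g);
    uint32_t ty = y / tile_height(g);
    return ty * g->tiles_x + tx;
}
```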

#### 2.1. Tiles Generation

#### 2.2. Histogram Calculation and Redistribution

- $n$ is the grey level (histogram bin);
- $h\left(n\right)$ is the histogram value for the $n$-th bin;
- $N$ is the number of histogram bins (256 in this case);
- $XX,YY$ are the dimensions of the image block;
- $i,j$ are the coordinates of a pixel;
- $g(n,i,j)$ is an indicator function equal to 1 if the value of the pixel with coordinates $(i,j)$ is equal to $n$, and 0 otherwise; and
- $I(i,j)$ is the value of the pixel with coordinates $(i,j)$.
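With these definitions, the block histogram can be written as $h(n)=\sum_{i=0}^{XX-1}\sum_{j=0}^{YY-1} g(n,i,j)$, where $g(n,i,j)=1$ if $I(i,j)=n$ and 0 otherwise. For each pixel, exactly one bin ($n=I(i,j)$) has $g=1$, so in practice the double sum is computed by incrementing that single bin. The sketch below assumes an 8-bit block stored inside a larger frame with a row stride, and takes $i$ as the column index and $j$ as the row index. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BINS 256  /* N: number of histogram bins for 8-bit pixels */

/* h(n) = sum over the block of g(n,i,j), where g is 1 when I(i,j) == n.
   block points to the top-left pixel; stride is the frame width in pixels. */
void block_histogram(const uint8_t *block, size_t xx, size_t yy,
                     size_t stride, uint32_t h[N_BINS])
{
    assert(stride >= xx);
    for (int n = 0; n < N_BINS; ++n)
        h[n] = 0;
    for (size_t j = 0; j < yy; ++j)          /* rows of the block */
        for (size_t i = 0; i < xx; ++i)      /* columns of the block */
            h[block[j * stride + i]]++;      /* only bin n = I(i,j) has g = 1 */
}
```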

**Listing 1.** The redistribution algorithm in its basic version, described in [10].

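```c
/* Basic clip-and-redistribute step, after [10].
   h: histogram with N bins; beta: clip limit (β in the text).
   The final loop terminates provided beta * N >= sum of h. */
void redistribute(int h[], int N, int beta)
{
    int excess = 0;

    /* Clip every bin above the limit and accumulate the excess. */
    for (int i = 0; i < N; ++i) {
        if (h[i] > beta) {
            excess += h[i] - beta;
            h[i] = beta;
        }
    }

    /* First pass: give each bin the average excess m, without exceeding beta. */
    int m = excess / N;
    for (int i = 0; i < N; ++i) {
        if (h[i] < beta - m) {
            h[i] += m;
            excess -= m;
        } else if (h[i] < beta) {
            excess += h[i] - beta;  /* h[i] - beta < 0: fill the bin to the limit */
            h[i] = beta;
        }
    }

    /* Hand out the remaining excess one pixel at a time. */
    while (excess > 0) {
        for (int i = 0; i < N && excess > 0; ++i) {
            if (h[i] < beta) {
                h[i] += 1;
                excess -= 1;
            }
        }
    }
}
```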

#### 2.3. Mapping Function

- $i,j$ are the coordinates of the image window;
- $M$ is the number of pixels in the window;
- $N$ is the number of grey levels (histogram bins); and
- ${h}_{i,j}$ is the histogram of the image window with coordinates $(i,j)$.
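A common form of the mapping that matches these definitions is the scaled cumulative histogram $T_{i,j}(n)=\frac{N-1}{M}\sum_{k=0}^{n}h_{i,j}(k)$. The C sketch below assumes this form, rounds to the nearest integer, and stores the result as a 256-entry LUT. The function name `build_lut` is hypothetical. As long as the redistributed histogram still sums to $M$, the last entry equals $N-1=255$.

```c
#include <assert.h>
#include <stdint.h>

#define N_BINS 256  /* N: number of grey levels */

/* Builds T(n) = (N - 1) / M * sum_{k<=n} h(k) for one window, where M is the
   number of pixels in the window; results are rounded to the nearest integer. */
void build_lut(const uint32_t h[N_BINS], uint32_t m_pixels, uint8_t lut[N_BINS])
{
    uint64_t cdf = 0;
    assert(m_pixels > 0);
    for (int n = 0; n < N_BINS; ++n) {
        cdf += h[n];
        assert(cdf <= m_pixels); /* the histogram must not hold more than M pixels */
        lut[n] = (uint8_t)(((N_BINS - 1) * cdf + m_pixels / 2) / m_pixels);
    }
}
```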

#### 2.4. Bilinear Interpolation

#### 2.5. Applications of CLAHE

## 3. Related Work

## 4. Hardware Implementation

#### 4.1. Generation of Tiles

#### 4.2. Histogram Calculation

#### 4.3. Redistribution

#### 4.4. LUT-Based Mapping Function

#### 4.5. Interpolation Method

## 5. Results

## 6. Discussion

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 3rd ed.; Prentice-Hall, Inc.: Hoboken, NJ, USA, 2006.
2. Tom, V.T.; Wolfe, G.J. Adaptive histogram equalization and its applications. In Applications of Digital Image Processing IV; SPIE: Bellingham, WA, USA, 1983; pp. 204–209.
3. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; Romeny, B.H.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vision Graph. Image Process. **1987**, 39, 355–368.
4. Blachut, K.; Kryjak, T. Real-Time Efficient FPGA Implementation of the Multi-Scale Lucas-Kanade and Horn-Schunck Optical Flow Algorithms for a 4K Video Stream. Sensors **2022**, 22, 5017.
5. Shrivastava, S.; Choudhury, Z.; Khandelwal, S.; Purini, S. FPGA Accelerator for Stereo Vision using Semi-Global Matching through Dependency Relaxation. In Proceedings of the 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), Gothenburg, Sweden, 31 August–4 September 2020; pp. 304–309.
6. Chen, Z.; Li, S.; Zhang, N.; Hao, Y.; Zhang, X. Eye-to-Hand Robotic Visual Tracking Based on Template Matching on FPGAs. IEEE Access **2019**, 7, 88870–88880.
7. Yu, Y.H.; Ting, Y.S.; Kwok, N.; Mayer, N.M. High-speed gaze detection using a single FPGA for driver assistance systems. J. Real-Time Image Proc. **2021**, 18, 681–690.
8. Boikos, K.; Bouganis, C. Semi-dense SLAM on an FPGA SoC. In Proceedings of the 2016 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 29 August–2 September 2016; pp. 1–4.
9. Guo, K.; Zeng, S.; Yu, J.; Wang, Y.; Yang, H. [DL] A Survey of FPGA-based Neural Network Inference Accelerators. ACM Trans. Reconfigurable Technol. Syst. **2019**, 12, 2.
10. Kokufuta, K.; Maruyama, T. Real-time processing of contrast limited adaptive histogram equalization on FPGA. In Proceedings of the 2010 International Conference on Field Programmable Logic and Applications, Milano, Italy, 31 August–2 September 2010; pp. 155–158.
11. Koonsanit, K.; Thongvigitmanee, S.; Pongnapang, N.; Thajchayapong, P. Image enhancement on digital X-ray images using N-CLAHE. In Proceedings of the 2017 10th Biomedical Engineering International Conference (BMEiCON), Hokkaido, Japan, 31 August–2 September 2017; pp. 1–4.
12. Umri, B.K.; Wafa Akhyari, M.; Kusrini, K. Detection of Covid-19 in Chest X-ray Image using CLAHE and Convolutional Neural Network. In Proceedings of the 2020 2nd International Conference on Cybernetics and Intelligent System (ICORIS), Manado, Indonesia, 27–28 October 2020; pp. 1–5.
13. Sahu, S.; Singh, A.K.; Ghrera, S.P.; Elhoseny, M. An approach for de-noising and contrast enhancement of retinal fundus image using CLAHE. Opt. Laser Technol. **2019**, 110, 87–98.
14. Muzammil, N.; Shah, S.A.A.; Shahzad, A.; Khan, M.A.; Ghoniem, R.M. Multifilters-Based Unsupervised Method for Retinal Blood Vessel Segmentation. Appl. Sci. **2022**, 12, 6393.
15. Konyar, M.Z.; Ertürk, S. Enhancement of ultrasound images with bilateral filter and Rayleigh CLAHE. In Proceedings of the 2015 23nd Signal Processing and Communications Applications Conference (SIU), Malatya, Turkey, 16–19 May 2015; pp. 1861–1864.
16. Kharel, N.; Alsadoon, A.; Prasad, P.W.C.; Elchouemi, A. Early diagnosis of breast cancer using contrast limited adaptive histogram equalization (CLAHE) and Morphology methods. In Proceedings of the 2017 8th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 4–6 April 2017; pp. 120–124.
17. Garg, D.; Garg, N.K.; Kumar, M. Underwater image enhancement using blending of CLAHE and percentile methodologies. Multimed. Tools Appl. **2018**, 77, 26545–26561.
18. Zheng, L.; Shi, H.; Sun, S. Underwater image enhancement algorithm based on CLAHE and USM. In Proceedings of the 2016 IEEE International Conference on Information and Automation (ICIA), Macau, China, 18–20 July 2016; pp. 585–590.
19. Cherian, A.K.; Poovammal, E.; Philip, N.S.; Ramana, K.; Singh, S.; Ra, I.-H. Deep Learning Based Filtering Algorithm for Noise Removal in Underwater Images. Water **2021**, 13, 2742.
20. Yanfeng, L.; Zhuanzhuan, M.; Fengrong, Z.; Huamin, Y. Infrared and Visible Image Fusion Based on CLAHE and Sparse Representation. In Proceedings of the 2019 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 12–14 July 2019; pp. 472–475.
21. Musa, P.; Rafi, F.A.; Lamsani, M. A Review: Contrast-Limited Adaptive Histogram Equalization (CLAHE) methods to help the application of face recognition. In Proceedings of the 2018 Third International Conference on Informatics and Computing (ICIC), Palembang, Indonesia, 17–18 October 2018; pp. 1–6.
22. Kumar, M.; Jindal, S.R. Fusion of RGB and HSV colour space for foggy image quality enhancement. Multimed. Tools Appl. **2019**, 78, 9791–9799.
23. Honda, K.; Wei, K.; Arai, M.; Amano, H. CLAHE Implementation and Evaluation on a Low-End FPGA Board by High-Level Synthesis. IEICE Trans. Inf. Syst. **2021**, 104, 2048–2056.
24. Unal, B.; Akoglu, A. Resource efficient real-time processing of contrast limited adaptive histogram equalization. In Proceedings of the 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 29 August–2 September 2016; pp. 1–8.
25. Kim, D.; Hyun, J.; Moon, B. Memory-efficient architecture for contrast enhancement and integral image computation. In Proceedings of the 2020 International Conference on Electronics, Information, and Communication (ICEIC), Barcelona, Spain, 19–22 January 2020; pp. 1–4.
26. Xu, C.; Peng, Z.; Hu, X.; Zhang, W.; Chen, L.; An, F. FPGA-based low-visibility enhancement accelerator for video sequence by adaptive histogram equalization with dynamic clip-threshold. IEEE Trans. Circuits Syst. I Regul. Pap. **2020**, 67, 3954–3964.
27. Kowalczyk, M.; Przewlocka, D.; Kryjak, T. Real-time implementation of contextual image processing operations for 4K video stream in Zynq ultrascale+ MPSoC. In Proceedings of the 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP), Porto, Portugal, 10–12 October 2018; pp. 37–42.
28. Kowalczyk, M.; Ciarach, P.; Przewlocka-Rus, D.; Szolc, H.; Kryjak, T. Real-time FPGA implementation of parallel connected component labelling for a 4K video stream. J. Signal Process. Syst. **2021**, 93, 481–498.
29. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools **2000**, 25, 120–123.
30. Yakun, C.; Cheolkon, J.; Peng, K.; Hyoseob, S.; Jungmee, H. Automatic Contrast-Limited Adaptive Histogram Equalization with Dual Gamma Correction. IEEE Access **2018**, 6, 11782–11792.
31. Siti, A.; Nasir, T.M.; Khalid, A.; Elaiza, N.; Rohana, A.; Haslina, T. The effect of sharp contrast-limited adaptive histogram equalization (SCLAHE) on Intra-oral dental radiograph images. In Proceedings of the 2010 IEEE EMBS Conference on Biomedical Engineering and Sciences, IECBES, Kuala Lumpur, Malaysia, 30 November–2 December 2010; pp. 400–405.
**2018**, 6, 11782–11792. [Google Scholar] [CrossRef] - Siti, A.; Nasir, T.M.; Khalid, A.; Elaiza, N.; Rohana, A.; Haslina, T. The effect of sharp contrast-limited adaptive histogram equalization (SCLAHE) on Intra-oral dental radiograph images. In Proceedings of the 2010 IEEE EMBS Conference on Biomedical Engineering and Sciences, IECBES, Kuala Lumpur, Malaysia, 30 November–2 December 2010; pp. 400–405. [Google Scholar] [CrossRef]

**Figure 1.** Comparison of Global Histogram Equalization (GHE), Adaptive Histogram Equalization (AHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE). From the left: the original image and the images after applying the GHE, AHE, and CLAHE algorithms. GHE does not perform well when there is a large difference between the highest and the lowest intensity in the image. The main drawback of AHE is visible in homogeneous regions (like the sky in the mountains image). CLAHE performs well in both situations.

**Figure 2.** Overview of the CLAHE algorithm. First, the input image is divided into rectangular, non-overlapping windows (blocks, tiles). Second, a clipped histogram is computed for each tile. The excess is then redistributed and the cumulative distribution function (CDF) is computed. Finally, the new image is obtained by using the CDF as the mapping function and interpolating the pixel values. Note that the histogram of each tile becomes more uniform.

**Figure 3.** Example of division of an input image into blocks, corner regions (CR), border regions (BR), and inner regions (IR).
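In the classic CLAHE scheme, as we understand it, these regions are delimited by the tile centres: a pixel lying between the image border and the outermost row or column of tile centres cannot be surrounded by four tiles. The sketch below classifies a pixel under that assumption, taking half a tile as the distance from the border to the nearest centre. The enum and function names are hypothetical, and equal tiles with even dimensions are assumed, as in a 480 × 270 tiling of a 4K frame.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGION_CORNER, REGION_BORDER, REGION_INNER } Region;

/* Classifies pixel (x, y) of a width x height frame split into tiles of
   tw x th pixels. A coordinate is "outside" along an axis when it lies before
   the first tile centre or after the last one on that axis. */
Region classify_pixel(uint32_t x, uint32_t y,
                      uint32_t width, uint32_t height,
                      uint32_t tw, uint32_t th)
{
    assert(x < width && y < height);
    assert(tw > 0 && th > 0 && tw <= width && th <= height);
    int out_x = (x < tw / 2) || (x >= width - tw / 2);
    int out_y = (y < th / 2) || (y >= height - th / 2);

    if (out_x && out_y)
        return REGION_CORNER;  /* one LUT, no interpolation */
    if (out_x || out_y)
        return REGION_BORDER;  /* linear interpolation between two LUTs */
    return REGION_INNER;       /* bilinear interpolation between four LUTs */
}
```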

**Figure 5.** The second stage of the redistribution procedure: the first iteration of redistribution may result in some remaining excess.

**Figure 8.** Linear interpolation scheme (for pixels on the image edge). C stands for corner region and B for border region.
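The interpolation blends the outputs of the neighbouring tiles' LUTs for the pixel's grey level, with weights given by the pixel's relative position between the tile centres. The sketch below shows the linear case used in border regions and the bilinear case used in inner regions. The function names are hypothetical, the weights `a` and `b` are assumed to be precomputed fractions in [0, 1], and floating point is used only for readability; a hardware implementation would more likely use fixed-point weights.

```c
#include <assert.h>
#include <stdint.h>

/* Linear interpolation between two LUT outputs (border regions).
   a: weight of the second tile, 0 <= a <= 1. */
double lerp_lut(const uint8_t *lut0, const uint8_t *lut1, uint8_t v, double a)
{
    assert(a >= 0.0 && a <= 1.0);
    return (1.0 - a) * lut0[v] + a * lut1[v];
}

/* Bilinear interpolation between the LUTs of the four nearest tiles
   (inner regions): tl/tr/bl/br = top-left/top-right/bottom-left/bottom-right.
   a: horizontal weight, b: vertical weight. */
uint8_t bilerp_lut(const uint8_t *tl, const uint8_t *tr,
                   const uint8_t *bl, const uint8_t *br,
                   uint8_t v, double a, double b)
{
    assert(b >= 0.0 && b <= 1.0);
    double top = lerp_lut(tl, tr, v, a);
    double bottom = lerp_lut(bl, br, v, a);
    double out = (1.0 - b) * top + b * bottom;
    return (uint8_t)(out + 0.5); /* round to nearest; out stays within [0, 255] */
}
```

For LUT outputs 0, 100, 200 and 255 and a pixel halfway between the four centres (`a = b = 0.5`), the blended value is 138.75, which rounds to 139.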

**Figure 9.** Simplified scheme of the system architecture. We use an HDMI source (camera/PC) and an HDMI sink (UHD LCD monitor). All computations are performed in the PL (FPGA) part of the Zynq UltraScale+ MPSoC device.

**Figure 10.** Block diagram of the implemented CLAHE algorithm in the 4 ppc mode. For a stream in the 4 ppc format, the conversion to greyscale (RGB2grey) is performed 4 times in parallel, and 4 partial histograms with 4 partial excesses (omitted from the diagram for clarity) are calculated. The histograms are then merged, the redistribution is performed, and the mapping function is determined; these steps do not require the vector format. The final step, interpolation, is again performed on the 4 ppc stream.

**Figure 11.** Block diagram of the implemented histogram calculation. We use 4 partial histograms and excesses (one for each pixel of the 4 ppc stream), which are then aggregated using summation trees. They are grouped into two buffers, one for the odd lines of the image and the other for the even lines. This allows us to read the prepared data from one buffer while performing further calculations with the other.
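The following behavioural C model illustrates per-lane partial histograms merged by a summation tree. It is a software sketch, not the authors' HDL: the function names, the flat pixel layout (groups of four consecutive pixels) and the two-level tree for exactly four lanes are our assumptions. Keeping one histogram per lane means that four equal pixel values arriving in the same clock cycle never have to update the same memory location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BINS 256
#define PPC 4      /* pixels per clock in the vector stream */

/* Accumulates a stream of 4-pixel words into PPC partial histograms,
   one per lane. num_words = number of pixels / PPC. */
void partial_histograms(const uint8_t *pixels, size_t num_words,
                        uint32_t part[PPC][N_BINS])
{
    assert(pixels != NULL || num_words == 0);
    for (int l = 0; l < PPC; ++l)
        for (int n = 0; n < N_BINS; ++n)
            part[l][n] = 0;
    for (size_t w = 0; w < num_words; ++w)
        for (int l = 0; l < PPC; ++l)     /* in hardware the lanes run in parallel */
            part[l][pixels[w * PPC + l]]++;
}

/* Merges the partial histograms with a two-level summation tree. */
void merge_histograms(uint32_t part[PPC][N_BINS], uint32_t h[N_BINS])
{
    for (int n = 0; n < N_BINS; ++n) {
        uint32_t s01 = part[0][n] + part[1][n];  /* first tree level */
        uint32_t s23 = part[2][n] + part[3][n];
        h[n] = s01 + s23;                        /* second tree level */
    }
}
```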

**Figure 12.** Scheme of the implementation of the redistribution process. We read the input histogram sequentially (bin after bin), count the bins filled to the limit $\beta$, and combine this value with the input excess to calculate m (average excess) and e (residual). The input histogram is then stored in the DRAM buffer. After calculating these parameters, we read the histogram from the DRAM buffer and perform the first iteration of the redistribution, in which we try to add $m+1$ to each bin. If this is not possible, we fill the bin to the limit $\beta$ and modify the value of e accordingly. The resulting histogram is stored in another DRAM buffer. After that, we perform 3 more iterations, this time adding only 1 pixel to each bin where possible (and decreasing e accordingly). There are also DRAM buffers between consecutive iterations (omitted from the diagram for clarity). In this way, we obtain the redistributed histogram.
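The scheme in Figure 12 leaves some arithmetic details open, so the sketch below is our reading of it rather than the authors' design. We assume that the input histogram is already clipped, that m and e come from dividing the excess among the bins that are not yet full, that a bin receives $m+1$ only while the residual e is positive, and that whatever cannot be placed after the fixed number of passes is returned. With these assumptions, e always equals the excess not yet distributed. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_BINS 256
#define EXTRA_PASSES 3  /* single-pixel passes after the first one */

/* Behavioural model of the pipelined redistribution (our interpretation).
   h: clipped histogram (every bin <= beta); excess: clipped-off pixel count.
   Returns the residual excess still undistributed at the end. */
uint32_t redistribute_pipelined(uint32_t h[N_BINS], uint32_t beta, uint32_t excess)
{
    uint32_t full = 0;
    for (int n = 0; n < N_BINS; ++n) {
        assert(h[n] <= beta);
        if (h[n] == beta)
            full++;                        /* bins already filled to the limit */
    }
    uint32_t free_bins = N_BINS - full;
    if (free_bins == 0)
        return excess;

    uint32_t m = excess / free_bins;       /* average excess per free bin */
    int64_t e = excess - m * free_bins;    /* residual */

    /* First pass: add m + 1 while the residual lasts (m otherwise), capped at
       beta; whatever a bin cannot take is returned to e. */
    for (int n = 0; n < N_BINS; ++n) {
        if (h[n] == beta)
            continue;
        uint32_t quota = m + (e > 0 ? 1u : 0u);
        uint32_t room = beta - h[n];
        uint32_t added = quota < room ? quota : room;
        h[n] += added;
        e += (int64_t)m - (int64_t)added;
    }

    /* Fixed number of further passes, one pixel per bin where possible. */
    for (int p = 0; p < EXTRA_PASSES; ++p) {
        for (int n = 0; n < N_BINS && e > 0; ++n) {
            if (h[n] < beta) {
                h[n]++;
                e--;
            }
        }
    }
    return (uint32_t)e;
}
```

In this model, the bounded number of passes keeps the latency fixed, which suits a pipeline; if many bins are close to the limit, a residual can remain, and the return value makes it visible.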

**Figure 13.** Example results for test images. From the left: the original image and the results of the CLAHE algorithm for 4 × 4, 8 × 8, and 16 × 16 blocks.

**Figure 14.** Photo of the proposed system in operation. The input video signal is transmitted from the source (a computer) to the ZCU 104 board, equipped with the AMD Xilinx Zynq UltraScale+ MPSoC chip. The output image, after applying the CLAHE algorithm, is transmitted to and displayed on a 4K monitor.

**Figure 15.** Comparison of output images obtained with the “classic” (**left**) and OpenCV (**right**) redistribution. In this case, the redistribution from OpenCV results in a brighter image.
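For comparison with Listing 1, the sketch below models the clipping and redistribution step as we understand it from OpenCV's CLAHE implementation (the `imgproc` module): the integer share of the excess is added to every bin without re-checking the limit, and the remainder is spread over the bins with a fixed stride. It is a paraphrase of that code, not a copy, so the OpenCV source should be consulted for the exact behaviour. The function name is hypothetical. Unlike the classic version, bins can end up above the clip limit.

```c
#include <assert.h>
#include <stdint.h>

#define N_BINS 256

/* Clip-and-redistribute in the style of OpenCV's CLAHE (modelled, not copied).
   Bins may end up above clip_limit after the uniform addition. */
void redistribute_opencv_style(int32_t h[N_BINS], int32_t clip_limit)
{
    int32_t clipped = 0;
    assert(clip_limit > 0);
    for (int n = 0; n < N_BINS; ++n) {
        if (h[n] > clip_limit) {
            clipped += h[n] - clip_limit;
            h[n] = clip_limit;
        }
    }

    int32_t batch = clipped / N_BINS;            /* uniform share for every bin */
    int32_t residual = clipped - batch * N_BINS;
    for (int n = 0; n < N_BINS; ++n)
        h[n] += batch;                           /* no re-check against the limit */

    if (residual != 0) {
        int32_t step = N_BINS / residual;        /* spread the remainder evenly */
        if (step < 1)
            step = 1;
        for (int n = 0; n < N_BINS && residual > 0; n += step, residual--)
            h[n]++;
    }
}
```

For example, a histogram with 1000 pixels in bin 0 and a limit of 100 ends with 104 pixels in that bin.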

**Figure 16.** Comparison of different methods of generating the output image: bilinear interpolation (**top left**), no interpolation with a $4\times 4$ grid of blocks (**top right**), mean CDF with a $4\times 4$ grid of blocks (**bottom left**) and mean CDF with a $16\times 16$ grid of blocks (**bottom right**). Bilinear interpolation provides the best results, with almost no visible artefacts.

**Figure 17.** Comparison of output images obtained with a constant (**left**) and an adaptive (**right**) limit $\beta$. The difference between them is hardly noticeable for the considered image.

**Figure 18.** Comparison of output images obtained without (**left**) and with (**right**) a Laplace filter applied to the input image.

**Figure 19.** Results of the CLAHE algorithm for an exemplary colour image (**top left**). We apply CLAHE to different image channels: V from the HSV model (**top centre**), R, G and B from the RGB model together (**top right**), and separately to R (**bottom left**), G (**bottom centre**) and B (**bottom right**).

**Table 1.** Comparison of the most important parameters of hardware implementations of the CLAHE algorithm on FPGA platforms, together with the resource utilisation of the CLAHE module. Only our solution supports 4K resolution; despite processing 4 pixels at once, its resource utilisation is comparable to that of Full HD solutions, e.g., [23]. Its low memory utilisation is also worth noting; this was achieved by using ping-pong buffering during histogram calculation. DSP utilisation is not presented, as it was rarely reported in other works.

Implementation | Platform | Resolution | FPS | Frequency [MHz] | # of LUTs | # of Flip-Flops | # of Block RAMs |
---|---|---|---|---|---|---|---|
Kokufuta [10] | AMD Xilinx XC4VLX160 | 640 × 480 | 538 | 209 | 43,915 | - | 192 |
Unal [24] | AMD Xilinx Zynq-7000 | 640 × 480 | 354 | 109 | 4766 | 440 | 16 |
Unal [24] | AMD Xilinx Zynq-7000 | 1920 × 1080 | 33 | 69 | - | - | - |
Kim [25] | AMD Xilinx XC7Z045 | 512 × 512 | 492 | 129 | 98,945 | 85,600 | 8 |
Xu [26] | Altera Cyclone V | 1920 × 1080 | 30 | 76 | 14,807 * | 4794 | 9 |
Honda [23] | AMD Xilinx PYNQ Z1 | 1920 × 1080 | 47 | 111 | 29,800 | 38,500 | 33 |
This work | AMD Xilinx ZCU 104 | 3840 × 2160 | 60 | 150 | 30,972 | 21,178 | 16 |

**Table 2.** Hardware resource utilisation of the CLAHE algorithm on the ZCU 104 board. Note the relatively low resource utilisation of our module: it consumes fewer LUTs and flip-flops than the basic video pass-through, which is needed to provide the input and output images. Our implementation can therefore be combined with other hardware modules of a complete embedded vision system, as it consumes only a small part of the available resources. DSP utilisation is not reported, as our CLAHE module uses no DSP blocks.

Resource Type | Available | Pass-Through | CLAHE Module | Full Algorithm |
---|---|---|---|---|
LUT | 230,400 | 38,097 (17%) | 30,972 (13%) | 68,932 (30%) |
Flip-Flop | 460,800 | 44,673 (10%) | 21,178 (5%) | 63,703 (14%) |
Block RAM | 312 | 7 (2%) | 16 (5%) | 23 (7%) |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kryjak, T.; Blachut, K.; Szolc, H.; Wasala, M. Real-Time CLAHE Algorithm Implementation in SoC FPGA Device for 4K UHD Video Stream. *Electronics* **2022**, *11*, 2248.
https://doi.org/10.3390/electronics11142248
