Multi-Scale FPGA-Based Infrared Image Enhancement by Using RGF and CLAHE

Infrared sensors capture thermal radiation emitted by objects. They can operate in all weather conditions and are thus employed in fields such as military surveillance, autonomous driving, and medical diagnostics. However, infrared imagery poses challenges such as low contrast and indistinct textures due to the long wavelength of infrared radiation and susceptibility to interference. In addition, complex enhancement algorithms make real-time processing challenging. To address these problems and improve visual quality, in this paper, we propose a multi-scale FPGA-based method for real-time enhancement of infrared images by using rolling guidance filter (RGF) and contrast-limited adaptive histogram equalization (CLAHE). Specifically, the original image is first decomposed into various scales of detail layers and a base layer using RGF. Secondly, we fuse detail layers of diverse scales, then enhance the detail information by using gain coefficients and employ CLAHE to improve the contrast of the base layer. Thirdly, we fuse the detail layers and base layer to obtain the image with global details of the input image. Finally, the proposed algorithm is implemented on an FPGA using advanced high-level synthesis tools. Comprehensive testing of our proposed method on the AXU15EG board demonstrates its effectiveness in significantly improving image contrast and enhancing detail information. At the same time, real-time enhancement at a speed of 147 FPS is achieved for infrared images with a resolution of 640 × 480.


Introduction
Visible light sensors can capture high-resolution images with rich textures and detailed information. However, the quality of images captured by visible light sensors is strongly affected by the lighting environment: poor illumination, glare, smoke, and overexposure can all degrade visible images. In contrast, infrared imaging technology exploits differences in infrared radiation intensity for object detection, which makes it less susceptible to varying lighting and weather conditions [1]. Therefore, infrared image enhancement has become a hot research topic and has the potential to bring significant benefits to multi-modal information fusion [2][3][4][5]. It is also extensively employed in fields such as target detection [6][7][8] and medical diagnostics [9,10]. However, atmospheric attenuation, scattering, and refraction can introduce noise into infrared images, resulting in low contrast, a reduced signal-to-noise ratio, and blurred details, which greatly affect target detection, recognition, and tracking [11,12]. To satisfy the requirements of practical applications, effective enhancement algorithms that improve contrast, reduce noise, and address the detail blurring caused by interference are required. Infrared image enhancement algorithms can be broadly divided into three categories: spatial-domain methods, frequency-domain methods, and convolutional neural networks (CNNs). Spatial-domain enhancement algorithms are primarily based on histogram equalization (HE). HE [13] is widely employed to enhance the contrast of infrared images. It redistributes the gray values of the original image according to the frequency of each gray level in the image histogram so that the output histogram is approximately uniform. However, because HE is applied to the entire image globally, it may amplify noise and weaken image details, thus degrading image quality [14].
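As an illustration of the global mapping described above, the following minimal Python sketch equalizes an 8-bit grayscale image stored as a list of rows. It uses one common formulation, mapping gray level n to round((L − 1) · cdf(n) / total); the function name and image representation are our own assumptions and are not taken from [13].

```python
def histogram_equalize(img, levels=256):
    """Global histogram equalization for an 8-bit grayscale image given as a list of rows."""
    # Histogram: how often each gray level occurs in the whole image.
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)

    # Lookup table built from the cumulative distribution function (running sum).
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append(round((levels - 1) * running / total))

    # Every pixel is remapped with the same global table.
    return [[lut[v] for v in row] for row in img]


print(histogram_equalize([[0, 1], [1, 1]]))  # [[64, 255], [255, 255]]
```

Because a single table is applied to the whole image, gray levels that occur frequently (for example, a large uniform background) claim most of the output range, which is why small variations such as noise can be stretched.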
In recent years, several improved HE-based algorithms have been proposed, such as brightness-preserving bi-histogram equalization (BPBHE) [15], dualistic sub-image histogram equalization (DSIHE) [16], and minimum mean brightness error bi-histogram equalization (MMBEBHE) [17]. BPBHE divides the histogram of the image into two sub-histograms around the mean gray value of the image and then equalizes each part independently, thereby preserving the mean brightness of the image. DSIHE follows a process similar to that of BPBHE, except that it separates the histogram at the median instead of the mean. However, DSIHE is suitable only for images with a uniform intensity distribution and has limited effectiveness in preserving the original brightness. MMBEBHE is a variant of BPBHE. It first separates the histogram using a threshold that yields the minimum mean brightness error between the input and output images and then equalizes the two parts independently. MMBEBHE improves on BPBHE and DSIHE; however, it still has limitations in preserving contrast and brightness. Recursive mean-separate histogram equalization [18] and recursive sub-image histogram equalization [19] are the recursive versions of BPBHE and DSIHE, respectively. They provide a flexible way of controlling the degree of over-enhancement but overemphasize the mean brightness. Adaptive histogram equalization (AHE) [20] builds on HE by processing the image in blocks to address excessive enhancement. However, because each block is processed independently, AHE lacks smooth transitions between blocks, resulting in suboptimal visual effects. CLAHE [21], a generalization of AHE, incorporates a threshold to limit the contrast, thereby mitigating noise amplification.
In addition, CLAHE utilizes bilinear interpolation to optimize the transitions between blocks, thus resulting in a more harmonious appearance. Although HE-based methods can effectively enhance image contrast, they neglect crucial details, resulting in a diminished portrayal of features such as contours and edge textures within the image.
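To make the bi-histogram idea behind BPBHE concrete, the sketch below splits the pixels at the (truncated) mean gray level and equalizes each half into its own output range, [0, mean] and [mean + 1, L − 1]. It is a simplified illustration under our own assumptions (pure Python, list-of-rows images, hypothetical function name `bpbhe`), not a reference implementation of [15].

```python
def bpbhe(img, levels=256):
    """Brightness-preserving bi-histogram equalization (sketch) for an 8-bit image."""
    pixels = [v for row in img for v in row]
    xm = int(sum(pixels) / len(pixels))  # mean gray level, truncated, used as the split point

    def sub_lut(values, lo, hi):
        # Equalize one sub-histogram into the output range [lo, hi] using its own CDF.
        hist = {}
        for v in values:
            hist[v] = hist.get(v, 0) + 1
        lut, running = {}, 0
        for level in sorted(hist):
            running += hist[level]
            lut[level] = round(lo + (hi - lo) * running / len(values))
        return lut

    lower = [v for v in pixels if v <= xm]
    upper = [v for v in pixels if v > xm]
    lut = {}
    if lower:
        lut.update(sub_lut(lower, 0, xm))
    if upper:
        lut.update(sub_lut(upper, xm + 1, levels - 1))
    return [[lut[v] for v in row] for row in img]


print(bpbhe([[10, 20], [200, 220]]))  # [[56, 112], [184, 255]]
```

Since dark pixels can only be mapped into [0, mean] and bright pixels only into (mean, L − 1], the output brightness stays anchored around the input mean instead of drifting toward mid-gray as in global HE.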
Frequency-domain algorithms are primarily based on multi-scale decomposition (MSD). MSD-based methods represent various types of spatial and frequency information of the source image by decomposing it into different layers. Specific fusion rules are then applied to these layers to obtain the fused layer. This approach is widely applied in image processing and image quality assessment [22][23][24]. To enhance image details, several state-of-the-art techniques have been proposed. For example, in 2009, Branchitta et al. [25] proposed the bilateral filter and dynamic range partitioning (BF&DRP) algorithm, which uses a bilateral filter to decompose raw images into two independent components: the base layer (containing the background) and the detail layer (containing the texture). These two components are processed separately and then combined to reconstruct the final output image. As a result, BF&DRP can retain image details while improving the contrast. However, because the weights of the bilateral filter kernel are unstable near strong edges, gradient reversal artifacts appear in the output image. In 2011, Zuo et al. [26] proposed the bilateral filter and digital detail enhancement (BF&DDE) algorithm, which uses an adaptive Gaussian filter to refine the base and detail layers. However, BF&DDE can only reduce the likelihood of gradient reversal artifacts rather than avoid them completely. In 2014, to achieve detail enhancement, Liu et al. proposed GF&DDE [27], which uses a guided image filter to separate the raw image. However, like BF&DDE, GF&DDE cannot completely eliminate gradient reversal artifacts when the image contains strong edges. In 2016, inspired by the joint bilateral filter, Liu et al. proposed an algorithm called JBF&DDE, which calculates the kernel function by using two adjacent images to distinguish detail information from the raw image.
This kernel function is sensitive to the gradient structure within the image, which better enables the elimination of gradient reversal artifacts. In 2020, Xie et al. [28] studied infrared thermal imagers and discovered that their overall response model can be described by a double-exponential statistical fitting model. Consequently, they proposed an algorithm called bi-exponential edge-preserving filtering (BEEPS) to enhance the details of infrared images.
With the rapid development of artificial intelligence, CNN-based image processing methods have exhibited outstanding performance. Dong [29] first proposed a convolutional neural network for image super-resolution (SRCNN), which directly learns an end-to-end mapping between low- and high-resolution images. Kim [30] found that increasing network depth can significantly improve accuracy and proposed LLCNN based on SRCNN. Zhang [31] developed a skip-connection-based residual channel attention network (RCAN) for image super-resolution, enabling adaptive learning of crucial channel features and enhancing its representational capability. Kuang [32] incorporated a generative adversarial network (GAN) into the conventional CNN framework and introduced IE-CGAN for single infrared image enhancement. This approach effectively mitigates background noise while enhancing image contrast and fine details. Wang [33] proposed a target attention deep neural network (TADNN) to achieve discriminative enhancement in an end-to-end manner. However, in practical applications, these methods are computationally complex and time-consuming, so they are not very hardware friendly.
Owing to their parallel computing capabilities, FPGAs have emerged as a promising platform for accelerating computational tasks, and numerous researchers have achieved significant advances in FPGA-based infrared image enhancement. For instance, various CLAHE-based FPGA methods have been developed to meet real-time processing requirements. Kokufuta et al. [34] processed the image as a whole instead of dividing it into smaller blocks, thereby avoiding interpolation. Unal et al. [35] proposed a look-ahead mechanism for redistribution and redefined the interpolation step to address issues related to image segmentation and correlation interpolation. Chen et al. [36] proposed the use of a fast guided filter and plateau equalization for accelerated enhancement processing. However, this approach introduces gradient reversal artifacts in regions with strong edges. Although the aforementioned methods achieve fast infrared image enhancement on FPGAs, the limitations of their underlying enhancement algorithms restrict the quality of the enhanced images.

Rolling Guidance Filter
Multi-scale image decomposition has been extensively employed in infrared image enhancement, and the choice of decomposition method considerably affects the quality of the enhanced images. Multi-scale decomposition obtains images with different levels of blurring through filtering. Commonly used decomposition filters include the Gaussian filter, the bilateral filter [37], the guided filter [38], and the WLS filter [39]. However, these filters do not fully address issues related to noise and gradient reversal. The rolling guidance filter uses a rapidly converging iterative scheme to achieve rolling guidance and can produce artifact-free results when separating structures of different scales [40]. Rather than relying on local denoising, it controls the level of detail through the number of iterations. As shown in Figure 1, RGF comprises two main steps: small-structure removal and edge recovery.

Removal of Small Structures
First, the small-structure information in the input image is removed using a Gaussian filter. Assuming that p and q are pixel coordinates, I is the input image, and G is the output image, the result of Gaussian filtering applied to the input image at the central pixel p can be expressed as follows:
$$G(p) = \frac{1}{K_p} \sum_{q \in N(p)} \exp\left(-\frac{\lVert p - q \rVert^2}{2\sigma_s^2}\right) I(q),$$
where $K_p = \sum_{q \in N(p)} \exp\left(-\frac{\lVert p - q \rVert^2}{2\sigma_s^2}\right)$ is used for normalization, and N(p) is the set of neighboring pixels of p. The structural scale parameter σ_s is the standard deviation of the Gaussian filter, and structures whose scale is smaller than σ_s are completely removed.
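The normalized Gaussian filter above can be sketched in pure Python as follows. The window half-width (about 2σ_s) and the border handling (neighbors outside the image are skipped, and K_p renormalizes the remaining weights) are our assumptions; the function name is hypothetical.

```python
import math


def gaussian_filter(img, sigma_s, radius=None):
    """Normalized Gaussian smoothing: G(p) = (1 / K_p) * sum_{q in N(p)} w(p, q) * I(q)."""
    if radius is None:
        radius = max(1, math.ceil(2 * sigma_s))  # assumed half-width of the window N(p)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = k_p = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy, qx = y + dy, x + dx
                    if 0 <= qy < h and 0 <= qx < w:  # neighbors outside the image are skipped
                        weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        acc += weight * img[qy][qx]
                        k_p += weight  # K_p renormalizes, so border pixels keep the correct mean
            out[y][x] = acc / k_p
    return out
```

A single bright pixel (a structure much smaller than σ_s) is spread over its neighborhood and strongly attenuated, which is the small-structure removal effect the text relies on.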

Edge Recovery
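Edge recovery uses joint bilateral filtering (JBF) [41] to iteratively recover the blurred large-scale edge structures. The iteration is initialized with $J^1 = G$. In each iteration, the output of the previous iteration serves as the guidance image, while the filtering input is always the original image I. This iterative process can be defined as follows:
$$J^{t+1}(p) = \frac{1}{K_p} \sum_{q \in N(p)} \exp\left(-\frac{\lVert p - q \rVert^2}{2\sigma_s^2} - \frac{\lVert J^t(p) - J^t(q) \rVert^2}{2\sigma_r^2}\right) I(q),$$
where $K_p = \sum_{q \in N(p)} \exp\left(-\frac{\lVert p - q \rVert^2}{2\sigma_s^2} - \frac{\lVert J^t(p) - J^t(q) \rVert^2}{2\sigma_r^2}\right)$ is used for normalization, σ_s and σ_r control the spatial and range weights, respectively, p and q denote the central pixel and a neighboring pixel, and $J^{t+1}$ denotes the result of the t-th iteration.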
In this paper, we define the RGF operation as follows:
$$I_{out} = RGF(I_{in}, \sigma_s, \sigma_r, n),$$
where $I_{out}$ is the result of applying the RGF to the input image $I_{in}$ for n iterations. In our experiments, large values of n led to gradient reversal artifacts in the enhanced image, whereas n = 3 gave the best performance.
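The two RGF steps can be combined into one small Python sketch. It relies on the observation that a joint bilateral pass with a constant guide has a range weight of exp(0) = 1 and therefore reduces to the Gaussian filter, which gives $J^1 = G$. The window radius, the border handling, the hypothetical names `joint_bilateral` and `rgf`, and the convention that n counts the joint bilateral iterations performed after the initial Gaussian step are all our assumptions, not details confirmed by the text.

```python
import math


def joint_bilateral(img, guide, sigma_s, sigma_r, radius):
    """One joint bilateral pass: filter `img`, taking the range weights from `guide`."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = k_p = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy, qx = y + dy, x + dx
                    if 0 <= qy < h and 0 <= qx < w:
                        diff = guide[y][x] - guide[qy][qx]
                        weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2)
                                          - diff * diff / (2 * sigma_r ** 2))
                        acc += weight * img[qy][qx]
                        k_p += weight
            out[y][x] = acc / k_p
    return out


def rgf(img, sigma_s, sigma_r, n, radius=None):
    """Rolling guidance filter sketch: J^1 = G, then n joint bilateral iterations."""
    if radius is None:
        radius = max(1, math.ceil(2 * sigma_s))  # assumed window half-width
    # A constant guide makes the range term 1, so this pass is the Gaussian filter (J^1 = G).
    flat_guide = [[0.0] * len(img[0]) for _ in img]
    j = joint_bilateral(img, flat_guide, sigma_s, sigma_r, radius)
    for _ in range(n):
        # Edge recovery: the previous result is the guide; the filtered input is always I.
        j = joint_bilateral(img, j, sigma_s, sigma_r, radius)
    return j


step = [[0] * 4 + [100] * 4 for _ in range(6)]         # vertical step edge between columns 3 and 4
blurred = rgf(step, sigma_s=1.0, sigma_r=10.0, n=0)    # Gaussian only: the edge is smeared
recovered = rgf(step, sigma_s=1.0, sigma_r=10.0, n=3)  # edge pulled back toward a sharp step
```

On the step image, the Gaussian pass spreads the bright side into the dark column next to the edge; in the following iterations the large guide difference across the edge suppresses those cross-edge weights, so the dark column returns close to 0 while the large-scale edge is preserved.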

Image Enhancement Strategy
The proposed algorithm framework is illustrated in Figure 2. First, the detail layers of the input image are extracted at different scales by using three consecutive rounds of RGF. The output of the third filtering round serves as the base layer. Next, the detail layers from the three scales are merged, and the base layer and detail layers are enhanced. Finally, the enhanced detail layers are combined with the base layer to create an improved infrared image.

Image Decomposition
The smoothing process realized using the RGF can be described as follows:
$$I_B^i = RGF(I_B^{i-1}, \sigma_s^i, \sigma_r^i, n), \quad i = 1, 2, \ldots, K-1,$$
where $I_B^i$ represents the result of the i-th filtering. If $\sigma_s^{i+1} > \sigma_s^i$ and $\sigma_r^{i+1} > \sigma_r^i$, then $I_B^{i+1}$ is smoother than $I_B^i$; as a result, $I_B^i$ contains more structural information than $I_B^{i+1}$. Setting $I_B^0 = I_{in}$, the detail layers can be extracted using the following relationships:
$$I_D^i = I_B^{i-1} - I_B^i, \quad i = 1, 2, \ldots, K-1, \qquad I_B = I_B^{K-1},$$
where $I_D^i$ represents the detail layer obtained after the i-th decomposition, K denotes the decomposition level (in this paper, K = 4), and $I_B^{K-1}$ is the smoothest version of the original image and serves as the base layer $I_B$.
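The cascaded decomposition can be sketched independently of the particular smoother. In the hedged example below, the smoothing function is passed in as a parameter, and each entry of `params` stands in for one $(\sigma_s^i, \sigma_r^i)$ pair; to keep the block short, a simple box blur is used as a hypothetical stand-in for the RGF, so this only illustrates the bookkeeping of base and detail layers, not the paper's filter.

```python
def box_blur(img, r):
    """Hypothetical stand-in smoother (mean over a (2r+1)x(2r+1) window); the paper uses RGF."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[qy][qx]
                    for qy in range(max(0, y - r), min(h, y + r + 1))
                    for qx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def decompose(img, smooth, params):
    """Cascaded decomposition: I_B^i = smooth(I_B^(i-1), p_i), I_D^i = I_B^(i-1) - I_B^i."""
    prev = [[float(v) for v in row] for row in img]  # I_B^0 = I_in
    details = []
    for p in params:  # one entry per round, i = 1 .. K-1, increasingly coarse
        cur = smooth(prev, p)
        details.append([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(prev, cur)])
        prev = cur
    return details, prev  # prev = I_B^(K-1), used as the base layer I_B


img = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(6)]
details, base = decompose(img, box_blur, params=[1, 2, 3])  # K = 4 -> three detail layers
```

Because the detail layers telescope, the base layer plus the sum of all detail layers reproduces the input exactly; any change in the output therefore comes only from how the layers are enhanced afterwards.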

Detail Layer Enhancement
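The detail layers of the K − 1 scales are fused and the fused detail layer is amplified by a gain coefficient:
$$I_D^{out} = coe \cdot \sum_{i=1}^{K-1} I_D^i,$$
where $I_D^i$ is the i-th detail layer, $I_D^{out}$ is the enhanced detail layer, and coe is the enhancement coefficient (coe = 3 in this implementation).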
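A minimal sketch of the detail-layer enhancement, assuming that the detail layers are fused by plain summation and then amplified by the single gain coe mentioned above (coe = 3). The function name is hypothetical.

```python
def enhance_details(details, coe=3.0):
    """Fuse the detail layers by summation and amplify the fused layer by the gain coe."""
    h, w = len(details[0]), len(details[0][0])
    return [[coe * sum(layer[y][x] for layer in details) for x in range(w)]
            for y in range(h)]


layers = [[[1.0, -1.0]], [[2.0, 0.0]], [[0.5, 0.5]]]  # three 1x2 detail layers
print(enhance_details(layers))  # [[10.5, -1.5]]
```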

Base Layer Enhancement
CLAHE enhances an image by dividing it into multiple sub-blocks and then performing HE on each sub-block. It restricts the degree of contrast enhancement in each sub-block, thereby avoiding excessive enhancement and effectively improving image contrast. CLAHE comprises four main steps: image block division and sub-block histogram statistics, sub-block histogram clipping and redistribution, histogram equalization, and pixel interpolation reconstruction. For reliable statistical estimation, the size of each sub-block is set to W × H (W = H = 64 in this implementation). Next, the histogram of each sub-block is computed using the following formulas:
$$h(n) = \sum_{i=1}^{W} \sum_{j=1}^{H} g(n, i, j), \qquad g(n, i, j) = \begin{cases} 1, & I(i, j) = n \\ 0, & \text{otherwise} \end{cases}$$
where n is the gray level (histogram bin), (i, j) are the coordinates of a pixel, h(n) is the histogram value for the n-th bin, and g(n, i, j) indicates whether the value of pixel I(i, j) is equal to n.
A common problem with standard HE is its tendency to increase the contrast of sub-regions to the maximum, resulting in noise amplification. To constrain the contrast of the sub-regions within a certain range and suppress noise, a limiting threshold is introduced, expressed as follows:
where β represents the clip limit for each sub-block's histogram, M represents the number of pixels in each sub-block, N represents the number of gray levels in each sub-block, α is the clip factor, S is a parameter used to control the degree of contrast amplification during the contrast-limiting process, and U and Q represent the mean and variance of each sub-block, respectively. These parameters are used to determine a suitable limiting threshold β for each sub-block's histogram, achieving effective contrast enhancement while suppressing noise. As shown in Figure 3, the portion above β is clipped and redistributed to the bottom of the histogram.
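The histogram statistics and the clip-and-redistribute step described above can be sketched as follows. The clip-limit formula itself is not reproduced here; β is simply passed in as an integer. The two-pass redistribution (first an equal share per bin, capped at β, then single counts to bins still below β) is one common strategy and is our assumption, not necessarily the authors' exact Algorithm 1; the function names are hypothetical.

```python
def block_histogram(block, levels=256):
    """h(n) = sum over (i, j) of g(n, i, j), where g = 1 when I(i, j) == n."""
    hist = [0] * levels
    for row in block:
        for v in row:
            hist[v] += 1
    return hist


def clip_and_redistribute(hist, beta):
    """Clip bins above beta and hand the clipped counts back to bins below beta."""
    h = [min(c, beta) for c in hist]
    excess = sum(hist) - sum(h)  # counts that were above the threshold
    n_bins = len(h)
    # First pass: give every bin an equal share of the excess, never exceeding beta.
    share = excess // n_bins
    for n in range(n_bins):
        add = min(share, beta - h[n])
        h[n] += add
        excess -= add
    # Second pass: distribute the remainder one count at a time to bins still below beta.
    while excess > 0:
        progressed = False
        for n in range(n_bins):
            if excess == 0:
                break
            if h[n] < beta:
                h[n] += 1
                excess -= 1
                progressed = True
        if not progressed:  # every bin is already at beta; nothing more can be placed
            break
    return h


print(block_histogram([[0, 0], [3, 1]], levels=4))  # [2, 1, 0, 1]
print(clip_and_redistribute([10, 0, 0, 2], 4))      # [4, 2, 2, 4]
```

As long as β multiplied by the number of bins is at least the number of pixels in the sub-block, the total count is preserved, so the clipped histogram still describes M pixels when the cumulative distribution is formed.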
The redistribution algorithm can be represented in the form of pseudocode, as shown in Algorithm 1.

Algorithm 1: Redistribution process
Input: the histogram value h(n) and the clip limit β
Output: the histogram value h(n) after redistribution
excess: the value above the threshold
…
h[n] += 1;
excess −= 1;
After the contrast-limiting process ensures that no bin of a sub-block histogram exceeds the clip limit, the cumulative distribution function is computed and pixel-value equalization is performed to obtain the new pixel values as follows:
$$f_{i,j}(n) = \frac{N-1}{M} \sum_{k=0}^{n} h_{i,j}(k),$$
where (i, j) are the coordinates of the sub-block, M is the number of pixels in each sub-block, N is the number of gray levels in each sub-block, and $h_{i,j}(k)$ is the histogram of the image window with coordinates (i, j).
To achieve smoother transitions at block boundaries, interpolation is performed using different methods depending on the sub-block's position. As shown in Figure 4, sub-blocks are categorized into three regions: (1) CR sub-blocks, which have no connections to others and retain their original pixel mapping function; (2) BR sub-blocks, which undergo linear interpolation for mapping; and (3) IR sub-blocks, which undergo bilinear interpolation based on their four nearest neighboring sub-blocks. The final expression is as follows:
$$I_B^{out}(x, y) = \begin{cases} f_{i,j}\big(I_B(x, y)\big), & (x, y) \in CR \\ LT\big(I_B(x, y)\big), & (x, y) \in BR \\ BT\big(I_B(x, y)\big), & (x, y) \in IR \end{cases}$$
where $I_B^{out}$ represents the result obtained after enhancing the base layer, LT represents linear interpolation, and BT represents bilinear interpolation. The detailed formulas for these two interpolation methods are given in Section 4.
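The per-sub-block equalization table and the two interpolation cases can be sketched as follows. LT and BT are written here as standard linear and bilinear weighting of neighboring sub-block mappings; how the fractional offsets fx and fy are derived from the sub-block centres, and the function names, are our assumptions rather than the formulas of Section 4.

```python
def tile_mapping(hist, levels=256):
    """Equalization table of one sub-block: f(n) = (N - 1) / M * sum_{k <= n} h(k)."""
    m = sum(hist)  # M: number of pixels in the sub-block
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append((levels - 1) * running / m)
    return lut


def linear_map(v, lut_a, lut_b, f):
    """BR case: blend the mappings of two neighboring sub-blocks; f in [0, 1]."""
    return (1 - f) * lut_a[v] + f * lut_b[v]


def bilinear_map(v, luts, fx, fy):
    """IR case: blend the mappings of the four nearest sub-blocks.

    luts = (top_left, top_right, bottom_left, bottom_right); fx and fy in [0, 1]
    are the pixel's normalized offsets from the top-left sub-block centre.
    """
    tl, tr, bl, br = luts
    top = linear_map(v, tl, tr, fx)
    bottom = linear_map(v, bl, br, fx)
    return (1 - fy) * top + fy * bottom


lut = tile_mapping([2, 1, 0, 1], levels=4)
print(lut)  # [1.5, 2.25, 2.25, 3.0]
```

Blending the mappings (rather than the output pixels of independently equalized blocks) is what removes the block seams that plain AHE produces: a pixel halfway between two sub-block centres receives the average of both tables.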

Image Reconstruction
To merge the enhanced detail layer and the enhanced base layer, the fused image is obtained through the inverse transformation:
$$I_{out} = I_B^{out} + I_D^{out},$$
where $I_B^{out}$ and $I_D^{out}$ denote the enhanced base layer and the enhanced (fused) detail layer, respectively, and $I_{out}$ is the final enhanced image. In summary, the RGF accelerates the decomposition process, the details are enhanced, and noise is suppressed. Finally, the base and detail components are merged to generate the enhanced output image.
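A minimal sketch of the reconstruction step. Because the detail layer is amplified (coe = 3), the sum can leave the valid gray-level range; rounding and clamping to [0, 255] is our assumption, since the text does not state how overflow is handled. The function name is hypothetical.

```python
def reconstruct(base_out, detail_out, levels=256):
    """I_out = I_B^out + I_D^out, rounded and clamped to the 8-bit range (clamping assumed)."""
    return [[min(levels - 1, max(0, round(b + d))) for b, d in zip(rb, rd)]
            for rb, rd in zip(base_out, detail_out)]


print(reconstruct([[100.0, 250.0, 5.0]], [[10.4, 20.0, -9.0]]))  # [[110, 255, 0]]
```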

Algorithm Experiment and Analysis
To assess the effect and efficiency of the proposed method, we selected a set of infrared images from the TNO Image Fusion Dataset [42] and the M3FD Dataset [43] for experimentation. These selected test datasets comprised diverse scenes, thus offering a comprehensive challenge for the proposed algorithm. We compared the proposed method with five existing infrared image enhancement methods: traditional infrared image enhancement algorithms HE and CLAHE, guided filter-based infrared image enhancement algorithm GF&DDE, the bi-exponential edge-preserving filter-based infrared image enhancement algorithm BEEPS&DDE, and the CNN-based method IE-CGAN. For these methods, we selected the parameters as advised by the authors or through our experience.

Subjective Analysis
Subjective analysis involves assessing the quality of the enhanced image based on an individual's subjective perception and visual experience. We selected three representative infrared images for a subjective visual evaluation. A high-contrast scene with abundant texture information on rooftops and trees is shown in Figure 5. A scene with strong edges between the person and the surrounding background, which includes mountain peaks with rich textures, is shown in Figure 6. A scene of urban architecture, with tall buildings and towering cranes at great heights, all displaying intricate details, is shown in Figure 7.
The enhancement results of the different methods in a high-contrast scene are shown in Figure 5, with the regions of interest highlighted by red boxes. In this scene, HE improved the contrast of the infrared image but produced overexposure artifacts at the car engine. CLAHE effectively enhanced the contrast; however, the details of the houses and trees were not sufficiently prominent. GF&DDE performed well in smoothing background noise and enhancing contrast; however, its gain masks smoothed out some details in the regions of interest. IE-CGAN performed well in image denoising but lost some fine details. Compared with the other four methods, the images processed using BEEPS&DDE and the proposed method exhibited rich texture details, such as the abundant leaf details on the trees. However, in terms of overall image quality, the proposed method exhibited higher contrast and better representation.
The enhancement results on the "thermal" image are shown in Figure 6. HE yielded the highest overall contrast among all the enhancement algorithms; however, it produced overexposure artifacts on the thermal target, resulting in a considerable loss of fine-grained detail (e.g., the person enclosed within the red box lacks discernible details). Although CLAHE effectively mitigated the overexposure caused by HE and yielded relatively favorable contrast enhancement, it struggled to preserve intricate details, resulting in a somewhat blurred appearance. IE-CGAN enhanced the image contrast, but the visual improvement was not pronounced. In contrast, GF&DDE and BEEPS&DDE effectively improved the overall brightness. GF&DDE slightly outperformed BEEPS&DDE in handling thermal targets, whereas the latter excelled in enhancing texture information, such as the shrubs and mountains in the background. The proposed algorithm greatly improved image contrast and performed better in enhancing and preserving details (e.g., the details of the mountain peaks and the person in the image). Furthermore, the outline of the infrared target was visible without gradient reversal artifacts, demonstrating its excellent visual effect.
The enhancement results on urban scenes are shown in Figure 7. HE yielded a very bright image, and much of the detailed information about the target scene was lost. CLAHE yielded visually pleasing results but failed to improve the perceptibility of small details in the image. GF&DDE performed well in noise suppression, whereas BEEPS&DDE excelled in highlighting texture details. Although both algorithms improved the overall brightness to a certain extent, the overall contrast of the image remained low, and the detailed information, such as the bushes in the lower right corner, was not sufficiently prominent. In this scenario, the performance of IE-CGAN was not satisfactory, which may be attributed to an insufficient training dataset. The proposed algorithm improved the contrast and clarity of different areas of the image to different degrees; for example, the edge outline of the tower crane is more explicit and the contrast of the buildings is improved. The proposed algorithm yielded an image in which the details of the scene are highlighted and the visual effect is more realistic.
To validate the applicability of the proposed enhancement algorithm across various scenarios, we performed a comparative analysis by evaluating the six methods in seven different scenes, such as texture-rich wire fences, streets with numerous thermal targets, and dense forests with intricate texture details. As the enhancement results for the seven scenes in Figure 8 show, the proposed method outperformed the other five methods in terms of enhancement performance.

Objective Analysis
Image processing has become an active research field, and assessing the quality of processed images remains a challenge. Image quality assessment (IQA) methods can be categorized into subjective and objective ones [44]. Because the human visual system is the ultimate receiver of visual signals, subjective evaluation is usually the most accurate and reliable method. However, because subjective tests consume significant resources, they are typically not employed as optimization metrics in practice. Objective quality assessment methods are usually designed or trained using subjective evaluation data, and they serve as an ideal approach for timely performance assessment and optimization. Objective quality metrics can be divided into traditional metrics, such as PSNR, SSIM, and MSE, and emerging metrics, such as UCA [45], BPRI [46], and BMPRI [47].
To objectively evaluate the enhancement effects of the different methods on the ten aforementioned infrared scenes, four traditional image evaluation metrics were employed: average gradient (AG) [48] and edge intensity (EI) [49], which are based on image features; figure definition (FD), which quantifies the level of detail and distinctness in the visual content of the image; and root mean square contrast (RMSC) [50], which quantifies the contrast level of the image. These metrics are widely used to evaluate image quality. The evaluation results are presented in Tables 1-4, the average values of all evaluation parameters are presented in Table 5, and the optimal value of each parameter is marked in bold.
AG represents the average magnitude of pixel-value variations across the image. A higher AG value indicates that the enhanced image contains richer gradient information and more detailed textures. The AG results are presented in Table 1. AG is computed from $\partial I(i,j)/\partial i$ and $\partial I(i,j)/\partial j$, the horizontal and vertical gradient values at image coordinate $(i, j)$, averaged over an image of height $M$ and width $N$.
EI refers to the strength or magnitude of the edges in the image. A higher EI value indicates that the image has higher contrast and more abundant detail information. The EI results are listed in Table 2. EI is computed from $s_x(i, j)$ and $s_y(i, j)$, the outputs of the Sobel operators in the x and y directions, respectively.
FD quantifies the level of detail and distinctness in the image. A higher FD value indicates a higher level of sharpness and visual information. The FD results are presented in Table 3.
RMSC is used to evaluate the degree of image denoising and enhancement: the larger the RMSC value, the higher the contrast of the image. The proposed algorithm yielded a relatively high RMSC value, indicating that it effectively increases the contrast of infrared images. The RMSC results are presented in Table 4. RMSC is computed from the deviation of each pixel value from $\bar{I}$, the average intensity of all pixels in the test image.
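The metric descriptions above can be prototyped directly. The following Python sketch uses common textbook forms of AG, EI, and RMSC on an image stored as a list of rows. The exact normalizations (averaging AG over the (M - 1)(N - 1) positions where forward differences exist, with a factor of 1/2 inside the square root, and averaging EI over interior pixels) are assumptions and may differ from the equations used to produce Tables 1-5; FD is not included.

```python
import math

# Sobel kernels for the x (column) and y (row) directions.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def average_gradient(img):
    """AG: mean of sqrt((di^2 + dj^2) / 2) over forward differences.

    img is a list of M rows with N gray values each (assumed normalization
    over the (M - 1)(N - 1) positions where both differences exist).
    """
    m, n = len(img), len(img[0])
    total = 0.0
    for i in range(m - 1):
        for j in range(n - 1):
            di = img[i + 1][j] - img[i][j]  # difference along i
            dj = img[i][j + 1] - img[i][j]  # difference along j
            total += math.sqrt((di * di + dj * dj) / 2.0)
    return total / ((m - 1) * (n - 1))


def edge_intensity(img):
    """EI: mean Sobel magnitude sqrt(sx^2 + sy^2) over interior pixels."""
    m, n = len(img), len(img[0])
    total = 0.0
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            sx = sy = 0
            for a in range(3):
                for b in range(3):
                    v = img[i + a - 1][j + b - 1]
                    sx += SOBEL_X[a][b] * v
                    sy += SOBEL_Y[a][b] * v
            total += math.hypot(sx, sy)
    return total / ((m - 2) * (n - 2))


def rms_contrast(img):
    """RMSC: root mean square deviation of all pixels from their mean."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))
```

For a horizontal ramp that rises by 10 gray levels per column, AG is sqrt(50) and EI is 80, while a flat image scores 0 on all three metrics.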
Table 5 presents the average values of the four metrics obtained by applying six different methods to the ten infrared images. As Table 5 shows, the proposed method outperformed the others in terms of AG, EI, and FD, indicating its superiority in enhancing texture details and improving image clarity. However, the proposed method did not achieve the highest RMSC value among the compared methods. This is attributable to the use of CLAHE on the base layer, which keeps the overall contrast within an appropriate range. In contrast, the other two decomposition-based enhancement methods apply HE to the base layer, which yields higher overall contrast but sometimes overexposes certain images, resulting in a relatively poor overall visual perception. Therefore, the proposed method offers distinct advantages in enhancing edge details while increasing contrast without overexposure.

Hardware Architecture
To facilitate rapid validation and optimization of the algorithm, we designed and implemented the image enhancement module using a high-level synthesis (HLS) tool. HLS converts high-level programming languages (C/C++) into hardware description languages (Verilog/VHDL), raising the level of abstraction and offering shorter development cycles, higher development efficiency, and simpler hardware implementation of algorithms.
The hardware architecture of the proposed method is shown in Figure 9. We used the AXU15EG as the development platform. Its heterogeneous architecture includes a processing system (PS) and programmable logic (PL). In the PS, an ARM processor performs system control and scheduling tasks, such as data preprocessing, IP configuration, and image streaming. The PL contains the RGF module and the CLAHE module, which perform the infrared image enhancement. The AXI bus provides high-speed communication and data exchange between the PS and PL. Video direct memory access (VDMA) is used to read the infrared images and store the enhanced images. To optimize computation, dataflow directives are applied to the processing flow so that the intermediate data generated in each processing stage are passed through FIFO buffers. This allows the modules to operate in parallel, with each stage processing new data while the next stage consumes earlier results.
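The behavior that the dataflow directives produce can be modeled in software: each stage runs concurrently and hands pixels to the next stage through a bounded FIFO, so stage k can work on pixel p while stage k + 1 works on pixel p - 1. The Python sketch below uses threads and `queue.Queue` as stand-ins for HLS processes and `hls::stream` FIFOs; the function names and the toy stages are hypothetical placeholders, not the RGF and CLAHE modules.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of the pixel stream


def run_stage(fn, inp, out):
    """One dataflow process: read, transform, and forward pixels until _END."""
    while True:
        px = inp.get()
        if px is _END:
            out.put(_END)
            return
        out.put(fn(px))


def dataflow_pipeline(pixels, stages, fifo_depth=4):
    """Run all stages concurrently, linked by bounded FIFOs (like hls::stream)."""
    fifos = [queue.Queue(maxsize=fifo_depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(fn, fifos[k], fifos[k + 1]))
        for k, fn in enumerate(stages)
    ]

    def produce():  # stands in for the VDMA read channel
        for px in pixels:
            fifos[0].put(px)
        fifos[0].put(_END)

    producer = threading.Thread(target=produce)
    for t in workers + [producer]:
        t.start()

    results = []  # stands in for the VDMA write channel
    while True:
        px = fifos[-1].get()
        if px is _END:
            break
        results.append(px)
    for t in workers + [producer]:
        t.join()
    return results
```

For example, `dataflow_pipeline(range(8), [lambda p: p + 1, lambda p: 2 * p])` returns `[2, 4, 6, 8, 10, 12, 14, 16]`. Because each FIFO is bounded, a slow stage applies back-pressure to its producer, which mirrors how a full stream stalls the upstream process.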

RGF Unit Design
The RGF process involves two main steps. First, Gaussian filtering is employed to remove small structures, followed by joint bilateral filtering to restore edges. To achieve a balance between resource allocation and filtering performance, we selected a 5 × 5 filter kernel. The architecture of the RGF is shown in Figure 10. The input pixel data are cached through row buffers. Four row buffers are required to accommodate the 5 × 5 filter kernel, and the data in these buffers are used for calculations within the processing window.
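The row-buffer arrangement can be modeled in software. In the Python sketch below (the class name `LineBufferWindow` is hypothetical), four line buffers store the previous four rows and, together with the incoming pixel, supply one new 5 × 5 window column per pixel, so a complete window is available for every pixel once the buffers are primed. Border handling is omitted: windows are emitted only when they lie fully inside the image.

```python
class LineBufferWindow:
    """Streams pixels in raster order and emits complete 5x5 windows.

    Four line buffers hold the previous four rows; a five-column shift
    register holds the current window, as in a typical HLS line-buffer design.
    """

    K = 5  # window size; K - 1 = 4 line buffers

    def __init__(self, width):
        self.width = width
        self.lines = [[0] * width for _ in range(self.K - 1)]  # oldest row first
        self.cols = []  # last K columns, each listed top to bottom
        self.row = 0
        self.col = 0

    def push(self, value):
        """Insert one pixel; return (center_row, center_col, window) or None."""
        c = self.col
        column = [line[c] for line in self.lines] + [value]  # rows r-4 .. r
        for k in range(self.K - 2):  # shift this column of the line buffers up
            self.lines[k][c] = self.lines[k + 1][c]
        self.lines[-1][c] = value
        self.cols = (self.cols + [column])[-self.K:]
        out = None
        if self.row >= self.K - 1 and c >= self.K - 1:
            window = [[self.cols[x][y] for x in range(self.K)] for y in range(self.K)]
            out = (self.row - self.K // 2, c - self.K // 2, window)
        self.col += 1  # advance the raster position
        if self.col == self.width:
            self.col, self.row = 0, self.row + 1
        return out
```

Only the four line buffers (4 × image width values) and 25 window registers are stored, rather than a full frame, which is why this structure maps well to on-chip BRAM.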
In the first step of RGF, the row calculation unit requires only the original pixel values as input, and the Gaussian filtering result is computed using the 5 × 5 Gaussian filter. In the subsequent joint bilateral filtering step, the row calculation unit takes both the original pixel values of the input image and the pixel values of the previously computed guidance image. Unlike Gaussian filtering, joint bilateral filtering considers both spatial and grayscale weights, which removes small structures while restoring large-scale edge information. The weighted results of the five row calculation units for the current window are accumulated in Sum2, and the corresponding normalization coefficients are accumulated in Sum1. Finally, the accumulated result is divided by the normalization coefficient to obtain the filtered output pixel value. This process is repeated over the RGF iterations until edge restoration is complete.
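For reference, the two-step RGF described above can be written as a floating-point model: a 5 × 5 Gaussian pass removes small structures, and each subsequent iteration applies a joint bilateral filter to the input image using the previous result as guidance. The `sum1` and `sum2` accumulators correspond to Sum1 and Sum2. The defaults $\sigma_s = 40$ and $\sigma_r = 0.1$ are taken from the LUT settings given for this design, while reusing $\sigma_s$ for the Gaussian step, the iteration count of 3, intensities normalized to [0, 1], and replicate-border handling are assumptions for illustration.

```python
import math


def rolling_guidance_filter(img, sigma_s=40.0, sigma_r=0.1, iterations=3, radius=2):
    """Floating-point RGF model on a 2-D list of intensities in [0, 1].

    Step 1: a (2*radius+1)^2 Gaussian filter removes small structures.
    Step 2: joint bilateral iterations restore large-scale edges; the input
    image is filtered and the previous result serves as the guidance image.
    Borders are handled by replicating edge pixels (an assumption).
    """
    h, w = len(img), len(img[0])
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    spatial = {o: math.exp(-(o[0] ** 2 + o[1] ** 2) / (2 * sigma_s ** 2)) for o in offsets}
    spatial_sum = sum(spatial.values())

    def px(a, y, x):
        """Replicate-border pixel access."""
        return a[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Step 1: Gaussian filtering (spatial weights only).
    guide = [[sum(spatial[o] * px(img, y + o[0], x + o[1]) for o in offsets) / spatial_sum
              for x in range(w)] for y in range(h)]

    # Step 2: iterated joint bilateral filtering.
    for _ in range(iterations):
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                center = guide[y][x]
                sum1 = 0.0  # normalization coefficient (Sum1)
                sum2 = 0.0  # weighted sum of input pixels (Sum2)
                for o in offsets:
                    d = px(guide, y + o[0], x + o[1]) - center
                    weight = spatial[o] * math.exp(-(d * d) / (2 * sigma_r ** 2))
                    sum1 += weight
                    sum2 += weight * px(img, y + o[0], x + o[1])
                out[y][x] = sum2 / sum1
        guide = out
    return guide
```

On a step edge, the Gaussian pass first blurs the edge and the bilateral iterations then pull the pixels on each side back toward their original levels, which is the edge-restoring behavior the hardware reproduces with Sum1 and Sum2.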
The spatial weight and the range kernel of the guidance image are denoted as $W_s$ and $W_r$, respectively, and are defined in Equation (18), where $(m, n)$ represents the pixel coordinates within a 5 × 5 neighborhood, $(i, j)$ represents the coordinates of the center pixel, and $f_{guide}$ represents the guidance image.
Equation (18) shows that computing the spatial and range kernels within the processing window requires division and exponentiation. To reduce the computational load, these results are precomputed and stored in a ROM so that they can be retrieved through lookup tables (LUTs). The design of the row calculation unit is shown in Figure 11. To ensure high processing speed, all five row calculation units, together with their five corresponding cached pixels, operate in parallel. For the LUT implementation of the spatial and range kernels, we set $\sigma_s = 40$ and $\sigma_r = 0.1$. To simplify computation, the floating-point results were scaled by a factor of 256 for fixed-point computation, and the final output was right-shifted by eight bits to obtain the desired result.
Because $|f_{guide}(i, j) - f_{guide}(m, n)|$ lies within the range [0, 255], all values of $W_r$ can be precomputed and stored in a ROM. As Figure 12 shows, when the pixel-value difference exceeds a certain threshold (86 in this paper), the corresponding output approaches 0. Leveraging this characteristic, we optimized the LUT by setting the output to 0 for differences exceeding 86, which reduces the amount of data stored in the table by approximately 66%. This optimization substantially reduces the hardware resources required by our approach.
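The truncated range-kernel LUT can be reproduced under a few stated assumptions: $W_r$ is taken to be a Gaussian of the absolute guidance difference normalized to [0, 1], that is $\exp(-(\Delta/255)^2 / (2\sigma_r^2))$ with $\sigma_r = 0.1$, scaled by 256 and truncated to an integer. Under these assumptions every entry above a difference of 86 evaluates to 0, and storing only entries 0 to 86 drops 169 of 256 entries (about 66%). The names `build_range_lut`, `range_weight`, and `combined_weight` are hypothetical.

```python
import math

SIGMA_R = 0.1   # range sigma used for the LUT
SCALE = 256     # weights are scaled by 2^8 for fixed-point arithmetic
CUTOFF = 86     # differences above this are mapped to 0 and not stored


def build_range_lut(sigma_r=SIGMA_R, scale=SCALE):
    """Scaled W_r for every absolute guidance difference 0..255.

    Assumption: differences are normalized to [0, 1] before the Gaussian,
    and scaled weights are truncated to integers.
    """
    return [int(scale * math.exp(-((d / 255.0) ** 2) / (2 * sigma_r ** 2)))
            for d in range(256)]


FULL_LUT = build_range_lut()
ROM = FULL_LUT[:CUTOFF + 1]  # only entries 0..86 need to be stored


def range_weight(g_center, g_neighbor):
    """Scaled W_r for two 8-bit guidance pixels, read from the truncated ROM."""
    d = abs(g_center - g_neighbor)
    return ROM[d] if d <= CUTOFF else 0


def combined_weight(ws_scaled, wr_scaled):
    """Product of two 2^8-scaled weights, shifted right by 8 to stay 2^8-scaled."""
    return (ws_scaled * wr_scaled) >> 8
```

The comparison `d <= CUTOFF` replaces the 169 discarded ROM words with a single comparator, which is the trade-off that makes the truncation worthwhile in hardware.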

CLAHE Unit Design
To meet real-time requirements, the CLAHE algorithm has been designed with a focus on parallel computation and pipelining. The modules are interconnected using hls::stream, which enables data flow between them. By incorporating dataflow directives, the HLS tool synthesizes the design to enable overlapping execution, thereby maximizing the utilization of available resources and improving the overall throughput.

Histogram Calculation
First, the input image is partitioned into sub-block regions, as shown in Figure 13. In our implementation, the resolution of the infrared images is 640 × 480 pixels, and the image is divided into 12 sub-blocks (3 rows × 4 columns), each measuring 160 × 160 pixels. Histogram statistics are then computed for each sub-block, and the results are passed to the sub-block histogram clipping and redistribution module.
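The sub-block histogram step amounts to a single raster pass in which each pixel increments one bin of its own tile's histogram. The Python sketch below uses the 160 × 160 tile size from the text; the function names are hypothetical, and it assumes the image dimensions are exact multiples of the tile size, as they are for 640 × 480.

```python
TILE = 160  # sub-block size used in the text


def tile_grid(width, height, tile=TILE):
    """Return (tile_rows, tile_cols); the image must divide evenly into tiles."""
    if width % tile or height % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return height // tile, width // tile


def tile_histograms(img, tile=TILE, bins=256):
    """Single raster pass building hists[tile_row][tile_col][gray_level]."""
    rows, cols = tile_grid(len(img[0]), len(img), tile)
    hists = [[[0] * bins for _ in range(cols)] for _ in range(rows)]
    for y, line in enumerate(img):
        for x, v in enumerate(line):
            hists[y // tile][x // tile][v] += 1
    return hists
```

`tile_grid(640, 480)` returns `(3, 4)`, i.e. the 12 sub-blocks described above.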

Histogram Clipping and Redistribution
The sub-block histogram clipping and redistribution module is illustrated in Figure 14. The cached histogram of each sub-block is fed into the histogram clipping unit, which calculates the total excess above the clipping threshold over gray levels 0-255. This excess is redistributed evenly across the histogram bins, and the results are stored in a dual-port RAM. The process is repeated until no bin exceeds the clipping threshold, which completes the computation. To increase processing speed, the sub-blocks are processed in parallel, each with a dedicated dual-port data cache unit.
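A software model of the clipping loop is sketched below. It follows the description above (clip, sum the excess, spread it evenly, repeat until no bin exceeds the threshold). How the integer remainder of each redistribution is handed out (one count at a time to bins still below the limit) and the CDF-based mapping function at the end are assumptions, since the text does not specify them; the function names are hypothetical.

```python
def clip_and_redistribute(hist, clip_limit):
    """Iteratively clip a histogram and spread the excess evenly over all bins.

    Stops when no bin exceeds clip_limit. The integer remainder of each
    redistribution goes one count at a time to bins still below the limit
    (an assumed detail), so the total pixel count is preserved.
    """
    nbins = len(hist)
    if clip_limit * nbins < sum(hist):
        raise ValueError("clip limit too low: the excess cannot be redistributed")
    h = list(hist)
    while True:
        excess = sum(v - clip_limit for v in h if v > clip_limit)
        if excess == 0:
            return h
        step, rem = divmod(excess, nbins)
        h = [min(v, clip_limit) + step for v in h]
        i = 0
        while rem > 0:  # terminates: the clip-limit check guarantees headroom
            if h[i] < clip_limit:
                h[i] += 1
                rem -= 1
            i = (i + 1) % nbins


def mapping_function(hist, out_max=255):
    """CDF-based gray-level mapping for one sub-block (assumed standard form)."""
    total = sum(hist)
    cdf, mapping = 0, []
    for count in hist:
        cdf += count
        mapping.append(cdf * out_max // total)
    return mapping
```

For example, `clip_and_redistribute([10, 0], 5)` converges to `[5, 5]` after a few rounds, illustrating why the hardware loop may need more than one pass per sub-block.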

Mapping Function
To enhance the efficiency and optimize on-chip memory usage, a row-based buffering strategy is employed instead of a frame-based approach for the hardware implementation of the CLAHE algorithm. This design addresses the problem of uneven enhancements between adjacent image blocks by introducing interpolation between them.
Bilinear interpolation is employed for most sub-blocks, which requires the mapping functions of the four surrounding sub-blocks to be cached. To achieve this, the mapping functions of at least two rows of sub-blocks are stored in buffers. In addition, a dedicated buffer receives the mapping functions of the next row of sub-blocks without interrupting interpolation.
As shown in Figure 15, the pipeline uses three line buffers to enable continuous interpolation and thereby increase the system's operating frequency. Caching follows a three-cycle pattern (Cycle N, Cycle N + 1, and Cycle N + 2), where Line Buffer k holds the mapping functions of sub-block row k. In Cycle N, Line Buffers N and N + 1 hold the two rows of sub-blocks required for interpolation, and Line Buffer N + 2 caches the mapping functions of the next row. In Cycle N + 1, interpolation is performed using Line Buffers N + 1 and N + 2, and the buffer that held row N is cleared to receive the next row of sub-blocks (row N + 3). In Cycle N + 2, the buffer that held row N + 1 is cleared to cache row N + 4, and interpolation is computed for rows N + 2 and N + 3. This three-cycle loop continues until the interpolation covers the entire image and the final enhanced result is obtained. This approach keeps the interpolation pipeline continuously fed, improving the operating frequency and making efficient use of on-chip storage in the hardware implementation of CLAHE.
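The rotation can be modeled with three physical buffers indexed by sub-block row modulo 3: while rows c and c + 1 are read by the interpolator, row c + 2 is written into the remaining buffer. The Python sketch below (hypothetical names; rows 0 and 1 are assumed to be preloaded before the first cycle) lists this schedule so that one can check that a buffer is never overwritten while it is being read.

```python
NUM_BUFFERS = 3


def buffer_schedule(num_block_rows):
    """List (cycle, rows_read, buffers_read, row_loaded, buffer_loaded).

    Sub-block row r always lives in physical buffer r % 3. Rows 0 and 1 are
    assumed to be preloaded before cycle 0. In cycle c, rows c and c + 1 feed
    the interpolator while row c + 2 is written into the remaining buffer.
    """
    schedule = []
    for cycle in range(num_block_rows - 1):
        rows_read = (cycle, cycle + 1)
        bufs_read = tuple(r % NUM_BUFFERS for r in rows_read)
        row_loaded = cycle + 2 if cycle + 2 < num_block_rows else None
        buf_loaded = None if row_loaded is None else row_loaded % NUM_BUFFERS
        schedule.append((cycle, rows_read, bufs_read, row_loaded, buf_loaded))
    return schedule
```

For instance, `buffer_schedule(5)[1]` is `(1, (1, 2), (1, 2), 3, 0)`: in the second cycle rows 1 and 2 are read while row 3 overwrites buffer 0, which previously held row 0.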

Interpolation
The pixel interpolation reconstruction module involves two steps. First, the weights are calculated. Next, the interpolation calculations are performed. As shown in Figure 16, different interpolation methods are employed based on the sub-block's position.
For sub-blocks in the corners of the image (CR), pixels are mapped directly using the sub-block's own mapping function. For sub-blocks along the image edges (BR), linear interpolation is performed using the mapping functions of the two neighboring sub-blocks. For the majority of sub-blocks (IR), bilinear interpolation is performed.
First, interpolation is performed in the x-direction to obtain the values at $R_1 = (x, y_1)$ and $R_2 = (x, y_2)$. Next, interpolation is performed in the y-direction between $R_1$ and $R_2$, which yields the final interpolation result. Without optimization, this process requires a considerable number of multipliers. To make the interpolation more efficient and better suited to hardware implementation, the bilinear interpolation formula is revised. Let the weights in the horizontal and vertical directions be denoted as $\alpha$ and $\beta$, respectively; Equation (21) can then be rewritten in terms of these weights, and substituting Equation (23) into Equation (24) yields the simplified formula in Equation (25). The optimized interpolation unit is illustrated in Figure 17. After optimization, the interpolation unit requires only three multipliers, three subtractors, and three adders. The values $f(Q_{11})$ and $f(Q_{12})$ are obtained from the mapping functions of the previous row cached in the Line Buffer, whereas $f(Q_{21})$ and $f(Q_{22})$ are obtained from the current row in the Line Buffer. After the computation for each row of sub-blocks is completed, the Line Buffer is updated according to the pattern shown in Figure 15 until the entire image has been processed.
The "Weights" component in Figure 15 is a division unit that generates, from the input pixel address, the horizontal and vertical weights, that is, the parameters $\alpha$ and $\beta$ in Equation (25).
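The weight generation, region classification, and three-multiplier interpolation can be sketched as follows. With $\alpha$ as the horizontal weight and $\beta$ as the vertical weight, computing $R_1 = f(Q_{11}) + \alpha(f(Q_{12}) - f(Q_{11}))$, $R_2 = f(Q_{21}) + \alpha(f(Q_{22}) - f(Q_{21}))$, and $R_1 + \beta(R_2 - R_1)$ uses exactly three multiplications, three subtractions, and three additions. The index convention (first subscript = sub-block row, inferred from the Line Buffer description), the placement of tile centers at tile * k + tile / 2, the floating-point arithmetic, and all function names are assumptions; the hardware uses a division unit and fixed-point values.

```python
TILE = 160  # sub-block size from the text


def locate(coord, size, tile=TILE):
    """Return (lower_tile, upper_tile, weight) along one axis.

    Tile centers are assumed to lie at tile * k + tile / 2. Beyond the first
    or last center a single tile is used and the weight is 0.
    """
    n = size // tile
    pos = (coord - tile / 2) / tile  # position measured in tile-center units
    if pos <= 0:
        return 0, 0, 0.0
    if pos >= n - 1:
        return n - 1, n - 1, 0.0
    k = int(pos)
    return k, k + 1, pos - k  # the division performed by the "Weights" unit


def region(x, y, width, height, tile=TILE):
    """Classify a pixel as corner (CR), border (BR), or interior (IR)."""
    single_axes = sum(lo == hi for lo, hi, _ in (locate(x, width, tile),
                                                  locate(y, height, tile)))
    return ("IR", "BR", "CR")[single_axes]


def bilinear_3mul(q11, q12, q21, q22, alpha, beta):
    """Bilinear interpolation with three multiplies, subtracts, and adds.

    Assumed indexing: the first subscript is the sub-block row (1 = previous,
    2 = current) and the second is the sub-block column.
    """
    r1 = q11 + alpha * (q12 - q11)  # x-direction along the previous row
    r2 = q21 + alpha * (q22 - q21)  # x-direction along the current row
    return r1 + beta * (r2 - r1)    # y-direction between R1 and R2


def clahe_pixel(x, y, g, maps, width, height, tile=TILE):
    """Remap gray level g at (x, y) using mapping functions maps[row][col]."""
    ty0, ty1, beta = locate(y, height, tile)
    tx0, tx1, alpha = locate(x, width, tile)
    return bilinear_3mul(maps[ty0][tx0][g], maps[ty0][tx1][g],
                         maps[ty1][tx0][g], maps[ty1][tx1][g], alpha, beta)
```

Because CR and BR pixels simply collapse to zero weights and identical tile indices, a single datapath covers all three cases, which suits a fully pipelined hardware unit.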

FPGA Implementation Results
The proposed algorithm is implemented on the AXU15EG development board, which is based on the AMD Xilinx Zynq UltraScale+ XCZU15EG-FFVB1156-2-I MPSoC device. Throughout the design process, care was taken to preserve data accuracy so that no significant loss of precision compromised the integrity of the enhanced images. The resource utilization of the image enhancement module implemented on the FPGA is presented in Table 6. As Table 6 shows, the utilization percentages of BRAM_18K and LUT resources are relatively high compared with those of DSP48E and FF resources. This follows from designing the architecture for real-time applications. To increase processing speed, image data are cached on chip, resulting in higher BRAM_18K utilization. As illustrated in Figure 12, the weight data are precomputed and stored in lookup tables, which eliminates part of the nonlinear operations; this increases LUT utilization while reducing DSP48E utilization. In addition, the simplified bilinear interpolation reduces DSP48E and FF utilization.
As the processing speeds achieved on the FPGA and PC platforms show (Table 7), the image enhancement module processes a frame in approximately 6.86 ms (147 fps) when operating with a 114 MHz reference clock. This is approximately 29.4 times faster than processing on a PC, enabling real-time output of the enhanced images. The enhanced infrared images of three scenes on the PC and FPGA platforms are shown in Figure 18. Overall, the enhanced images from both platforms exhibit good visual quality. However, owing to hardware limitations, the results differ somewhat: compared with the PC results, the FPGA-enhanced images show weaker contrast and detail processing. For instance, in the first scene, the house appears darker, and the targets within the red boxes in the second and third scenes appear blurry. Despite this minor precision loss, the overall visual quality and enhancement speed of the FPGA implementation are within acceptable ranges.
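The timing figures above can be cross-checked with simple arithmetic. The Python sketch below derives the frame rate, the cycle budget at 114 MHz, and the implied PC frame time from the quoted numbers only. Since 1/6.86 ms is about 145.8 fps, the 6.86 ms and 147 fps figures agree to within about 1%; the resulting average of roughly 2.5 clock cycles per pixel is implied by these numbers and is not a measured pipeline initiation interval.

```python
WIDTH, HEIGHT = 640, 480
CLOCK_HZ = 114e6        # reference clock from the text
FRAME_TIME_S = 6.86e-3  # FPGA processing time per frame
SPEEDUP = 29.4          # FPGA speedup over the PC (Table 7)

pixels = WIDTH * HEIGHT
fps = 1.0 / FRAME_TIME_S
cycles_per_frame = CLOCK_HZ * FRAME_TIME_S
cycles_per_pixel = cycles_per_frame / pixels
pc_frame_time_ms = FRAME_TIME_S * SPEEDUP * 1e3  # implied PC time per frame

print(f"{fps:.1f} fps, {cycles_per_frame:,.0f} cycles/frame, "
      f"{cycles_per_pixel:.2f} cycles/pixel, PC about {pc_frame_time_ms:.0f} ms/frame")
```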
The average metrics of the enhanced images obtained on the two platforms for the three aforementioned test datasets are presented in Table 8. The FPGA results were inferior to the PC results in all four metrics: AG, EI, FD, and RMSC. These objective results are consistent with the subjective analysis: the FPGA implementation incurs only minor losses in texture and detail information together with a decrease in contrast, resulting in a small overall reduction in enhancement performance. Overall, the FPGA implementation achieves a good balance between enhancement effectiveness, resource consumption, and enhancement speed.

Discussion
In the field of infrared image enhancement, enhancement quality and speed are both of the utmost importance. Many advanced algorithms have been proposed to improve the quality of infrared images; however, their high computational complexity reduces enhancement speed. Achieving a balance between enhancement quality and speed that meets real-time application requirements therefore remains a challenge. The proposed method is applicable to a broad range of fields, such as military security, medical diagnostics, and autonomous driving. In future research, we plan to apply this technology to multimodal image fusion and other areas of image processing.

Conclusions
In this paper, we proposed a novel method for infrared image enhancement and implemented it on an FPGA. Compared with other enhancement methods, the proposed method exhibits superior performance in enhancing details, improving contrast, and reducing gradient reversal artifacts. In the proposed method, the image is first decomposed into a base layer and multiple detail layers of different scales by using the RGF. Detail enhancement factors are applied to the detail layers, whereas CLAHE is applied to the base layer. Finally, the enhanced layers are fused, yielding an image with globally enhanced details. To deploy the proposed algorithm on an FPGA, we adopted a parallel dataflow approach and worked to minimize hardware resource utilization. For infrared images with a resolution of 640 × 480 pixels, the proposed method produces well-enhanced images at a processing speed of 147 fps. Owing to this real-time processing capability, the proposed method offers a feasible solution for real-time applications.
Author Contributions: Methodology, X.Z. and Y.L.; algorithm validation, FPGA design and implementation, and writing-original draft preparation, J.L., X.Z. and X.Y.; writing-review and editing, Z.W., W.H. and Y.L.; research content analysis, software debugging, and formula analysis, Z.W., R.H. and Y.L. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.