1. Introduction
Cameras cannot simultaneously capture both the highlight and shadow regions perceptible to the human eye by adjusting exposure duration and ISO sensitivity alone. This limitation is fundamentally attributable to the exceptionally broad luminance range of natural environments, spanning from complete darkness to maximum brightness, in which data acquisition systems with limited bit depth inevitably sacrifice either highlight or shadow detail [
1,
2,
3,
4,
5]. Researchers have explored this problem across different fields from the perspectives of both hardware design and software technology [
6]. Although designing sensors that capture higher bit-depth images would solve the problem at its root, such sensors entail significantly higher hardware costs. Alternatively, High Dynamic Range (HDR) fusion algorithms integrate wide-dynamic-range information into a single image by systematically combining pre-captured images that contain highlight details and shadow details.
Conventional HDR fusion algorithms generally comprise two stages: the acquisition of multiple images with varying exposure durations, followed by computational processing of the captured frames. Most conventional HDR fusion methods focus on the second stage. For instance, Mertens et al. [
7] computed perceptual quality metrics (e.g., saturation, contrast) at each pixel across multi-exposure sequences and employed these metrics to drive exposure blending through weighted fusion. Liu et al. [
8] employed dense Scale-Invariant Feature Transform (SIFT) descriptors as activity-level metrics to extract localized structural details from multi-exposure source images. Li et al. [
9] decomposed input images into base layers (encoding large-scale luminance variations) and detail layers (capturing high-frequency textures) via two-scale decomposition, followed by a guided-filtering-based weighted fusion framework to preserve spatial coherence during layer integration. However, these conventional methods remain constrained by their reliance on pre-captured images with varying exposure durations and timings, rendering them incapable of real-time video stream processing. Furthermore, inevitable object motion during imaging results in imperfect spatial registration of the pre-acquired images, producing ghosting artifacts in the fused composite. Conventional ghosting-suppression algorithms address this issue at the expense of computational efficiency, yet they achieve only partial suppression rather than complete elimination. In terms of achieving similar HDR effects, Z. Li et al. [
10] proposed a method for automatic exposure correction based on a single input image. However, when parts of the input image are completely black or white, the corrected image cannot fully recover these areas because no suitable reference data exist; this limitation is inherent to single-image approaches. In addition, deep learning approaches to HDR fusion have continued to emerge. Among them, Lucas Nedel Kirsten et al. [
11] proposed MobileMEF, a multi-exposure fusion method based on an encoder–decoder deep learning architecture with efficient building blocks tailored for mobile devices. Although this method can process 4K-resolution images in less than 2 s, deep learning approaches remain overly complex and challenging to deploy when real-time fusion at more than 40 frames per second is required.
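To illustrate the quality-weighted blending used by conventional methods such as that of Mertens et al., the following is a minimal grayscale sketch (no multiresolution pyramid; the contrast and well-exposedness terms and their parameters are simplified placeholders, not any author's exact implementation):

```python
import numpy as np

def mertens_style_fuse(exposures, eps=1e-12):
    """Simplified exposure fusion: weight each pixel by contrast and
    well-exposedness, then blend. Grayscale only, no pyramid blending."""
    weights = []
    for img in exposures:  # img: float array in [0, 1], shape (H, W)
        # Contrast cue: magnitude of a discrete Laplacian response.
        lap = np.abs(np.gradient(np.gradient(img, axis=0), axis=0)
                     + np.gradient(np.gradient(img, axis=1), axis=1))
        # Well-exposedness cue: Gaussian centered at mid-gray.
        wexp = np.exp(-((img - 0.5) ** 2) / (2 * 0.2 ** 2))
        weights.append(lap * wexp + eps)
    w = np.stack(weights)
    w /= w.sum(axis=0, keepdims=True)        # normalize weights per pixel
    return (w * np.stack(exposures)).sum(axis=0)
```

Because the output is a per-pixel convex combination of the inputs, it stays within the input range; the full method additionally blends across a Laplacian pyramid to avoid seams.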
Clear HDR revolutionizes image acquisition by simultaneously capturing dual-exposure frames—defined by a High Conversion Gain (HCG) and a Low Conversion Gain (LCG)—that share an identical exposure duration and are taken at the same precise moment [
12]. This approach fundamentally eliminates the inherent problems of ghosting artifacts and real-time processing constraints. However, a key limitation of that approach is its failure to account for read noise amplification during processing, leaving persistent banding artifacts in the final image. Furthermore, it neglects the environmental illumination variations caused by scene dynamics during video stream acquisition, and therefore cannot eliminate overexposure or underexposure in the composite output. Finally, like conventional HDR algorithms, the method is constrained to maintain a consistent bit depth before and after fusion, which significantly limits its applicability in professional fields that require preservation of the original image data for post-processing workflows.
An optimized algorithm, named the Buffer Optimization Algorithm, is proposed to leverage Clear HDR capability for synthesizing dual co-exposure images (HCG and LCG outputs) through adaptive parameter optimization. A feedback mechanism based on post-fusion metrics—such as mean luminance and read noise distribution—is designed to dynamically adjust fusion weights. This closed-loop control framework enables real-time HDR video stream processing with enhanced dynamic range preservation and computational efficiency, while ensuring robust performance across varying lighting conditions. The design of this adaptive mechanism also constitutes one of the core objectives of the present study.
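A minimal sketch of the closed-loop idea described above, assuming a simple proportional update (the function name, target mean, step size, and gain limits are illustrative placeholders, not the parameters used in this work):

```python
def update_gain(gain, fused_mean, target_mean=0.45,
                step=0.1, gain_min=1.0, gain_max=256.0):
    """One iteration of a proportional feedback loop: if the fused frame
    is too dark, raise the gain applied to the next frame; if too bright,
    lower it. All numeric values are illustrative placeholders."""
    error = target_mean - fused_mean           # > 0 means frame too dark
    gain *= (1.0 + step * error)               # proportional correction
    return min(max(gain, gain_min), gain_max)  # clamp to a safe range
```

Run once per fused frame, this kind of update converges toward a stable gain as the measured mean approaches the target, which is the behavior the closed-loop framework relies on.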
3. Experimental Results
To validate the effectiveness and real-time capability of the proposed adaptive Clear HDR fusion algorithm, a series of experiments was conducted on an FPGA-based hardware platform. The Efinix T35F324I4 FPGA is adopted in this work as a case study, with the fused HDR image data transferred to a host computer via a USB 3.0 interface for real-time visualization. The computational modules described below are implemented on the FPGA in Verilog HDL (Hardware Description Language). The Sony IMX664 CMOS image sensor serves as the front-end image acquisition module; its Clear HDR mode enables concurrent output of HCG and LCG image frames within a single readout cycle. The image bit depth is 12 bits in this case, i.e., N = 12. Given the prevalence of the 16-bit image format, M = 16 is adopted here, and the 12-bit data is upconverted to 16-bit depth.
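One common way to realize an N-bit to M-bit upconversion is a left shift by M − N bits; the exact mapping used in the pipeline is not restated here, so the following is a sketch under that assumption:

```python
def upconvert(sample, n=12, m=16):
    """Map an N-bit sample into an M-bit container by a left shift,
    i.e., multiplication by 2**(m - n). A shift is one common choice;
    how the pipeline in the text performs the mapping is not detailed here."""
    assert 0 <= sample < (1 << n), "sample must fit in n bits"
    return sample << (m - n)
```

With n = 12 and m = 16, the 12-bit full-scale value 4095 maps to 65520, leaving headroom below the 16-bit maximum of 65535.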
To accommodate the limitation of standard displays, which typically render only 8-bit depth, all 16-bit images were processed in Adobe Photoshop 2022 with a standardized parameter set (detailed below) to ensure correct display. After opening an image in the software, go to the menu bar and select “Image > Adjustments > HDR Toning” [
15]. Then, process the image with the following settings: Method: Local Adaptation; Edge Glow: Radius = 1 pixel, Strength = 0.1; Tone and Detail: Gamma = 1, Exposure = 0, Detail = 0%; Advanced: Shadow = 0%, Highlight = 0%, Vibrance = 0%, Saturation = 0%; Toning Curve and Histogram: Default (no adjustment). The uniformity of these parameters ensured that no additional variables were introduced. In contrast, such processing was omitted for the 12-bit LCG and HCG source images, as the fusion quality assessment remains unaffected by their display accuracy on 8-bit monitors.
The IMX664 sensor’s registers are first configured via the Inter-Integrated Circuit (I2C) protocol to activate its Clear HDR mode, enabling simultaneous output of HCG and LCG image streams. The fusion curve is optimized by modulating the core parameters governing Clear HDR synthesis to achieve perceptually refined HDR output. The processed image data is subsequently transferred via USB 3.0 hardware interface to a host computer for real-time visualization and analysis.
The proposed adaptive algorithm was evaluated and compared against the HDR fusion algorithms by Liu [
8], Li [
9], and Mertens [
7], along with the method by Xu [
12] that also employs the Clear HDR algorithm.
Figure 3a and
Figure 3b, respectively, show the original LCG and HCG images of the garage entrance night scene, which were simultaneously output by the sensor and used for HDR fusion. As can be observed, in the LCG image, while the details of the illuminated wall are well-preserved, the remaining areas are too dark, resulting in a loss of detail. In contrast, the HCG image has an overall higher brightness and retains most of the information in the darker regions. However, the illuminated wall area is overexposed, leading to washed-out details. Individually, neither of these two images can satisfactorily represent the actual scene.
For the scene shown in
Figure 3,
Figure 4a presents the garage entrance night scene generated by our proposed HDR fusion algorithm, while
Figure 4b–e display the results obtained by the algorithms of Li, Liu, Mertens, and Xu, respectively. All algorithms were fed with the same 12-bit LCG and HCG source images shown in
Figure 3, producing 16-bit outputs that were uniformly converted to 8-bit for correct display on standard monitors. The proposed HDR fusion result appears more visually natural, as the fused data preserves the original sensor response curve, thereby retaining greater flexibility during HDR tone mapping. In contrast, the algorithms by Li and Liu exhibit unnatural shadow regions overall, with noticeable halo artifacts particularly along edges between bright and dark areas, as highlighted in the red boxes in
Figure 4b,c. Mertens’ algorithm shows significant improvement over those of Li and Liu, with relatively smooth brightness transitions. However, severe banding artifacts appear at the bottom of the image, as highlighted in the red box in
Figure 4d. The fusion algorithm by Xu relies on a fixed ratio between high and low gain levels, limiting its adaptability to scenarios with varying gain relationships. As a result, it fails to adequately suppress overexposure as highlighted in the red box in
Figure 4e.
To evaluate the robustness of these HDR fusion algorithms, we introduced additional test scenarios.
Figure 5 and
Figure 6 present the HDR fusion results generated by our algorithm and the comparative algorithms for two distinct scenarios: a camera with a lit screen and an interchange landscape, respectively.
As shown in
Figure 5, Liu’s algorithm exhibits unnatural shadow regions overall, with noticeable halo artifacts along both the upper edge of the image and the top contour of the camera. Li’s algorithm reduces the halo effects around the camera somewhat, but the brightness balance between the ambient lighting and the camera appears inconsistent, resulting in an unnatural representation of the main subject. Additionally, obvious halos are present around the content displayed on the camera screen. Mertens’ algorithm produces more natural results than the former two; however, color distortion is observable, and banding artifacts occur in the red-boxed area. The algorithms of Li, Liu, and Mertens all exhibit unnatural transitions at the junction of the screen borders. The issue with Xu’s method persists: its reliance on fixed fusion parameters leads to inadequate suppression of overexposed regions and loss of background context. In contrast, the HDR image generated by our algorithm preserves the camera’s external features, on-screen content, and background information without introducing halo artifacts or color cast. This demonstrates that our algorithm achieves satisfactory HDR fusion results for the camera scene.
As shown in
Figure 6, halo artifacts along brightness transition regions remain a major issue for both Liu’s and Li’s algorithms. Although Mertens’ algorithm avoids halo artifacts, the white balance of the fused image is noticeably distorted, a phenomenon not observed in the other traditional HDR fusion algorithms. Xu’s algorithm successfully suppresses overexposure in the tree areas, but because of its fixed fusion parameters, overexposure in the building regions is not adequately controlled. In contrast, the HDR image produced by our algorithm preserves the darker vegetation in the foreground as well as the distant high-rises and mountains, while introducing no halo artifacts or color cast and maintaining smooth brightness transitions. These results demonstrate that our algorithm also achieves satisfactory HDR fusion performance for the interchange landscape scene.
Based on
Figure 4,
Figure 5 and
Figure 6, the algorithms proposed by Li, Liu, and Mertens demonstrate satisfactory performance in the garage entrance night scene, but their fusion results become increasingly unnatural in the camera and interchange landscape scenes. The primary reason is that these algorithms impose stringent requirements on the source images: scene details must be well preserved in the input exposures to achieve a natural fused result. When the LCG image is too dark to adequately represent highlight details, or when the HCG image is too bright to capture shadow details effectively, these conventional HDR fusion methods produce visibly artificial outcomes. For instance, in the camera scene, the background brightness captured in the LCG image is significantly darker than the camera body brightness recorded in the HCG image, which prevents conventional algorithms from achieving smooth transitions during fusion. Similarly, in the interchange landscape scene, the dark trees captured in the HCG image are considerably brighter than the buildings recorded in the LCG image, leading to unnatural transitions in the fusion results produced by conventional methods. In contrast, the proposed method leverages the intrinsic relationship between the LCG and HCG sources, enabling effective utilization of both underexposed and overexposed image data to generate a more natural fused image.
Subjective observation can provide only a general, preference-based assessment of a fusion algorithm, whereas analyzing the objective parameters of images before and after fusion across the three scenarios offers a concrete basis for comparing and selecting algorithms. Among these, the metricMI [
16] parameter reflects the amount of information in the fused image derived from the input image sequence, indicating, to some extent, the extent to which the fusion algorithm utilizes and retains the original data. As shown in
Table 1, a higher metricMI indicates that the fused image contains more information derived from the input image sequence. The values in bold represent the best results for each scenario. By comparing the five algorithms horizontally, it can be observed that the proposed method achieves nearly the best metricMI across all three scenarios. This is attributed to its ability to more effectively utilize overexposed or underexposed images in the fusion process.
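For reference, a histogram-based mutual information estimate of the kind underlying metricMI can be sketched as follows (the bin count is an illustrative choice; fusion metrics of this family typically sum the MI between each source image and the fused result):

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Histogram-based mutual information between two 8-bit images.
    A metricMI-style score would add MI(LCG, fused) and MI(HCG, fused)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```

A fused image that carries no information about a source (e.g., a constant image) scores zero, while an identical copy scores the source entropy, which matches the interpretation of higher metricMI as greater information retention.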
Based on the fundamental fusion model, the adaptive fusion model proposed in this paper can dynamically adjust fusion parameters in real-time according to scene brightness, thereby ensuring the capture of complete image information for fusion.
Figure 7 illustrates a scenario in which the brightness of the subject changes, simulated by adjusting the screen brightness of the camera. The varying brightness levels of the LCG fusion sources represent different subject brightness levels, with screen luminance of 15 Lux, 45 Lux, and 100 Lux from top to bottom. Faced with changing subject brightness, the algorithm adaptively adjusts
KHCG to produce fused images with appropriate brightness—specifically,
KHCG values of 118.75, 36.875, and 18 from top to bottom. As shown in
Figure 7, despite noticeable variations in the LCG (a, d, g) and HCG (b, e, h) caused by different screen brightness levels, our adaptive algorithm achieves favorable HDR fusion results in all cases, as evidenced in
Figure 7c,f,i. Both the pre-fusion source images and the resulting HDR images demonstrate that, without altering the exposure time, adaptive adjustment of
KHCG enables the acquisition of suitable source data for generating high-quality fused images.
As illustrated in Equation (3), a buffer is introduced between the threshold
TH and 2
N − 1 to mitigate the HDR image banding artifacts caused by amplified noise discontinuities during the transition from HCG to the amplified LCG. A comparative experiment evaluating the algorithm with and without this buffer component is conducted to validate the functionality of the buffer layer.
Figure 8a and
Figure 8b display the HDR fusion images of a corn slice without and with the buffer module, respectively. By magnifying the identical region within the central red box, it is clearly observed that the result from the algorithm without the buffer exhibits noticeable banding artifacts, whereas the result incorporating the buffer appears significantly smoother. This demonstrates that the introduction of the buffer effectively mitigates the banding artifacts in the HDR fused image caused by noise discontinuities.
Unlike the relatively uniform lighting conditions in microscopy, everyday imaging scenarios often involve complex lighting variations, making banding artifacts more likely to occur and more visually noticeable.
Figure 9 and
Figure 10 present the HDR fusion images of a roof (outdoor scene) and a mug (indoor scene) without and with the buffer module, respectively. As shown in
Figure 9, it can be observed that the banding artifacts in the canopy area (marked by the red box on the left) and the road surface (marked by the green box on the upper right) are significantly improved. The highlighted area within the red box in
Figure 10a exhibits noticeable banding and color shift at the transition between high-gain and low-gain regions. After applying the buffer zone, these artifacts are significantly reduced, and the desktop color remains consistent across the transition, as shown in
Figure 10b. These examples collectively demonstrate that introducing a buffer effectively mitigates the banding artifact, thereby laying the groundwork for subsequent image processing.
Finally, a runtime comparison was also performed to ensure a comprehensive evaluation of all algorithms. For algorithms implemented and deployed on an FPGA, execution follows a pipelined architecture; the runtime therefore depends solely on the image size and the operating clock frequency of the FPGA chip. Let
H and
W denote the number of pixels along the height and width of the image, respectively,
c denote the number of channels, and
f denote the clock frequency. Then, the execution time
t of the fusion algorithm on the FPGA can be expressed as [
12]
t = (H × W)/(c × f)(10)
To ensure fairness in the comparative experiments, both Xu’s method and the proposed method adopted identical parameters, i.e., an image size of 2688 × 1520 pixels, the sensor operating in 4-channel mode, and an FPGA operating frequency of 100 MHz. Substituting these values into Equation (10), the execution time for both Xu’s method and the proposed method is calculated as 0.0102144 s per frame, which is 97.83% faster than the conventional methods (Mertens, Liu, and Li), as benchmarked in
Table 2. In other words, the proposed HDR fusion algorithm is capable of processing 2688 × 1520 resolution video streams at 46 frames per second, thus enabling real-time video processing. This performance is attributed both to the FPGA-optimized parallel computing architecture and to the streamlined Clear HDR fusion pipeline.
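The per-frame execution time above can be reproduced directly from Equation (10), assuming a fully pipelined design that processes c pixels per clock cycle:

```python
def fpga_frame_time(h, w, c, f):
    """Execution time of a fully pipelined FPGA fusion design:
    h*w pixels pushed through c parallel channels at f cycles per second."""
    return (h * w) / (c * f)

t = fpga_frame_time(1520, 2688, 4, 100e6)
# t == 0.0102144 (seconds per frame)
```

Note that the practically achievable frame rate (46 fps here) can be lower than the reciprocal of this pipeline time, since it is also bounded by sensor readout and interface throughput, which this formula does not model.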