Article

Single-Image High Dynamic Range Reconstruction via Improved HDRUNet with Attention and Multi-Component Loss

1 Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China
2 Key Laboratory of Intelligent Space TTC&O, Ministry of Education, Space Engineering University, Beijing 101416, China
3 National Key Laboratory of Space Target Awareness, Space Engineering University, Beijing 101416, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10431; https://doi.org/10.3390/app151910431
Submission received: 11 August 2025 / Revised: 31 August 2025 / Accepted: 19 September 2025 / Published: 25 September 2025

Abstract

High dynamic range (HDR) imaging aims to overcome the limited dynamic range of traditional imaging systems and achieve effective restoration of the brightness and color of the real world. In recent years, single-image HDR (SI-HDR) reconstruction has become a research hotspot due to its simple acquisition process and applicability to dynamic scenes. This paper proposes an improved SI-HDR reconstruction method based on HDRUNet, which systematically integrates channel and spatial attention mechanisms with brightness-expansion and color-enhancement branches and constructs an adaptive multi-component loss function. This effectively enhances detail restoration in extreme exposure areas and improves overall color expressiveness. Experiments on public datasets such as NTIRE 2021, VDS, and HDR-Eye show that the proposed method outperforms mainstream SI-HDR methods in terms of the PSNR, SSIM, and VDP evaluation metrics. It performs particularly well in complex scenarios, demonstrating greater robustness and generalization ability.

1. Introduction

High dynamic range (HDR) imaging has become a key technology for overcoming the limitations of traditional imaging systems, which often fail to capture the complete brightness range present in real scenes. Traditional low dynamic range (LDR) cameras cannot simultaneously retain the details of the brightest and darkest areas, resulting in information loss and degraded visual quality. HDR imaging addresses this challenge by combining information from LDR images captured at different exposure levels and reconstructing images with a wider dynamic range, thereby providing more realistic and visually appealing results. Early HDR techniques mainly relied on multi-exposure fusion (MEF), which fuses multiple LDR images captured at different exposure levels to generate an HDR image [1,2]. However, the MEF approach is susceptible to motion artifacts, requires precise image alignment, and is poorly suited to dynamic scenes and real-time applications. In recent years, single-image HDR (SI-HDR) reconstruction has gained increasing attention due to its simple acquisition process and suitability for dynamic scenes. Deep learning-based methods, such as U-Net and its variants, have made significant progress in SI-HDR reconstruction and achieved top performance [3,4,5,6]. However, recovering details in extreme exposure regions and maintaining color consistency remain challenging. This study proposes an improved SI-HDR reconstruction method based on HDRUNet. The method reconstructs higher-quality high dynamic range images by utilizing channel and spatial attention mechanisms together with brightness-expansion and color-enhancement branches, and introduces an adaptive multi-component loss function. The main contributions are as follows [7,8,9]:
A network architecture designed to enhance feature extraction and detail recovery, particularly in challenging exposure regions.
An adaptive loss function that balances pixel accuracy, structural similarity, and perceptual quality.
Experiments on public datasets showing that the method delivers outstanding performance and stronger robustness.

2. Materials and Methods

2.1. Model Architecture

This section provides a detailed introduction to the architecture, key modules, and innovative points of the single-image high dynamic range (HDR) reconstruction model proposed in this paper. The model is an improvement upon the HDRUNet open-source framework and adopts the U-Net [10] structure. It introduces innovative mechanisms in the network to achieve high-quality HDR reconstruction [11,12,13].

2.1.1. Overall Network Structure

Figure 1 shows the framework of the proposed high dynamic range reconstruction model. The model consists of three main parts: (1) a basic network with a U-Net structure; (2) a conditional network for spatial modulation; and (3) a weighting network for adaptive fusion. The key innovative components include the attention mechanism, the brightness/color-enhancement branch, and the multi-component loss function. This hierarchical structure enables the extraction of content features from different exposure regions, spatially adaptive modulation, and weighted fusion in the single-frame LDR image reconstruction task [14,15].
(1)
Base Network
The backbone network adopts the U-Net encoder–decoder structure, taking 8-bit low dynamic range images as input and outputting 16-bit preliminary reconstructed high dynamic range images. The encoder gradually extracts multi-scale features through multiple layers of convolution and downsampling, capturing everything from low-level edges and textures to high-level semantic information. The decoder gradually restores spatial resolution through upsampling and convolution, and fuses features from each encoder layer to improve reconstruction quality. To enhance the feature representation ability, the backbone network introduces skip connections between encoder and decoder layers at corresponding stages and makes extensive use of improved residual blocks. Each residual block integrates SFT layers, attention mechanisms (channel/spatial), and noise-aware modules, effectively alleviating the vanishing-gradient problem and improving the convergence speed and generalization ability of the model [16].
Furthermore, the backbone network introduces a brightness expansion and color-enhancement branch between the bottleneck layer and the decoding stage. It performs adaptive enhancement on the brightness and color of low dynamic range images and incorporates a multi-scale wavelet decomposition and reconstruction module, which is used to extract and fuse global and local features at different scales, thereby further improving the detail restoration capability.
(2)
Condition Network
For problems such as uneven exposure, large brightness differences, and complex noise distribution in low dynamic range (LDR) images, the model introduces a specially designed conditional network to generate modulation parameters with spatial variation characteristics. This conditional network is based on the input LDR image and outputs a set of conditional maps through multiple layers of convolution and downsampling, which are provided to the subsequent SFT (Spatial Feature Transform) layer for modulation. Through the SFT mechanism, this network can adaptively enhance and suppress different regions according to the spatial distribution and global features of the input image, thereby significantly improving the detail restoration ability of under-exposed and over-exposed regions.
(3)
Weighting Network
In order to further enhance the HDR reconstruction capability, the model employs a weighting network to predict a weight map, which is used to perform weighted fusion of the preliminary HDR result output by the main network and the input low dynamic range image. This helps the network focus on the areas that are difficult to restore (such as overexposed and underexposed areas), thereby improving the overall reconstruction quality.
The calculation formula for the final output HDR image is as follows:
Ŷ = W ⊙ I + G(I)
Among them, I is the input LDR image, G(I) is the preliminary HDR reconstruction result output by the backbone network, W is the weight map predicted by the weighting network, ⊙ represents element-wise multiplication, and Ŷ is the final reconstructed HDR image.
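As a minimal illustration, this fusion step can be written directly in PyTorch; the sketch below uses illustrative names and is not the released implementation:

```python
import torch

# Weighted fusion of the LDR input and the preliminary HDR reconstruction (a sketch).
# ldr:    input LDR image I, shape (B, 3, H, W)
# hdr_g:  preliminary HDR reconstruction G(I) from the base network
# weight: weight map W predicted by the weighting network, broadcastable to I
def weighted_fusion(ldr: torch.Tensor, hdr_g: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    return weight * ldr + hdr_g  # element-wise product, then addition, as in the formula above
```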

2.1.2. Basic Network Structure

As shown in Figure 2, the core of the basic U-Net network structure lies in the encoder–decoder architecture, skip connections, and the widely used residual blocks (Residual Block) and spatial feature transformation layers (SFT Layer) [17,18].
In the basic network and conditional network, residual blocks and SFT layers are important building blocks. The residual blocks adopt the idea of skip connections, allowing gradients to flow directly through this structure, which helps to train deeper networks. Their basic structure is as follows: the input features are transformed through several layers and then added to the original input, expressed in the following form:
Fout = F(Fin) + Fin
Among them, Fin is the input feature, F represents the transformation, including operations such as convolution, normalization, and activation functions, and Fout is the output feature.
Addressing issues such as uneven exposure and large brightness differences in LDR images, the model introduces spatial feature transform (SFT) layers for implementing spatial adaptive processing. SFT layers can perform point-wise linear transformation on intermediate features of the base network according to modulation parameters generated by the condition network, thereby adjusting the scale and offset of features. The formula is as follows:
Fout′ = γ⊙Fin + β
Among these, Fin represents the intermediate feature to be modulated, while γ and β denote the spatially variable modulation coefficients predicted by the conditional network (corresponding to scale and offset, respectively), and ⊙ denotes element-wise multiplication. Each residual block typically integrates an SFT layer to combine spatial adaptive modulation of features with residual learning, with the following structure: input features are modulated by the SFT layer, then sequentially processed through convolution, normalization, and activation functions; the output features are added to the input features to achieve residual learning, enhancing the network’s expressive capability and training stability [19].
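To make this structure concrete, the following PyTorch sketch shows one way to compose an SFT layer and an SFT-modulated residual block; the layer widths, the two-convolution body, and the 1 × 1 convolutions that produce γ and β are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial feature transform: F_out' = gamma * F_in + beta (element-wise)."""
    def __init__(self, feat_ch: int, cond_ch: int):
        super().__init__()
        self.to_gamma = nn.Conv2d(cond_ch, feat_ch, kernel_size=1)  # predicts scale gamma
        self.to_beta = nn.Conv2d(cond_ch, feat_ch, kernel_size=1)   # predicts offset beta

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.to_gamma(cond) * feat + self.to_beta(cond)

class SFTResidualBlock(nn.Module):
    """Residual block with SFT modulation: modulate, convolve, activate, then add the input."""
    def __init__(self, feat_ch: int = 64, cond_ch: int = 32):
        super().__init__()
        self.sft = SFTLayer(feat_ch, cond_ch)
        self.body = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return feat + self.body(self.sft(feat, cond))  # F_out = F(F_in) + F_in
```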

2.1.3. Model Improvements

Based on the original HDRUNet-main, targeting the practical requirements of single-frame HDR reconstruction tasks and the limitations of existing models, this paper has made structural improvements and functional extensions to the model, aiming to further enhance the model’s detail-restoration capability, color-reproduction ability, and generalization performance. The specific improvements are as follows [6,20,21,22,23,24]:
(1)
Introduction and Enhancement of Multi-dimensional Attention Mechanisms
In response to the problem that traditional networks pay insufficient attention to key areas during feature extraction, this paper embeds both channel attention and spatial attention in the backbone network, as shown in Figure 3. The channel attention combines the results of global average pooling and maximum pooling and then re-calibrates the weights of each channel adaptively through a shared multi-layer perceptron, enabling the model to highlight the channels with rich information and suppress the channels dominated by noise in HDR scenes. The spatial attention uses multi-scale pyramid convolution to capture spatial statistical information under different receptive fields and generates a spatial weight map, guiding the network to focus on high-contrast edges, detailed textures, and extreme exposure areas. Both attentions are embedded multiple times in the residual blocks and feature fusion modules of each network layer in a residual form, significantly improving the model’s adaptability to complex scenes and reconstruction accuracy [25,26].
Channel attention formula:
Fca = σ(W2 · δ(W1 · GAP(F))) ⊙ F
Among them, F is the input feature, GAP is global average pooling, W1 and W2 are the fully connected layer weights, δ is the ReLU activation, σ is the Sigmoid function, and ⊙ denotes element-wise multiplication.
Spatial attention formula:
Fsa = σ(Conv1×1([Conv3×3(S), Conv5×5(S), Conv7×7(S)])) ⊙ F
Among them, S is the concatenation of the channel-wise mean and maximum maps, [·] denotes channel concatenation, ⊙ denotes element-wise multiplication, and F is the input feature.
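A compact PyTorch sketch of the two attention modules defined by the formulas above is given below; the reduction ratio of the shared MLP is an illustrative assumption, while the pyramid kernel sizes follow the spatial attention formula:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Shared MLP over globally pooled features (average and max), sigmoid gate over channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # W1
            nn.ReLU(inplace=True),                                      # delta
            nn.Conv2d(channels // reduction, channels, kernel_size=1),  # W2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))  # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))   # global max pooling branch
        return torch.sigmoid(avg + mx) * x                # re-weight channels of the input

class SpatialAttention(nn.Module):
    """Channel mean/max statistics -> multi-scale pyramid convolutions -> 1x1 fusion -> sigmoid map."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(2, 1, kernel_size=k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.fuse = nn.Conv2d(3, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)      # S: channel mean and max
        pyramid = torch.cat([branch(s) for branch in self.branches], dim=1)
        return torch.sigmoid(self.fuse(pyramid)) * x             # spatial weight map applied to input
```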
(2)
Brightness Expansion and Color-Enhancement Branches
In response to the inherent limitations of single low dynamic range (LDR) images in terms of brightness distribution and color representation, especially due to the limited bit depth encoding, resulting in the loss of high and low light details as well as color distortion, the model introduces a brightness expansion and color-enhancement branch, as shown in Figure 4. These two branches work in parallel, aiming to handle brightness and color information more precisely [27].
The brightness expansion branch focuses on restoring and expanding the brightness dynamic range of the image. This branch receives intermediate features or preliminary reconstruction results from the backbone network and fine-tunes the brightness information through a series of specially designed nonlinear transformation layers. The core idea is to dynamically stretch and compress the brightness values based on the brightness distribution characteristics of the input image as needed, effectively restoring the shadow details in underexposed areas and the highlight details in overexposed areas, avoiding the common detail loss or artifacts in traditional methods.
Brightness expansion formula:
The Tanh-based nonlinear transformation adopted in the following equation is selected for reconstructing high dynamic range images because it can adaptively stretch and compress brightness values while maintaining the natural appearance of the image, avoiding the clipping artifacts that a linear transformation produces in extreme exposure areas:
Lout = Tanh(γ(Lin − μ)) + β
Among them, Lout represents the feature after brightness expansion, Tanh represents the learnable nonlinear transformation, Lin represents the input brightness feature, and γ, μ, and β represent the learnable parameters.
The color-enhancement branch focuses on optimizing the color expressiveness of reconstructed images, correcting color bias, and improving color saturation and realism. Single LDR images often suffer from color information loss or distortion in extreme exposure regions due to quantization errors and sensor saturation. This branch utilizes channel attention, cross-channel information interaction (such as 1 × 1 convolution), and perceptual loss techniques to perform targeted enhancement and adjustment of image color channels, effectively enhancing color saturation, correcting color bias, and improving color reproduction capability, making it closer to the ideal HDR image color distribution. The introduction of the color-enhancement branch compensates for the shortcomings of loss functions that only focus on brightness or structure, making the final HDR reconstruction results more visually appealing and realistic.
Color-enhancement formula:
Cenh = α · Cin + (1 − α) · Attention(Cin)
Among them, α is the weight coefficient, Cin is the input color feature, and Attention(Cin) is the result of applying the attention mechanism to the input color feature Cin.
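The two branches can be sketched in PyTorch as follows; the per-channel learnable parameters, the squeeze-style channel attention, and the single learnable blending coefficient α are assumptions consistent with the formulas above, not the authors' exact layers:

```python
import torch
import torch.nn as nn

class LuminanceExpansion(nn.Module):
    """Tanh-based brightness expansion with learnable gamma, mu, beta (form assumed from the formula)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.mu = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, lum: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.gamma * (lum - self.mu)) + self.beta

class ColorEnhancement(nn.Module):
    """Blend the input color feature with a channel-attended version via a learnable weight alpha."""
    def __init__(self, channels: int):
        super().__init__()
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),  # 1x1 cross-channel interaction
            nn.Sigmoid(),
        )
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, color: torch.Tensor) -> torch.Tensor:
        attended = self.attention(color) * color
        return self.alpha * color + (1.0 - self.alpha) * attended
```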
(3)
Adaptive Multi-component Loss Function Design
In order to comprehensively evaluate and optimize the HDR reconstruction results, this paper introduces a multi-component adaptive loss function, as shown in Figure 5. This loss function takes into account multiple aspects, including pixel accuracy (reconstruction loss), image structure (structural similarity loss), perceptual quality (perceptual loss), and color consistency (color loss). By assigning adaptive weights to different loss components, the model is guided to learn to generate high-quality and high-fidelity HDR images. Total loss function:
Ltotal = λrec·Lrec + λperceptual·Lperceptual + λSSIM·LSSIM + λcolor·Lcolor
Among these, Lrec is the reconstruction loss (e.g., L1 or L2 loss), which measures the pixel-level differences between the reconstructed image and the reference HDR image; Lperceptual is the perceptual loss, which measures the perceptual similarity of images by extracting high-level features; LSSIM is the structural similarity loss, which focuses on capturing the structural information of images to enhance detail restoration; Lcolor is the color consistency loss, which constrains the color distribution of the reconstructed image to align with that of the reference image; and λi is the weight of each loss component, which can be adjusted during training.
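A minimal sketch of assembling the weighted sum during training is shown below; the perceptual, SSIM, and color loss callables are placeholders rather than the exact implementations used in this work, and the default weights follow the values reported in Section 2.4:

```python
import torch.nn.functional as F

def total_loss(pred, target, perceptual_fn, ssim_fn, color_fn,
               w_rec=1.0, w_perc=0.1, w_ssim=0.5, w_color=0.3):
    """Weighted multi-component loss (a sketch; the component losses are placeholder callables)."""
    l_rec = F.l1_loss(pred, target)        # pixel-level reconstruction loss
    l_perc = perceptual_fn(pred, target)   # e.g., distance between deep features
    l_ssim = ssim_fn(pred, target)         # e.g., 1 - SSIM, as a structural term
    l_color = color_fn(pred, target)       # e.g., distance between color statistics
    return w_rec * l_rec + w_perc * l_perc + w_ssim * l_ssim + w_color * l_color
```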
(4)
Adaptive Noise Perception and Denoising Module
Addressing the issue of noise interference often accompanying LDR images in practical scenarios, this paper introduces an adaptive noise perception and denoising module in the network, as shown in Figure 6. This module works collaboratively through noise estimation and denoising branches, combining brightness information of input images to achieve adaptive suppression of noise levels in different regions. This mechanism significantly improves the model’s robustness and reconstruction quality in low signal-to-noise ratio environments [28,29]. Noise perception feature fusion formula:
Fdenoised = (1 − Nmap) ⊙ Fin + Nmap ⊙ Fclean
Among them, Fdenoised is the final output denoised feature, Nmap is the noise estimation map, Fin is the input feature, ⊙ denotes element-wise multiplication, and Fclean is the denoised feature produced by the denoising branch.
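The fusion step itself reduces to a convex combination of the input and denoised features, as in the following one-line sketch:

```python
import torch

def noise_aware_fuse(feat_in: torch.Tensor,
                     feat_clean: torch.Tensor,
                     noise_map: torch.Tensor) -> torch.Tensor:
    """Keep the input feature where the estimated noise is low and the denoised feature where it is high."""
    return (1.0 - noise_map) * feat_in + noise_map * feat_clean
```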
(5)
Multi-scale (Haar) Wavelet Decomposition and Reconstruction Mechanism
To fully exploit multi-scale structural information of images, this paper introduces wavelet decomposition and reconstruction modules in the feature extraction and fusion stages. Through Haar wavelet multi-scale decomposition of features, low-frequency and high-frequency components are obtained. The decomposed features are processed separately and then reconstructed through the wavelet inverse transform, enhancing the model’s ability to model global structure and local details.
Wavelet decomposition formula:
(W0, Wx, Wy, Wz) = WaveletDecompose(F)
Among them, F represents the input feature map, and WaveletDecompose(·) denotes the Haar wavelet decomposition operation applied to it. The output contains four components:
W0: the low-frequency component (approximation), which primarily encompasses the overall structure and global information of the image, reflecting the core content of the original features.
Wx: the horizontal high-frequency component (horizontal detail), which primarily captures edge details and texture information in the horizontal direction of the image.
Wy: the vertical high-frequency component (vertical detail), which primarily reflects detail changes in the vertical direction of the image, such as vertical edges.
Wz: the diagonal high-frequency component (diagonal detail), which describes image detail characteristics in the diagonal direction.
Through the above decomposition, the model can separately process and enhance feature information at different scales and directions, thereby improving the modeling capability for image details and structure [25].
Wavelet reconstruction formula:
Frecon = WaveletReconstruct(W0, Wx, Wy, Wz)
where WaveletReconstruct(·) represents the wavelet reconstruction operation, recombining the low-frequency component and the high-frequency components in the various directions to restore a reconstructed feature Frecon with the same size as the original feature map. This helps the model enhance its ability to express local details while maintaining the overall structure.
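For illustration, a single-level Haar decomposition and its inverse can be written directly with tensor slicing, as in the sketch below (the model applies a 2-level version, and the 1/2 normalization is an assumption):

```python
import torch

def haar_decompose(x: torch.Tensor):
    """One-level Haar decomposition of a (B, C, H, W) feature map with even H and W."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    w0 = (a + b + c + d) / 2.0  # low-frequency approximation
    wx = (a - b + c - d) / 2.0  # detail from horizontal differences
    wy = (a + b - c - d) / 2.0  # detail from vertical differences
    wz = (a - b - c + d) / 2.0  # diagonal detail
    return w0, wx, wy, wz

def haar_reconstruct(w0, wx, wy, wz) -> torch.Tensor:
    """Exact inverse of haar_decompose: recombine the four sub-bands at the original resolution."""
    bsz, ch, h, w = w0.shape
    out = w0.new_zeros(bsz, ch, h * 2, w * 2)
    out[..., 0::2, 0::2] = (w0 + wx + wy + wz) / 2.0
    out[..., 0::2, 1::2] = (w0 - wx + wy - wz) / 2.0
    out[..., 1::2, 0::2] = (w0 + wx - wy - wz) / 2.0
    out[..., 1::2, 1::2] = (w0 - wx - wy + wz) / 2.0
    return out
```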

2.2. Tone-Mapping Methods

High dynamic range (HDR) images possess a luminance range that far exceeds the display capabilities of standard devices. Therefore, to enable effective visualization and subsequent processing on conventional low dynamic range (LDR) monitors, tone-mapping techniques are essential to compress the wide luminance information of HDR images into the LDR. The design quality of tone-mapping algorithms directly determines the visual effect and detail representation of HDR images [30,31,32,33,34,35,36].
To improve processing efficiency, model generalization, and achieve real-time capability, this study adopts four mainstream tone-mapping algorithms: Tanh-μ-law, Reinhard, ACES, and Drago. Users can select the optimal method according to practical needs [20]:
(1)
Tanh-μ-law Tone Mapping. The Tanh-μ-law method normalizes the HDR image and then compresses the highlights using a hyperbolic tangent function. The μ parameter controls the intensity of highlight compression, and the percentile determines the normalization range. The formula is as follows:
Itm = log(1 + μ·tanh(I/P)) / log(1 + μ)
where P is the normalization percentile and μ is the highlight compression parameter. This method is suitable for most scenes, effectively preserving highlight details and enhancing overall contrast.
(2)
Reinhard Tone Mapping. The Reinhard method is a classic global tone-mapping algorithm that nonlinearly compresses image luminance to map high dynamic range to low dynamic range. The core formula is as follows:
Lmapped = (a / L̄) · L,   Lout = Lmapped / (1 + Lmapped)
where a is a key parameter (typically 0.18), L̄ represents the logarithmic mean luminance of the image, and L is the input luminance. This method enhances naturalness and gradation, and the gamma parameter adjusts overall brightness.
(3)
ACES Tone Mapping. ACES (Academy Color Encoding System) is a standard widely used in the film industry, producing natural colors and rich gradation. The mapping formula is as follows:
Itm = l·(a·l + b) / (l·(c·l + d) + e)
where a = 2.51, b = 0.03, c = 2.43, d = 0.59, and e = 0.14. ACES maintains natural color transitions and rich gradation while compressing high dynamic range.
(4)
Drago Tone Mapping. The Drago method is suitable for high-contrast scenes, using logarithmic compression of luminance and introducing a bias parameter to control highlight compression. The core formula is as follows:
Lout = log(1 + b·L) / log(1 + b)
where b is the bias parameter. This method effectively compresses highlights and enhances shadow details, making it suitable for high-contrast and complex lighting scenes.
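For reference, the four operators can be sketched as simple NumPy functions following the formulas above; the parameter defaults (μ, the normalization percentile, and the Drago bias) are illustrative assumptions, and the Drago sketch uses only the simplified global form quoted in the text:

```python
import numpy as np

def tanh_mu_law(hdr: np.ndarray, mu: float = 5000.0, percentile: float = 99.0) -> np.ndarray:
    """Percentile normalization, tanh highlight compression, then mu-law encoding."""
    p = max(np.percentile(hdr, percentile), 1e-8)
    return np.log1p(mu * np.tanh(hdr / p)) / np.log1p(mu)

def reinhard(lum: np.ndarray, a: float = 0.18) -> np.ndarray:
    """Reinhard global operator applied to a luminance channel."""
    log_mean = np.exp(np.mean(np.log(lum + 1e-8)))  # logarithmic (geometric) mean luminance
    l_mapped = (a / log_mean) * lum
    return l_mapped / (1.0 + l_mapped)

def aces(x: np.ndarray) -> np.ndarray:
    """ACES filmic approximation with the constants quoted in the text."""
    a, b, c, d, e = 2.51, 0.03, 2.43, 0.59, 0.14
    return np.clip(x * (a * x + b) / (x * (c * x + d) + e), 0.0, 1.0)

def drago(lum: np.ndarray, bias: float = 0.85) -> np.ndarray:
    """Simplified Drago-style logarithmic compression of luminance."""
    return np.log1p(bias * lum) / np.log1p(bias)
```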
The evaluation of tone-mapping effects includes both subjective visual assessment and objective metrics. Subjective evaluation focuses on image gradation, naturalness, and detail representation, while objective evaluation can use color fidelity, contrast preservation, PSNR, SSIM, color accuracy, and dynamic range gain.

2.3. Datasets

Experiments are conducted on several public HDR imaging datasets.
NTIRE 2021: Contains paired LDR–HDR images of various real-world scenes [10]. The dataset provides 1494 LDR/HDR pairs for training, 60 images for validation, and 201 images for testing. The LDR/HDR pairs are aligned both in time and in exposure level and are stored after gamma correction (i.e., they are nonlinear images). Since the ground truths of the validation and testing sets are not available, we conduct the experiments only on the training set. The training set is composed of 1494 consecutive frames from 26 long takes. We randomly select 3 frames from every long take, for a total of 78 frames, as the validation set, and the remaining 1416 frames are used for training.
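A minimal sketch of this per-take split (a hypothetical helper that assumes the frames are already grouped by long take) is:

```python
import random

def split_ntire_frames(takes, val_per_take=3, seed=0):
    """Pick `val_per_take` random frames from each long take for validation; keep the rest for training.
    `takes` maps a take identifier to the list of its frame paths."""
    rng = random.Random(seed)
    train, val = [], []
    for frames in takes.values():
        chosen = set(rng.sample(range(len(frames)), val_per_take))
        for i, frame in enumerate(frames):
            (val if i in chosen else train).append(frame)
    return train, val
```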
VDS: Provides multi-exposure images from different devices and scenes [37]. Due to its limited size, this dataset is only used for testing.
HDR-Eye: Includes challenging scenes with complex lighting and exposure conditions [38]. Similarly to VDS, due to its small scale, it is only used for evaluation.

2.4. Implementation Details

(1)
The model is implemented in PyTorch 1.9.0 and trained using the Adam optimizer with an initial learning rate of 2 × 10−4. The batch size is set to 16, and the learning rate is decayed at scheduled epochs. Data augmentation techniques such as random cropping, flipping, and rotation are applied to improve generalization.
(2)
Loss Weight Selection: The weight parameters λ1, λ2, λ3, and λ4 are determined through extensive ablation studies and grid-search optimization. The optimal values are as follows: λ1 = 1.0 (reconstruction loss), λ2 = 0.1 (perceptual loss), λ3 = 0.5 (structural similarity loss), and λ4 = 0.3 (color consistency loss). These weights are selected based on validation performance on the NTIRE 2021 dataset, ensuring a balanced contribution from each loss component. The selection process involves testing weight combinations in the ranges λ1 ∈ [0.5, 1.0], λ2 ∈ [0.05, 0.2], λ3 ∈ [0.3, 0.8], and λ4 ∈ [0.1, 0.5], with step sizes of 0.1, 0.05, 0.1, and 0.1, respectively. Adaptive weight adjustment: During training, the weights are adaptively adjusted according to the convergence of each loss term; if a specific loss term converges too slowly or too quickly, its weight is automatically adjusted by ±10% every 10 epochs to keep the training process balanced.
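A minimal sketch of this ±10% adjustment rule is given below; the thresholds that decide whether a term converges too slowly or too quickly are assumptions, since the text does not specify how convergence speed is measured:

```python
def adjust_loss_weights(weights, prev_losses, curr_losses,
                        slow_thresh=0.01, fast_thresh=0.30):
    """Every 10 epochs, nudge each loss weight by +/-10% depending on its relative loss decrease."""
    adjusted = dict(weights)
    for name, w in weights.items():
        rel_drop = (prev_losses[name] - curr_losses[name]) / max(prev_losses[name], 1e-8)
        if rel_drop < slow_thresh:      # converging too slowly -> increase its weight
            adjusted[name] = w * 1.10
        elif rel_drop > fast_thresh:    # converging too quickly -> decrease its weight
            adjusted[name] = w * 0.90
    return adjusted
```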
(3)
We employ the Haar wavelet transform with 2-level decomposition. It is inserted into the network after the bottleneck layer and before the decoder output to refine high-level features.
(4)
Noise model: The noise estimation branch consists of a combination of signal-dependent Poisson noise and signal-independent Gaussian noise. Noise learning strategy: The noise estimation network consists of five convolutional layers with ReLU activation functions, followed by a global average pooling layer for estimating noise parameters. Then, using a learned noise suppression function, the estimated noise map is used to adjust the feature maps. Insertion strategy: Modules with noise perception capabilities are integrated into each residual block of the encoder path, enabling stepwise noise suppression at multiple scales. Noise estimation is performed at the input layer and is passed to various parts of the network through skip connections to ensure noise-perception processing throughout the network.

3. Results

3.1. Experimental Results

This section presents visualizations of the results on different datasets, shown in Figure 7, to verify the effectiveness and superiority of the proposed single-frame HDR reconstruction method. Representative results on the NTIRE 2021, VDS, and HDR-Eye datasets are shown, with comparisons to the original low dynamic range (LDR) input, the ground-truth (GT) HDR images, and the mainstream deep-learning method HDRUNet [39].
From the above visual analysis results, it is clearly evident that the single-frame HDR reconstruction method proposed in this paper demonstrates outstanding performance when processing the three major datasets (NTIRE 2021, VDS, and HDR-Eye). Compared with the existing mainstream deep-learning method, HDRUNet, our method shows significant advantages in multiple aspects.
In terms of dynamic range expansion and detail restoration, our method effectively expands the dynamic range of the image, restores the rich details of the highlights and shadow areas of the image, and avoids information loss caused by overexposure or underexposure, thereby presenting the scene information more comprehensively.
In terms of color fidelity and consistency, the reconstructed HDR images show excellent color restoration performance, achieving a high degree of color consistency with the real scene or real images (GT), avoiding the common color deviations or distortions in traditional methods.
In terms of structure restoration and texture preservation, our method demonstrates outstanding capabilities in restoring complex textures and fine structures, with clear and sharp image edges and rich texture details, significantly improving the visual quality.
In terms of suppressing artifacts, compared with other methods, our method performs better in reducing halos, color overflow, and blurring, enhancing the purity and realism of the image.
Overall, these visual results strongly validate the effectiveness and advancement of the proposed model, especially in structure restoration and color consistency, where the performance is particularly outstanding, providing a high-quality solution for single-frame HDR imaging technology.

3.2. Selection of Tone-Mapping Methods

The project integrates four tone-mapping methods: Tanh-μ-law, Reinhard, ACES, and Drago. As shown in Figure 8, we conducted a visual analysis of these methods and found that Tanh-μ-law and Reinhard are suitable for most medium and short exposure images, while ACES and Drago are more suitable for most long exposure images.
As shown in Table 1, we conducted a quantitative assessment of these four tone-mapping methods by comparing three indicators: contrast preservation ratio (CPR), color accuracy (ΔE), and structural similarity (SSIM).
Results show that the Reinhard method achieves the best balance between contrast preservation (CPR = 0.89) and color accuracy (ΔE = 2.1), while ACES provides the highest structural similarity (SSIM = 0.94) for long exposure images.

3.3. Ablation Study

To further analyze the contribution of each module to the overall performance, we designed multiple ablation experiments by removing the multi-scale feature fusion, attention mechanism, luminance extension, and color-enhancement branches [27,40]. The results are shown in Table 2.
The experimental results show that the channel and spatial attention mechanisms, as well as the luminance extension and color-enhancement branches, all effectively improve model performance. The final multi-component loss further optimizes reconstruction quality.
Placement of the attention module: We tested different placements of the attention module (in the encoder, the bottleneck, and the decoder) and found that placing it within the encoder and decoder modules achieves the best performance.

3.4. Comparison with Mainstream Algorithms

For fair comparison, the PSNR, SSIM, and HDR-VDP3 values of all methods were measured on the same test set (201 images from NTIRE 2021); the input images were uniformly resized to 512 × 512 and normalized to [0,1], and the PSNR/SSIM values were calculated after applying Reinhard tone mapping; HDR-VDP3 used the official implementation; and the experimental environment was fixed as an NVIDIA RTX 3090 with PyTorch 1.9.0 to eliminate hardware-related performance differences. For methods with official implementations or pre-trained models (such as Deep Chain HDRI [37], HDRUNet [10], SingleHDR [5], ResNet(L1) [11], Deep SR-ITM [41], etc.), the code was run directly; for methods without official implementations, or those that are inconvenient to reproduce, the values reported in the original papers were referenced; if the original paper did not report a certain metric or it could not be reproduced due to implementation differences, a "-" was marked at the corresponding position. The results are shown in Table 3.
As can be seen from Table 3, the method proposed in this paper outperforms the existing mainstream methods in terms of peak signal-to-noise ratio (PSNR); especially in extreme exposure areas and complex scenes, this method can more effectively restore details and maintain color consistency.

3.5. Summary

This section systematically verifies the effectiveness and advancement of the proposed method through ablation studies and comparisons with mainstream methods. The introduction of each module contributes significantly to performance improvement, and the final model achieves excellent reconstruction results on multiple public datasets.

4. Discussion

The experimental results demonstrate that the proposed improved HDRUNet achieves superior performance in both quantitative and qualitative evaluations. (1) The integration of multi-dimensional attention mechanisms (channel attention with global average/max pooling and spatial attention with multi-scale pyramid convolution) enables better focus on critical regions, particularly in extreme exposure areas where traditional methods often fail; (2) The luminance extension branch (using Tanh-based nonlinear transformation) and color-enhancement branch (with channel attention and 1 × 1 convolution) specifically address the inherent limitations of single LDR images in brightness distribution and color representation; (3) The adaptive multi-component loss function (λ1 = 1.0, λ2 = 0.1, λ3 = 0.5, λ4 = 0.3) provides better optimization guidance by balancing pixel accuracy, structural similarity, perceptual quality, and color consistency; (4) The noise-aware modules (with signal-dependent Poisson and signal-independent Gaussian noise modeling) improve robustness in challenging scenarios with complex noise distributions.

5. Conclusions

This paper addresses issues such as loss of detail in extremely exposed areas and color distortion in single-image high dynamic range (HDR) reconstruction by proposing a deep-learning method based on an improved HDRUNet. This method enhances the model’s ability to capture key features by integrating channel and spatial attention mechanisms into the U-Net backbone network. Additionally, the designed brightness expansion and color-enhancement branches optimize image detail recovery and color consistency, respectively. Furthermore, the introduced adaptive multi-component loss function effectively balances pixel accuracy, structural similarity, and perceptual quality during training.
Comprehensive experimental results on multiple public datasets, including NTIRE 2021, VDS, and HDR-Eye, demonstrate that the proposed method outperforms existing mainstream single-frame HDR reconstruction algorithms in terms of PSNR evaluation metrics. Ablation experiments further validate the effectiveness of each improved module. Subjective visual comparisons also show that the proposed method can generate high-quality HDR images with richer details and more natural colors.
Limitations: Our method has some limitations: (1) Computational cost: This model requires approximately 2.3 GB of GPU memory, and the inference time on NVIDIA RTX 3090 is about 1 s, which may limit its use in real-time applications; (2) Performance on noisy input: Although our noise-aware module improves robustness, this method may still have difficulty handling severely damaged images; (3) Memory requirements: The multi-scale attention mechanism increases memory usage compared to the baseline method.
Future work: Future research directions include the following: (1) Model compression and acceleration for mobile devices to achieve real-time applications; (2) Domain adaptation techniques for different camera sensors and imaging conditions; (3) Integration with edge computing and application in embedded systems; and (4) Extension to video high dynamic range reconstruction in dynamic scenes.

Author Contributions

Conceptualization, L.G., X.T.; methodology, L.G., X.T.; software, L.G., X.T.; validation, L.G., X.T.; investigation, X.T.; resources, X.T., L.Z.; data curation, L.G., X.T.; writing—original draft preparation, L.G.; writing—review and editing, L.G., X.T.; supervision, X.T.; funding acquisition, X.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific and Technological Innovation Cultivation Project of Space Engineering University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. ITU-R BT.601; Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios. International Telecommunication Union: Geneva, Switzerland, 2011.
  2. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Books, Publishing House of Electronics Industry: Beijing, China, 2020. [Google Scholar]
  3. Eilertsen, G.; Kronander, J.; Denes, G.; Mantiuk, R.K.; Unger, J. HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph. 2017, 36, 1–15. [Google Scholar] [CrossRef]
  4. Lee, S.; An, G.H.; Kang, S.-J. Deep Chain HDRI: Reconstructing a High Dynamic Range Image from a Single Low Dynamic Range Image. IEEE Access 2018, 6, 49913–49924. [Google Scholar] [CrossRef]
  5. Liu, Y.-L.; Lai, W.-S.; Chen, Y.-S.; Kao, Y.-L.; Yang, M.-H.; Chuang, Y.-Y.; Huang, J.-B. Single-image hdr reconstruction by learning to reverse the camera pipeline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1651–1660. [Google Scholar]
  6. Landis, H. Production-ready global illumination. Siggraph Course Notes 2002, 5, 93–95. [Google Scholar]
  7. Zou, Y.; Yan, C.; Fu, Y. Rawhdr: High dynamic range image reconstruction from a single raw image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023. [Google Scholar]
  8. Conde, M.; Timofte, R.; Berdan, R.; Besbinar, B.; Iso, D.; Ji, P.; Dun, X.; Fan, Z.; Wu, C.; Wang, Z.; et al. Raw image reconstruction from RGB on smartphones. NTIRE 2025 challenge report. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025. [Google Scholar]
  9. Lin, H.-Y.; Lin, Y.-R.; Lin, W.-C.; Chang, C.-C. Reconstructing High Dynamic Range Image from a Single Low Dynamic Range Image Using Histogram Learning. Appl. Sci. 2024, 14, 9847. [Google Scholar] [CrossRef]
  10. Chen, X.; Liu, Y.; Zhang, Z.; Qiao, Y.; Dong, C. HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Nashville, TN, USA, 20–25 June 2021; pp. 354–363. [Google Scholar]
  11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  12. Tan, T.; Gong, Z.; Yang, Z. A High Dynamic Range Image Reconstruction Network Utilizing Feature Pre-alignment. Comput. Eng. 2025, 1–8. [Google Scholar] [CrossRef]
  13. Ding, S. Research on HDR Image Reconstruction Method Based on Deep Learning. Master’s Thesis, Yunnan Normal University, Kunming, China, 2024. [Google Scholar] [CrossRef]
  14. Debevec, P.; Malik, J. Recovering High Dynamic Range Radiance Maps from Photographs. In Proceedings of the SIGGRAPH ‘97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 3–8 August 1997; pp. 369–378. [Google Scholar]
  15. Nemoto, H.; Korshunov, P.; Hanhart, P.; Ebrahimi, T. Visual attention in LDR and HDR images. In Proceedings of the 9th International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), Chandler, AZ, USA, 4–6 February 2015. [Google Scholar]
  16. Bai, B.; Liu, W. High Dynamic Range Imaging Technology. J. Xi’an Univ. Posts Telecommun. 2020, 25, 63–67+73. [Google Scholar]
  17. Wu, Z. Research on Generation Algorithms for High Dynamic Range Images; Guilin University of Electronic Technology: Guilin, China, 2021. [Google Scholar]
  18. Li, H. Research on High Dynamic Range Image Acquisition Methods Based on Deep Learning. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2021. [Google Scholar]
  19. Deng, Y. Design and Implementation of a High Dynamic Range Image Reconstruction System Based on Multi-Scale Context Awareness; Nanjing University: Nanjing, China, 2020. [Google Scholar]
  20. Huo, Y.; Yang, F.; Brost, V. Dodging and burning inspired inverse tone mapping algorithm. J. Comput. Inf. Syst. 2013, 9, 3461–3468. [Google Scholar]
  21. Ye, N. Research on HDR Imaging Methods Based on Deep Learning. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2020. [Google Scholar]
  22. Chen, A. Research on HDR Imaging Technology Based on Deep Learning. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2022. [Google Scholar]
  23. Zhang, D.; Huo, Y. Generation of High Dynamic Range Images for Dynamic Scenes. J. Comput.-Aided Des. Graph. 2018, 30, 1625–1636. [Google Scholar] [CrossRef]
  24. Zeng, H.; Sun, H.; Du, L.; Wang, S. High Dynamic Range Image Synthesis for Spatial Target Observation. Prog. Laser Optoelectron. 2019, 56, 96–103. [Google Scholar]
  25. Liu, J. Research on High Dynamic Range Image Reconstruction Technology Based on Deep Learning. Master’s Thesis, Xi’an University of Electronic Science and Technology, Xi’an, China, 2023. [Google Scholar]
  26. Wang, H. Research on High Dynamic Range Image Reconstruction and Display Methods Based on Deep Learning. Doctoral Thesis, Tianjin University, Tianjin, China, 2022. [Google Scholar]
  27. Ding, W. Research on Color High Dynamic Range Image Quality Evaluation Methods. Master’s Thesis, Tianjin University, Tianjin, China, 2018. [Google Scholar]
  28. Chen, X. Application of Noise Suppression Technology in the Generation of Single Exposure HDR Images. Audio Eng. Technol. 2024, 48, 42–44. [Google Scholar] [CrossRef]
  29. Anwar, S.; Barnes, N. Real Image Denoising With Feature Attention. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3155–3164. [Google Scholar] [CrossRef]
  30. Wang, X.; Liu, S.; Tian, J. High Dynamic Range Image Reconstruction Based on a Dual Attention Network. Adv. Lasers Optoelectron. 2024, 61, 402–409. [Google Scholar]
  31. Liu, T.; Miao, D.; Bai, Y.; Zhu, Z. Research on Tone Mapping Algorithms for High Dynamic Range Images. Telev. Technol. 2021, 45, 39–45. [Google Scholar]
  32. Huangfu, Z. Research on Feature-Driven High Dynamic Range Image Tone Mapping Algorithms. Master’s Thesis, Henan University of Science and Technology, Luoyang, China, 2020. [Google Scholar]
  33. Miao, D. Research on High Dynamic Range Image Tone Mapping Algorithms. Master’s Thesis, Zhengzhou University, Zhengzhou, China, 2020. [Google Scholar]
  34. Yu, L. Research on High Dynamic Range Image Processing Technology Based on Visual Mechanisms. University of Electronic Science and Technology of China: Chengdu, China, 2020. [Google Scholar]
  35. Cheng, H. Research on Tone Mapping Algorithms for High Dynamic Range Images; University of Chinese Academy of Sciences (Institute of Optoelectronic Technology, Chinese Academy of Sciences): Beijing, China, 2019. [Google Scholar]
  36. Jia, A. Research on Tone Mapping Algorithms for High Dynamic Range Images. Master’s Thesis, Southwest University of Finance and Economics, Chengdu, China, 2019. [Google Scholar]
  37. Sharif, S.M.A.; Naqvi, R.A.; Biswas, M.; Kim, S. A Two-stage Deep Network for High Dynamic Range Image Reconstruction. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 550–559. [Google Scholar] [CrossRef]
  38. Nemoto, H.; Korshunov, P.; Hanhart, P.; Ebrahimi, T. Visual Attention in LDR and HDR Images. 2015. Available online: https://infoscience.epfl.ch/server/api/core/bitstreams/e9191d72-e6cd-4fc2-965e-0f91092212de/content (accessed on 18 September 2025).
  39. Fan, K.; Liang, J.; Li, F.; Qiu, P. CNN Based No-Reference HDR Image Quality Assessment. Chin. J. Electron. 2021, 30, 282–288. [Google Scholar]
  40. Tian, H.; Hao, T.; Zhang, H. A brightness measurement method based on high dynamic range images. Prog. Laser Optoelectron. 2019, 56, 188–193. [Google Scholar]
  41. Kim, S.Y.; Oh, J.; Kim, M. Deep SR-ITM: Joint learning of super-resolution and inverse tone-mapping for 4K UHD HDR applications. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3116–3125. [Google Scholar]
  42. Chen, X.; Zhang, Z.; Ren, J.S.; Tian, L.; Qiao, Y.; Dong, C. A New Journey from SDRTV to HDRTV. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
  43. Kim, S.Y.; Oh, J.; Kim, M. JSI-GAN: GAN-Based Joint Super-Resolution and Inverse Tone-Mapping with Pixel-Wise Task-Specific Filters for UHD HDR Video. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11287–11295. [Google Scholar]
  44. Zhang, Z.; Chen, X.; Wang, Y.; Cai, H. HDR Image Reconstruction Algorithm Based on Masked Transformer. Laser Optoelectron. Prog. 2025, 62, 409–420. [Google Scholar]
  45. Bei, Y.; Wang, Q.; Cheng, Z.; Pan, X.; Yang, M.; Ding, D. A HDR Image Generation Method Based on Conditional Generative Adversarial Network. J. Beijing Univ. Aeronaut. Astronaut. 2022, 48, 45–52. [Google Scholar] [CrossRef]
  46. Xu, M.; Xie, W.; Yao, B. Research on HDR Reconstruction Method Based on CycleGAN. Intell. Comput. Appl. 2023, 13, 180–185. [Google Scholar]
Figure 1. Network Structure Diagram.
Figure 2. Basic Network Structure Diagram.
Figure 3. Multi-dimensional Attention Mechanism Structure Diagram.
Figure 4. Brightness Expansion and Color-Enhancement Branch Structure Diagram.
Figure 5. Adaptive Multi-component Loss Function Structure Diagram.
Figure 6. Adaptive Noise Perception and Denoising Module Structure Diagram.
Figure 7. Visualization of processing results.
Figure 8. Tone-mapping results for different scenes and exposure levels. (a–c) correspond to tone-mapping results for short, medium, and long exposure images, respectively.
Table 1. Quantitative Comparison of Tone-Mapping Methods.

Method | CPR | ΔE | SSIM | Processing Time (ms)
Tanh-μ-law | 0.85 | 2.8 | 0.91 | 12.3
Reinhard | 0.89 | 2.1 | 0.93 | 8.7
ACES | 0.87 | 2.3 | 0.94 | 15.2
Drago | 0.83 | 2.9 | 0.90 | 11.8
Table 2. Ablation Study Results.

Module Configuration | PSNR (dB) | SSIM | CIEDE2000 | DR Gain (dB)
Full Model (Ours) | 45.71 | 0.937 | 3.41 | 45.6
w/o Multi-Scale Feature Fusion | 43.32 | 0.915 | 3.68 | 44.9
w/o Attention Mechanism | 43.08 | 0.911 | 3.74 | 44.7
w/o Luminance and Color-Enhancement | 41.85 | 0.908 | 3.81 | 44.3
Table 3. Comparison of different algorithms in terms of PSNR, SSIM, and VDP indicators (m: mean, σ: standard deviation).

Method | PSNR (dB) m | PSNR (dB) σ | SSIM m | SSIM σ | HDR-VDP3 m | HDR-VDP3 σ
Deep Chain HDRI [37] | 30.86 | 2.77 | 0.9435 | 0.0369 | - | -
HDRUNet [10] | 41.61 | 3.36 | - | - | - | -
SingleHDR [5] | 32.32 | 3.27 | - | - | - | -
ResNet (L1) [3] | 39.82 | 2.21 | 0.9213 | 0.0497 | 8.192 | 0.441
Deep SR-ITM [41] | 43.29 | 2.81 | 0.9396 | 0.0469 | 8.311 | 0.568
HDRTV [42] | 37.21 | 3.04 | 0.9199 | 0.0295 | 8.569 | 0.498
JSI-GAN [43] | 37.08 | 3.36 | 0.9489 | 0.0361 | 8.339 | 0.559
Transformer-HDR [44] | 31.95 | 4.28 | 0.9317 | 0.0593 | - | -
Generative Adversarial Network [45] | 44.37 | 4.07 | 0.9692 | 0.0392 | - | -
CycleGAN-HDR [46] | 42.89 | 4.31 | 0.9450 | 0.0564 | - | -
Ours | 45.71 | 4.34 | 0.9579 | 0.0571 | 8.716 | 0.567
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
