Attention-Edge-Assisted Neural HDRI Based on Registered Extreme-Exposure-Ratio Images
Abstract
1. Introduction
- An attention-edge-assisted neural framework is proposed to fuse two EER images; it preserves the information in the highlight and shadow regions and avoids halo-like artifacts in the fused output.
- To retain the overall contrast structure while enhancing fine local details of the fused image, we embed CNNs into the Transformer module as the core component. A dual attention mechanism and residual gradient information are concatenated in parallel with the window-based multi-head attention, so that both the local and the long-range information of the input images is taken into account.
2. Related Works
3. The Proposed AEAN Method
3.1. Motivation and Overview
3.2. Structure of the Proposed AEAN Method
3.2.1. Global-Level Design
3.2.2. Detail-Level Design
- Multi-scale Spatial Attention. Given the shallow fusion of the two EER images, a multi-scale spatial attention (MSA) module is designed, together with layer normalization (LN), to further fuse them in a coarse-to-fine manner (a sketch of this module is given after this list). As presented in Figure 4, the input is first split into three branches, each undergoing initial feature extraction with a convolution of a different kernel size. After each convolution, average and max pooling are executed to capture global spatial information at different receptive fields. The pooled features from each branch are concatenated along the channel dimension and then passed through a convolution followed by a Sigmoid activation to generate a spatial attention map for each scale. The multi-branch attention maps are further concatenated and processed by a convolution, a GELU activation, and another convolution for feature calibration. Finally, a residual connection adds the calibrated features back to the input, producing the enhanced feature. Like the algorithms in [11,14], and unlike those in [12,13], the proposed algorithm is not based on weight maps; therefore, the proposed method and the algorithms in [11,14] can effectively avoid brightness reversal artifacts in the fused images.
- Channel Attention. The channel attention (CA) branch is inserted in parallel with W-MSA and MSA (a sketch follows this list). As shown in Figure 4, the input first passes through a convolution. Global context is captured via average pooling and is then processed by another convolution with GELU activation to model non-linear channel dependencies. A third convolution followed by a Sigmoid generates channel-wise attention weights, which are multiplied with the features to produce the refined output. The whole process is similar to the hierarchical Transformers in [10,16]: the flattened token sequence is reshaped back to the spatial grid to refine local information, which reduces the blocking effect caused by the window-partition mechanism. Moreover, because CA aggregates global information, it dynamically adjusts the weight of each channel so that more pixels are activated. Following the shifted-window scheme of W-MSA in [18], shifted W-MSA is applied in consecutive CSTBs in a periodic manner.
- Edge-assisted Enhancement. To refine local image details and facilitate global exposure fusion, edge-assisted enhancement (EH) is incorporated in parallel with the Transformer module (see the sketch after this list). As shown in Figure 4, the input X goes through three convolutions with ReLU activations, with skip connections from the first two layers to the third, producing the edge-enhanced output. Similar to [16], adding these detail features after fusion sharpens the image and improves its representation of gradients and textures. Following [39], instance normalization (IN) is applied to half of the channels, while the other half preserve their original context through an identity (ID) mapping.
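As a rough illustration of the MSA branch, the following PyTorch-style sketch follows the description above; the kernel sizes (3, 5, 7), the channel width, and the way the per-scale attention maps are multiplied onto the branch features are assumptions made for this sketch, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Sketch of MSA: three convolution branches with different kernel sizes,
    pooled spatial statistics -> per-scale attention maps -> calibration -> residual."""
    def __init__(self, channels: int = 64, kernel_sizes=(3, 5, 7)):  # sizes are assumed
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        # 2 pooled maps (average, max) per branch -> one attention map per branch
        self.to_attn = nn.Conv2d(2 * len(kernel_sizes), len(kernel_sizes), 1)
        self.calibrate = nn.Sequential(
            nn.Conv2d(len(kernel_sizes) * channels, channels, 1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # layer normalization over the channel dimension, then back to NCHW
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        feats, pooled = [], []
        for conv in self.branches:
            f = conv(y)
            feats.append(f)
            # average- and max-pooled spatial statistics at this receptive field
            pooled.append(torch.mean(f, dim=1, keepdim=True))
            pooled.append(torch.amax(f, dim=1, keepdim=True))
        # concatenate pooled maps, then convolution + Sigmoid -> one map per scale
        attn = torch.sigmoid(self.to_attn(torch.cat(pooled, dim=1)))
        weighted = [f * attn[:, i:i + 1] for i, f in enumerate(feats)]
        # concatenate, calibrate (conv -> GELU -> conv), and add back to the input
        return x + self.calibrate(torch.cat(weighted, dim=1))
```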
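A corresponding sketch of the CA branch; the 1×1 kernels and the channel-reduction ratio are assumed for illustration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of CA: convolution -> global average pooling -> convolution + GELU
    -> convolution + Sigmoid -> channel-wise reweighting of the features."""
    def __init__(self, channels: int = 64, reduction: int = 4):  # reduction ratio is assumed
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)
        self.excite = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global context per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.GELU(),                                      # non-linear channel dependencies
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.proj(x)
        w = self.excite(f)   # (B, C, 1, 1) channel attention weights
        return f * w         # dynamically reweight each channel
```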
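A sketch of the EH branch under stated assumptions: the 3×3 kernels, the placement of the half-instance normalization, and the final residual back to the input are not specified above and are chosen here only for illustration.

```python
import torch
import torch.nn as nn

class EdgeAssistedEnhancement(nn.Module):
    """Sketch of EH: three convolutions with ReLU, skip connections from the first
    two layers to the third, and instance normalization on half of the channels
    (identity on the other half), in the spirit of HINet [39]."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.inorm = nn.InstanceNorm2d(channels // 2, affine=True)

    def half_instance_norm(self, f: torch.Tensor) -> torch.Tensor:
        # IN on half of the channels, identity mapping on the rest
        a, b = torch.chunk(f, 2, dim=1)
        return torch.cat([self.inorm(a), b], dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.relu(self.half_instance_norm(self.conv1(x)))  # HIN placement is assumed
        f2 = self.relu(self.conv2(f1))
        f3 = self.relu(self.conv3(f1 + f2))  # skips from the first two layers to the third
        return x + f3                        # assumed residual back to the input features
```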
3.3. Training Procedure and Objectives
- Reconstruction loss. The reconstruction loss is defined between the pixels of the fused image and those of the ground-truth image (an assumed form is sketched after this list).
- Color loss. In the RGB color space, the cosine angle between the three-channel vectors of corresponding pixels can be used to describe color consistency. To avoid potential color distortion and to maintain the inter-channel relationships, we utilize a color loss based on this cosine similarity (a hedged formulation is given after this list).
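The exact reconstruction term is not reproduced in this excerpt; a common pixel-wise form, given here only as an assumed example, is the $\ell_1$ distance between the fused image $\hat{I}$ and the ground truth $I^{gt}$, averaged over the $N$ pixels:

```latex
\mathcal{L}_{rec} = \frac{1}{N}\sum_{p=1}^{N}\bigl\|\hat{I}(p) - I^{gt}(p)\bigr\|_1
```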
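Likewise, based on the cosine-angle description above, one plausible (assumed) formulation of the color loss averages the cosine dissimilarity of the RGB vectors $\hat{I}(p)$ and $I^{gt}(p)$ at each pixel $p$:

```latex
\mathcal{L}_{color} = \frac{1}{N}\sum_{p=1}^{N}\left(1 - \frac{\hat{I}(p)\cdot I^{gt}(p)}{\bigl\|\hat{I}(p)\bigr\|_2\,\bigl\|I^{gt}(p)\bigr\|_2}\right)
```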
4. Experiments and Results
4.1. Evaluation of Different MEF Algorithms
- Quantitative Results. To quantify luminance reversal in the fused result, two indices defined in [26] are calculated for the ten MEF methods. One of them counts the pixel pairs whose brightness order changes; with N denoting the total number of pixel pairs in the fused result, the luminance reversal ratio is given by this count divided by N (a computational sketch is provided after this list).
- Qualitative Results. In this subsection, we compare the ten algorithms from a subjective perspective, specifically in terms of brightness retention, information in the brightest and darkest regions, and scene depth. Figure 6 and Figure 7 show the visual results of EER image fusion on the dataset of [8] and on MEFB [41], respectively.
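A sketch of how such a luminance reversal ratio could be computed; the pair-sampling strategy of [26] is not reproduced in this excerpt, so horizontally adjacent pixel pairs and a generic reference luminance map are used purely for illustration.

```python
import numpy as np

def luminance_reversal_ratio(fused_y: np.ndarray, ref_y: np.ndarray) -> float:
    """Fraction of horizontally adjacent pixel pairs whose brightness order in the
    fused luminance map is reversed relative to a reference luminance map.
    The adjacent-pair sampling is an illustrative assumption, not the definition in [26]."""
    # sign of the brightness difference between each pixel and its right neighbour
    d_fused = np.sign(fused_y[:, 1:].astype(np.float64) - fused_y[:, :-1].astype(np.float64))
    d_ref = np.sign(ref_y[:, 1:].astype(np.float64) - ref_y[:, :-1].astype(np.float64))
    # a pair counts as reversed when the two orderings have strictly opposite signs
    reversed_pairs = np.sum(d_fused * d_ref < 0)
    return float(reversed_pairs) / d_ref.size
```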
4.2. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yeh, S.F.; Hsieh, C.C.; Cheng, C.J.; Liu, C.K. A dual-exposure single-capture wide dynamic range CMOS image sensor with columnwise highly/lowly illuminated pixel detection. IEEE Trans. Electron Devices 2012, 59, 1948–1955. [Google Scholar]
- Luo, Y.; Mirabbasi, S. A 30-fps 192 × 192 CMOS image sensor with per-frame spatial-temporal coded exposure for compressive focal-stack depth sensing. IEEE J. Solid-State Circuits 2022, 57, 1661–1672. [Google Scholar] [CrossRef]
- Brandli, C.; Berner, R.; Yang, M.; Liu, S.C.; Delbruck, T. A 240 × 180 130 db 3 μs latency global shutter spatiotemporal vision sensor. IEEE J. Solid-State Circuits 2014, 49, 2333–2341. [Google Scholar] [CrossRef]
- Somanath, G.; Kurz, D. HDR environment map estimation for real-time augmented reality. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 11293–11301. [Google Scholar]
- Tan, X.; Chen, H.; Xu, K.; Xu, C.; Jin, Y.; Zhu, C.; Zheng, J. High dynamic range imaging for dynamic scenes with large-scale motions and severe saturation. IEEE Trans. Instrum. Meas. 2022, 71, 5003415. [Google Scholar] [CrossRef]
- Chen, Y.; Jiang, G.; Yu, M.; Yang, Y.; Ho, Y.S. Learning stereo high dynamic range imaging from a pair of cameras with different exposure parameters. IEEE Trans. Comput. Imaging 2020, 6, 1044–1058. [Google Scholar] [CrossRef]
- Catley-Chandar, S.; Tanay, T.; Vandroux, L.; Leonardis, A.; Slabaugh, G.; Pérez-Pellitero, E. FlexHDR: Modeling alignment and exposure uncertainties for flexible HDR imaging. IEEE Trans. Image Process. 2022, 31, 5923–5935. [Google Scholar] [CrossRef]
- Mertens, T.; Kautz, J.; Van Reeth, F. Exposure fusion. In Proceedings of the Conference on Computer Graphics and Applications, Maui, HI, USA, 29 October–2 November 2007. [Google Scholar]
- Kou, F.; Li, Z.; Wen, C.; Chen, W. Multi-scale exposure fusion via gradient domain guided image filtering. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017. [Google Scholar]
- Li, Z.; Wei, Z.; Wen, C.; Zheng, J. Detail-enhanced multi-scale exposure fusion. IEEE Trans. Image Process. 2017, 26, 1243–1252. [Google Scholar] [CrossRef]
- Ram Prabhakar, K.; Sai Srikar, V.; Venkatesh Babu, R. Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; Volume 10, pp. 4714–4722. [Google Scholar]
- Ma, K.; Duanmu, Z.; Zhu, H.; Fang, Y.; Wang, Z. Deep guided learning for fast multi-exposure image fusion. IEEE Trans. Image Process. 2019, 29, 2808–2819. [Google Scholar] [CrossRef]
- Jiang, T.; Wang, C.; Li, X.; Li, R.; Fan, H.; Liu, S. Meflut: Unsupervised 1d lookup tables for multi-exposure image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 10542–10551. [Google Scholar]
- Zheng, K.; Huang, J.; Yu, H.; Zhao, F. Efficient Multi-exposure Image Fusion via Filter-dominated Fusion and Gradient-driven Unsupervised Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2804–2813. [Google Scholar]
- Yang, Y.; Wu, S. Multi-scale extreme exposure images fusion based on deep learning. In Proceedings of the IEEE 16th Conference on Industrial Electronics and Applications, Chengdu, China, 1–4 August 2021; pp. 1781–1785. [Google Scholar]
- Xu, S.; Chen, X.; Song, B.; Huang, C.; Zhou, J. CNN Injected transformer for image exposure correction. Neurocomputing 2024, 587, 127688. [Google Scholar] [CrossRef]
- Zhang, X.; Zhang, Y.; Yu, F. HiT-SR: Hierarchical transformer for efficient image super-resolution. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 483–500. [Google Scholar]
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Gool, L.V.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 1833–1844. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 10012–10022. [Google Scholar]
- Hessel, C.; Morel, J. An extended exposure fusion and its application to single image contrast enhancement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 137–146. [Google Scholar]
- Cai, J.R.; Gu, S.H.; Zhang, L. Learning a deep single image contrast enhancer from multi-exposure images. IEEE Trans. Image Process. 2018, 27, 2026–2049. [Google Scholar] [CrossRef]
- Zhang, J.; Luo, Y.; Huang, J.; Liu, Y.; Ma, J. Multi-exposure image fusion via perception enhanced structural patch decomposition. Inf. Fusion 2023, 99, 101895. [Google Scholar] [CrossRef]
- Li, H.; Ma, K.; Yong, H.; Zhang, L. Fast multi-scale structural patch decomposition for multi-exposure image fusion. IEEE Trans. Image Process. 2020, 29, 5805–5816. [Google Scholar] [CrossRef] [PubMed]
- Jia, W.; Song, Z.; Li, Z. Multi-scale exposure fusion via content-adaptive edge-preserving smoothing pyramids. IEEE Trans. Consum. Electron. 2022, 68, 317–326. [Google Scholar] [CrossRef]
- Yang, Y.; Cao, W.; Wu, S.; Li, Z. Multi-scale fusion of two large-exposure-ratio images. IEEE Signal Process. Lett. 2018, 25, 1885–1889. [Google Scholar] [CrossRef]
- Yang, Y.; Wu, S.; Wang, X.F.; Li, Z. Exposure Interpolation for Two Large-Exposure-Ratio Images. IEEE Access 2020, 8, 227141–227151. [Google Scholar] [CrossRef]
- Li, Z.G.; Zheng, C.B.; Chen, B.; Wu, S. Neural-Augmented HDR Imaging via Two Aligned Large-Exposure-Ratio Images. IEEE Trans. Instrum. Meas. 2025, 72, 4508011. [Google Scholar] [CrossRef]
- Wu, K.; Chen, J.; Ma, J. DMEF: Multi-exposure image fusion based on a novel deep decomposition method. IEEE Trans. Multimed. 2022, 25, 5690–5703. [Google Scholar] [CrossRef]
- Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep retinex decomposition for low-light enhancement. Proc. Brit. Mach. Vis. Conf. 2018, 1–12. [Google Scholar] [CrossRef]
- Ma, K.; Zeng, K.; Wang, Z. Perceptual quality assessment for multi-exposure image fusion. IEEE Trans. Image Process. 2015, 24, 3345–3356. [Google Scholar] [CrossRef]
- He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1397–1409. [Google Scholar] [CrossRef]
- Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502–518. [Google Scholar] [CrossRef]
- Yang, Y.; Wang, M.; Huang, S.; Wan, W. MIN-MEF: Multi-scale interaction network for multi-exposure image fusion. IEEE Trans. Instrum. Meas. 2024, 44, 502–518. [Google Scholar]
- Vs, V.; Valanarasu, J.M.; Oza, P.; Patel, V.M. Image fusion transformer. In Proceedings of the 2022 IEEE International Conference on Image Processing, Bordeaux, France, 16–19 October 2022; pp. 3566–3570. [Google Scholar]
- Qu, L.; Liu, S.; Wang, M.; Song, Z. TransMEF: A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning. Proc. AAAI Conf. Artif. Intell. 2022, 36, 2126–2134. [Google Scholar] [CrossRef]
- Liu, Y.; Yang, Z.; Cheng, J.; Chen, X. Multi-exposure image fusion via multi-scale and context-aware feature learning. IEEE Signal Process. Lett. 2023, 30, 100–104. [Google Scholar] [CrossRef]
- Yang, Y.; Li, Z.; Wu, S. Low-Light Image Brightening via Fusing Additional Virtual Images. Sensors 2020, 20, 4614. [Google Scholar] [CrossRef]
- Debevec, P.E.; Malik, J. Recovering high dynamic range radiance maps from photographs. In Proceedings of the ACM SIGGRAPH, Los Angeles, CA, USA, 3–8 August 1997; pp. 369–378. [Google Scholar]
- Chen, L.; Lu, X.; Zhang, J.; Chu, X.; Chen, C. Hinet: Half instance normalization network for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 182–192. [Google Scholar]
- Karaimer, H.C.; Brown, M.S. Improving color reproduction accuracy on cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6440–6449. [Google Scholar]
- Zhang, X. Benchmarking and comparing multi-exposure image fusion algorithms. Inf. Fusion 2021, 74, 111–131. [Google Scholar] [CrossRef]
Luminance reversal indices defined in [26] (two indices per dataset; lower is better) on the datasets of [8,13,41] and their average (AVG.).

Method | [8] | [8] | [13] | [13] | [41] | [41] | AVG. | AVG.
---|---|---|---|---|---|---|---|---
Mertens [8] | 3.586 | 1.296 | 0.955 | 0.334 | 3.306 | 1.119 | 2.616 | 0.916 |
Kou [9] | 5.128 | 2.335 | 0.963 | 0.321 | 4.958 | 2.161 | 3.683 | 1.606 |
Li [23] | 3.519 | 1.157 | 1.04 | 0.34 | 3.251 | 1.079 | 2.603 | 0.859 |
PESPD [22] | 1.409 | 0.266 | 0.499 | 0.037 | 1.684 | 0.231 | 1.197 | 0.178 |
Deepfuse [11] | 0.374 | 0.014 | 0.171 | 0.004 | 0.501 | 0.043 | 0.349 | 0.02 |
MEFLUT [13] | 5.076 | 2.319 | 0.278 | 0.054 | 5.165 | 2.618 | 3.506 | 1.664 |
MEFGAN [32] | 0.38 | 0.025 | 0.271 | 0.007 | 0.574 | 0.038 | 0.408 | 0.023 |
MEFNET [12] | 5.224 | 3.411 | 0.722 | 0.437 | 5.637 | 4.225 | 3.861 | 2.691 |
FFMEF [14] | 0.422 | 0.033 | 0.236 | 0.038 | 0.557 | 0.014 | 0.405 | 0.017 |
Ours | 0.219 | 0.021 | 0.127 | 0.019 | 0.091 | 0.008 | 0.146 | 0.016 |
MEF-SSIM and PSNR (dB) on the datasets of [8,13,41] and their average (AVG.).

Method | MEF-SSIM [8] | PSNR [8] | MEF-SSIM [13] | PSNR [13] | MEF-SSIM [41] | PSNR [41] | MEF-SSIM (AVG.) | PSNR (AVG.)
---|---|---|---|---|---|---|---|---
Mertens [8] | 0.8591 | 38.69 | 0.9331 | 41.90 | 0.8614 | 38.35 | 0.8845 | 39.64 |
Kou [9] | 0.8482 | 38.78 | 0.9249 | 43.41 | 0.8364 | 38.21 | 0.8698 | 40.13 |
Li [23] | 0.8701 | 39.72 | 0.9304 | 42.00 | 0.8683 | 37.66 | 0.8896 | 39.79 |
PESPD [22] | 0.8657 | 41.11 | 0.9084 | 46.52 | 0.8773 | 39.52 | 0.8838 | 42.38 |
Deepfuse [11] | 0.8328 | 42.02 | 0.8616 | 35.93 | 0.8818 | 40.94 | 0.8587 | 39.63 |
MEFLUT [13] | 0.7029 | 37.04 | 0.8313 | 43.23 | 0.6323 | 38.41 | 0.7222 | 39.56 |
MEFGAN [32] | 0.8363 | 42.23 | 0.8869 | 41.43 | 0.8884 | 40.35 | 0.8705 | 41.34 |
MEFNET [12] | 0.7948 | 41.03 | 0.9026 | 42.49 | 0.7689 | 38.05 | 0.8221 | 40.52 |
FFMEF [14] | 0.8151 | 39.53 | 0.9173 | 44.27 | 0.8412 | 39.98 | 0.8579 | 41.26 |
Ours | 0.9127 | 40.61 | 0.9439 | 42.39 | 0.9273 | 38.85 | 0.9280 | 40.62 |
EH | MSCA | MEF-SSIM | PSNR | |||
---|---|---|---|---|---|---|
✓ | ✓ | ✓ | 0.4796 | 8.66 | ||
✓ | ✓ | ✓ | ✓ | 0.7766 | 36.98 | |
✓ | ✓ | ✓ | ✓ | 0.8527 | 37.26 | |
✓ | ✓ | ✓ | 0.8834 | 38.13 | ||
✓ | ✓ | ✓ | ✓ | ✓ | 0.9127 | 40.61 |