Article

Cyclic Learning-Based Lightweight Network for Inverse Tone Mapping

Jiyun Park 1 and Byung Cheol Song 2,*
1 LX Semicon, Seoul 06763, Korea
2 Department of Electrical and Computer Engineering, Inha University, Incheon 22212, Korea
* Author to whom correspondence should be addressed.
Electronics 2022, 11(15), 2436; https://doi.org/10.3390/electronics11152436
Submission received: 28 June 2022 / Revised: 30 July 2022 / Accepted: 1 August 2022 / Published: 4 August 2022
(This article belongs to the Topic Computer Vision and Image Processing)

Abstract

Recent studies on inverse tone mapping (iTM) have moved toward indirect mapping, which generates a stack of low dynamic range (LDR) images with multiple exposure values (a multi-EV stack) and then merges them. To generate multi-EV stacks, several large-scale networks with more than 20 M parameters have been proposed, but their high dynamic range (HDR) reconstruction and multi-EV stack generation performance has not been acceptable. Moreover, some previous methods using cycle consistency must train additional networks that are never used for multi-EV stack generation, which demands a large amount of memory during training. Thus, this paper proposes novel cyclic learning based on cycle consistency to reduce the memory burden in training. In detail, we eliminate the networks used only for training, so the proposed method enables efficient learning in terms of training-purpose memory. In addition, this paper presents a lightweight iTM network that dramatically reduces the sizes of existing networks; it requires only 1/100 of the parameters of the state-of-the-art (SOTA) method, which contributes to the practical use of iTM. Despite its small size, the proposed network reliably generates a multi-EV stack. Experimental results show that the proposed method achieves quantitatively SOTA performance and is qualitatively comparable to conventional indirect iTM methods.

1. Introduction

With the rapid development of deep learning, many methods for reconstructing a high dynamic range (HDR) image from low dynamic range (LDR) image(s) have been proposed [1,2,3,4]. They can be largely divided into two categories. The first is the multi-exposure fusion (MEF) approach [5,6,7], in which LDR images of different exposure values (EVs) are acquired and merged to generate a single HDR image. Conventional MEF methods often suffer from ghost artifacts caused by moving object(s) and parallax while acquiring the multiple LDR images. The second is so-called inverse tone mapping (iTM) [8,9,10,11,12,13,14], which reconstructs an HDR image using only a single LDR image.
Meanwhile, the iTM approach is further classified into direct iTM and indirect iTM. Direct iTM is literally a one-to-one tone mapping between LDR and HDR [8,10,14]. Whereas direct iTM uses only one LDR, indirect iTM synthesizes LDRs of multiple EVs (a multi-EV stack) from a single LDR and merges them to generate an HDR [9,11,12,13]. For example, deep chain HDRI [12] allocates a subnetwork to each target EV; the number of subnetworks thus grows with the number of target EVs, which can be computationally burdensome. On the other hand, DrTMO [9], deep recursive HDRI [11], and deep cycle HDRI [13] generate a multi-EV stack by using EV up/down networks according to the increasing/decreasing direction toward a target EV. In addition, [15] proposed generating HDR images using LDR video sequence information. As iTM methods have developed, network parameter sizes have also increased; unfortunately, HDR reconstruction performance has not improved as much as the network size. What all the techniques mentioned so far have in common is that information on images of various EVs is required to generate an HDR image from a single LDR.
A recent work based on cycle consistency [13] had to train four networks even though only two of them are used for multi-EV stack generation. We concentrate on reducing the memory required to train the multi-EV stack generation networks; in other words, this paper proposes memory-efficient learning. In addition, the burdensome network sizes of previous works make practical usage of HDR difficult. First, we propose cyclic learning based on cycle consistency. Inspired by the concept of cycle consistency [16], a simultaneous training mechanism based on alternating optimization is proposed. The proposed mechanism can reduce the memory required for learning by half compared to the existing cycle consistency-based method [13] while maintaining the advantages of cycle consistency. Second, we propose a much lighter network than conventional large-scale networks with tens of millions of parameters. The proposed lightweight network has a size of only 1/100 that of the state-of-the-art (SOTA) network [14].
The main contributions of this paper are summarized as follows.
  • We propose a new learning method, i.e., cyclic learning, to train EV up/down networks with less training memory than existing multi-EV stack generation networks based on cycle consistency.
  • This paper demonstrates the practical applicability of deep learning-based iTM by presenting a lightweight network structure compared to existing iTM methods while maintaining reliable performance.
This paper is organized as follows. Section 2 reviews the related work necessary to understand the proposed method. Section 3 describes the core elements of the proposed method in detail. Section 4 presents experimental results and comparisons with existing methods. Finally, Section 5 concludes this paper, and Appendix A and Appendix B additionally present experimental results not covered in the main body.

2. Related Work

2.1. Direct Inverse Tone Mapping

Direct iTM methods focus on extracting as much information as possible from a single LDR. For instance, HDRCNN [8], the first deep learning-based iTM, generated an HDR image by linearly combining the reconstructed saturated region with the image to which the inverse camera response function (CRF) was applied. ExpandNet [10] fused various features extracted through several branches. Still, such a one-to-one LDR-to-HDR mapping inevitably faces restrictions, because little domain information is available in the saturated regions.
Recently, SingleHDR [14] dissected the LDR-generation process into several steps by reversing the image formation pipeline and assigning a neural network to each step of the inverse process. SingleHDR outperformed previous direct iTM methods, but it still suffers from insufficient luminance reconstruction as well as halo artifacts.

2.2. Indirect Inverse Tone Mapping

To solve the aforementioned intrinsic problems of direct iTM, indirect iTM methods try to generate LDRs of multiple EVs, i.e., a multi-EV stack. DrTMO [9] generated a multi-EV stack through EV up/down networks and merged the generated LDRs using Debevec’s algorithm [17]. Deep chain HDRI [12] set a total of six target EVs, from EV −3 to +3, based on an input image of EV 0, and then configured a subnetwork per EV; an HDR image was generated by applying Debevec’s algorithm to the multi-EV stack obtained from the subnetworks, in the same way as DrTMO. Deep recursive HDRI [11] produced a multi-EV stack by recursively operating EV up/down networks that consider only the increasing/decreasing direction of a target EV. The subsequent deep cycle HDRI [13] introduced cycle consistency to promote the stability of multi-EV stack generation in the GAN-based training of deep recursive HDRI.
However, previous indirect iTM approaches have limitations, such as halo artifacts and color distortion, even with considerable parameter size. In addition, the existing learning methods using cycle consistency are inefficient in terms of training-purpose memory, because they require an auxiliary network that is used only for learning.

2.3. Inverse Tone Mapping with Cycle Consistency

The concept of cycle consistency came from the CycleGAN architecture [16]. Cycle consistency means that when the output of the first generator is fed into the second generator, the output of the second generator should match the original image, and vice versa. The cycle consistency loss is defined by
$$\mathcal{L}_{cycle} = \mathbb{E}_{x}\big[\,\|G_2(G_1(x)) - x\|_1\,\big] + \mathbb{E}_{y}\big[\,\|G_1(G_2(y)) - y\|_1\,\big] \qquad (1)$$
where $x$ and $y$ indicate an input and the ground truth (GT), respectively; in this paper, $x$ and $y$ correspond to the input LDR and the LDR of the target EV. $G_1: x \to y$ and $G_2: y \to x$ stand for two generators with opposite objectives; here, they correspond to the EV up and EV down networks, respectively. $x \to G_1(x) \to G_2(G_1(x)) \approx x$ is called forward-cycle consistency, and $y \to G_2(y) \to G_1(G_2(y)) \approx y$ is called backward-cycle consistency. Both networks are updated and trained at the same time, but $G_2$ is used only as an auxiliary tool to help $G_1$ learn.
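To make the mechanics concrete, the following is a minimal PyTorch sketch of Equation (1); the generator modules `G1` and `G2` and the batches `x` and `y` are hypothetical placeholders, not the authors’ code.

```python
import torch

def cycle_consistency_loss(G1, G2, x, y):
    """L1 forward- and backward-cycle consistency as in Equation (1)."""
    forward_term = torch.mean(torch.abs(G2(G1(x)) - x))   # x -> G1 -> G2 should return to x
    backward_term = torch.mean(torch.abs(G1(G2(y)) - y))  # y -> G2 -> G1 should return to y
    return forward_term + backward_term
```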
For convenience, let the learning mechanism using cycle consistency [16], as in deep cycle HDRI [13], be called cycle learning. When cycle learning is applied to the EV up/down networks, $G_1$ can be regarded as the EV up network, and $G_2$ as the EV down network. When training the EV up network, cycle learning trains an EV down network simultaneously, but this EV down network is not used in the inference phase. The opposite also holds; i.e., cycle learning requires four trained networks to generate a multi-EV stack.
In practice, using four networks to obtain only two networks is a waste of memory. Therefore, as a learning strategy to solve this problem, we propose a novel learning method that requires no auxiliary network even though it uses cycle consistency. In other words, the proposed method not only trains two networks simultaneously with cycle consistency but also generates a multi-EV stack using both the EV up and EV down networks in the inference phase (see Figure 1). Therefore, the proposed learning, named cyclic learning, is more efficient than cycle learning in terms of training-purpose memory. Cyclic learning is explained in detail in Section 3.1.

3. Methods

The operation of conventional indirect iTM methods is illustrated in Figure 2. By recursively applying the trained EV up/down networks, a multi-EV stack with EV $i$ ($i$ = −3, −2, ..., +2, +3) is generated, and the LDR images are merged by Debevec’s algorithm [17]. Section 3.1 describes the concept of cyclic learning for the EV up/down networks and the training procedure. Section 3.2 describes the lightweight architecture of the EV up/down networks and the details of the loss functions.
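The recursive generation step described above can be sketched as follows; `ev_up` and `ev_down` stand for already trained networks, and the commented merge call shows one possible way to apply Debevec’s algorithm [17] via OpenCV. This is an illustrative sketch, not the authors’ released code.

```python
import torch

@torch.no_grad()
def generate_multi_ev_stack(ev_up, ev_down, ldr0, n=3):
    """Recursively apply trained EV up/down networks to an EV 0 input to
    obtain LDRs of EV -n, ..., -1, 0, +1, ..., +n (2n + 1 images)."""
    ups, downs = [ldr0], [ldr0]
    for _ in range(n):
        ups.append(ev_up(ups[-1]))        # EV +1, +2, ..., +n
        downs.append(ev_down(downs[-1]))  # EV -1, -2, ..., -n
    return downs[:0:-1] + ups             # ordered from EV -n to EV +n

# One way to merge the stack with Debevec's algorithm [17], using OpenCV:
#   import cv2, numpy as np
#   times = np.float32([2.0 ** i for i in range(-3, 4)])  # relative exposures
#   hdr = cv2.createMergeDebevec().process(stack_uint8, times)
```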

3.1. Cyclic Learning

If cycle consistency is applied to the multi-EV stack generation step, the noise amplification problem that may occur in underexposed/overexposed regions can be mitigated [13]. Although the previous cycle learning uses only two networks to generate multi-EV stacks, it has to train a total of four networks in the learning phase: the conventional learning method using cycle consistency employed an auxiliary EV down network for training the EV up network and an auxiliary EV up network for training the EV down network, and these auxiliary networks were not used in the inference phase. The proposed cyclic learning trains only two networks, and both are used in the inference phase. In other words, the proposed cyclic learning mitigates the memory-waste problem during the learning phase. This section describes the proposed cyclic learning in detail.
To eliminate unnecessary memory from the learning process of cycle learning, we propose a new learning method for the EV up/down networks. When applying the existing concept of cycle learning to iTM, the contradictory objectives of increasing and decreasing EV are both required in the inference phase. In addition, since the existing cycle learning method requires an auxiliary network for each of the two networks, it is inefficient in terms of memory. Therefore, we propose a way to simultaneously train two networks with cycle consistency and use both of them in the inference phase.
Before explaining the proposed method in detail, let us consider a naïve way first. The EV up network is trained to generate an LDR of EV $i+1$ from an LDR of EV $i$, and the EV down network is trained in the opposite direction. Then, assume that the EV up/down networks are simultaneously updated and trained according to Equation (1). If an input image passes through the EV down network after the EV up network, the EV down network re-estimates the image already estimated by the EV up network. In this case, the update of the EV down network may cause undesirable training, resulting in problems such as over-sharpness in the output image. For example, in the somewhat dark EV−3 image in Figure 3, we observe pixel-wise saturation due to over-sharpness. If such images were included in the merging process, the same phenomenon could even appear in the HDR image.
To solve this problem, we propose updating the two networks alternately within a single iteration while training them simultaneously (see Figure 4). Figure 4a shows the updating process of the EV up network; here, the EV down network uses the weights of the previous iteration as they are. Training for updating the EV up network is based on two losses: (1) the EV up network loss, defined by the distance between the inferred LDR of EV $i+1$ and the GT of EV $i+1$, and (2) the forward-cycle consistency loss, defined by the distance between the re-inferred LDR of EV $i$ (derived from the EV down network) and the GT of EV $i$. Correspondingly, the training of the EV down network in Figure 4b is based on (1) the EV down network loss and (2) the backward-cycle consistency loss.
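A minimal sketch of one such iteration is shown below. The helper `loss_fn` is hypothetical (the actual losses are defined in Section 3.2), and `lam2` corresponds to the consistency weight $\lambda_2$ introduced there; the essential point is that each optimizer step updates one network while the other merely closes the cycle with frozen weights.

```python
import torch

def set_requires_grad(net, flag):
    for p in net.parameters():
        p.requires_grad_(flag)

def train_step(ev_up, ev_down, opt_up, opt_down, x, y, loss_fn, lam2=0.8):
    """One iteration of cyclic learning.
    x: LDR of EV i, y: GT LDR of EV i + 1.
    loss_fn: hypothetical helper returning the pixel + gradient loss of Section 3.2."""
    # (a) Update the EV up network. The EV down network keeps the weights of the
    #     previous iteration and only closes the forward cycle x -> U(x) -> D(U(x)).
    set_requires_grad(ev_down, False)
    opt_up.zero_grad()
    up_out = ev_up(x)
    loss_up = loss_fn(up_out, y) + lam2 * loss_fn(ev_down(up_out), x)
    loss_up.backward()
    opt_up.step()

    # (b) Update the EV down network with the EV up network frozen,
    #     closing the backward cycle y -> D(y) -> U(D(y)).
    set_requires_grad(ev_down, True)
    set_requires_grad(ev_up, False)
    opt_down.zero_grad()
    down_out = ev_down(y)
    loss_down = loss_fn(down_out, x) + lam2 * loss_fn(ev_up(down_out), y)
    loss_down.backward()
    opt_down.step()
    set_requires_grad(ev_up, True)
```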
Section 3.2 gives the mathematical definitions of the aforementioned losses. Also, Section 4.4 experimentally proves that the proposed cyclic learning realizes cycle consistency more effectively than conventional cycle learning.

3.2. Lightweight Network

Conventional indirect inverse tone mapping (iTM) methods use more than 20 M parameters for their exposure value (EV) up/down networks [8,9,11,12,13,14]. To build a baseline toward a lightweight network, we adopt the WDSR-A residual block (WARB) [18], which is known to be cost-effective in the super-resolution field. The original WARB was used with weight normalization (WN) [19]. WN is known to be effective at high learning rates but not at low learning rates, which increases learning time. To solve this problem, we adopt WARB without WN. The proposed method does not suffer from the so-called convergence problem, because it is based on residual learning and the network is not very deep. In addition, the proposed method alleviates the noise amplification problem, and consequently overfitting, by introducing cycle consistency into learning. Therefore, we can safely remove the WN layer while pursuing both a lightweight network and cyclic learning.
To further increase the HDR reconstruction performance, we place a luminance compensation module in front of the network (see Figure 5). This structure is based on the fact that the task of increasing/decreasing EV can be divided into compensation of global luminance and restoration of details. If the global luminance of an input image is properly compensated in advance, the subsequent network can concentrate on reconstructing the lost information or details during training. In this paper, the luminance compensation module is implemented in a simple and intuitive way: the average luminance difference between images with an EV gap of 1 is modeled as a single learnable parameter, and this trained parameter, a sort of luminance offset, is added to each input image. Note that the parameter is positive in the EV up network and negative in the EV down network.
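These two building blocks, and the overall network described in the next paragraph (Figure 5), can be sketched in PyTorch as follows; the channel width (32) and the expansion factor (4) are assumptions chosen to roughly match the reported 0.7 M total parameter count, not values stated by the authors.

```python
import torch
import torch.nn as nn

class WARBnoWN(nn.Module):
    """WDSR-A residual block without weight normalization (WARB-noWN)."""
    def __init__(self, n_feats=32, expand=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats * expand, 3, padding=1),  # wide activation
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats * expand, n_feats, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection

class LuminanceCompensation(nn.Module):
    """Single learned offset modeling the average luminance difference between
    images one EV apart; positive for the EV up network, negative for EV down."""
    def __init__(self, sign=+1.0):
        super().__init__()
        self.offset = nn.Parameter(torch.zeros(1))
        self.sign = sign

    def forward(self, x):
        return x + self.sign * self.offset

class EVNetwork(nn.Module):
    """EV up/down network of Figure 5: luminance compensation, a 3x3 conv to
    lift the channel count, five WARB-noWN blocks, and a 3x3 conv back to RGB."""
    def __init__(self, sign=+1.0, n_feats=32):
        super().__init__()
        self.lum = LuminanceCompensation(sign)
        self.head = nn.Conv2d(3, n_feats, 3, padding=1)
        self.body = nn.Sequential(*[WARBnoWN(n_feats) for _ in range(5)])
        self.tail = nn.Conv2d(n_feats, 3, 3, padding=1)

    def forward(self, x):
        return self.tail(self.body(self.head(self.lum(x))))
```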
Thus, as shown in Figure 5, the EV up/down network is composed of one luminance compensation module, five WARBs without WN (WARB-noWN), and two 3 × 3 convolution layers to adjust the number of input/output channels. The total losses for training the EV up/down networks are defined by
$$\mathcal{L}_{total}^{u} = \big(\mathcal{L}_{pix}^{u} + \lambda_1 \mathcal{L}_{gd}^{u}\big) + \lambda_2 \big(\mathcal{L}_{pix}^{f} + \lambda_1 \mathcal{L}_{gd}^{f}\big), \qquad \mathcal{L}_{total}^{d} = \big(\mathcal{L}_{pix}^{d} + \lambda_1 \mathcal{L}_{gd}^{d}\big) + \lambda_2 \big(\mathcal{L}_{pix}^{b} + \lambda_1 \mathcal{L}_{gd}^{b}\big) \qquad (2)$$
As mentioned in Section 3.1, each total loss is composed of an EV up/down network loss ($\mathcal{L}_*^{u}$, $\mathcal{L}_*^{d}$) and a forward/backward consistency loss ($\mathcal{L}_*^{f}$, $\mathcal{L}_*^{b}$) for cycle consistency. Each loss is in turn composed of a pixel-wise L1 loss ($\mathcal{L}_{pix}^{*}$) and a gradient difference loss ($\mathcal{L}_{gd}^{*}$) [20] that prevents blur in the output images. The two coefficients $\lambda_1$ and $\lambda_2$ were experimentally set to 0.2 and 0.8, respectively. Note that $\mathcal{L}_*^{u}$, $\mathcal{L}_*^{d}$ and $\mathcal{L}_*^{f}$, $\mathcal{L}_*^{b}$ are composed in the same way: $\mathcal{L}_*^{u}$ and $\mathcal{L}_*^{d}$ are calculated from the distance between the network output and the GT, and the pixel-wise losses of forward/backward consistency are defined by
$$\mathcal{L}_{pix}^{f} = \mathbb{E}_{x,y}\big[\,\|D(U(x)) - x\|_1\,\big], \qquad \mathcal{L}_{pix}^{b} = \mathbb{E}_{x,y}\big[\,\|U(D(y)) - y\|_1\,\big] \qquad (3)$$
The gradient difference losses of forward/backward consistency are defined by
$$\mathcal{L}_{gd}^{f} = \mathbb{E}_{x,y}\sum_{u,v}\big[\,\|\nabla_u x - \nabla_u D(U(x))\|_1 + \|\nabla_v x - \nabla_v D(U(x))\|_1\,\big]$$
$$\mathcal{L}_{gd}^{b} = \mathbb{E}_{x,y}\sum_{u,v}\big[\,\|\nabla_u y - \nabla_u U(D(y))\|_1 + \|\nabla_v y - \nabla_v U(D(y))\|_1\,\big] \qquad (4)$$
where $U(\cdot)$ and $D(\cdot)$ represent the EV up and EV down networks, respectively, and $x$ and $y$ denote LDR images of EV $i$ and EV $i+1$; the input of the EV down network is $y$. Also, $u$ and $v$ refer to the vertical and horizontal directions, and $\nabla_u$ and $\nabla_v$ stand for the vertical and horizontal gradients.
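A sketch of these losses follows; the finite-difference gradients are an assumed implementation of $\nabla_u$ and $\nabla_v$, and `U`/`D` are the EV up/down networks. The EV down counterpart of `total_up_loss` mirrors it with the roles of $x$ and $y$ swapped.

```python
import torch

def l1(a, b):
    return torch.mean(torch.abs(a - b))

def gradient_difference(a, b):
    """Gradient difference loss [20]: L1 distance between vertical (u) and
    horizontal (v) finite-difference gradients of two images (N, C, H, W)."""
    loss_u = l1(a[..., 1:, :] - a[..., :-1, :], b[..., 1:, :] - b[..., :-1, :])
    loss_v = l1(a[..., :, 1:] - a[..., :, :-1], b[..., :, 1:] - b[..., :, :-1])
    return loss_u + loss_v

def total_up_loss(U, D, x, y, lam1=0.2, lam2=0.8):
    """L_total^u of Equation (2); x: LDR of EV i, y: GT of EV i + 1."""
    up = U(x)          # inferred LDR of EV i + 1
    cyc = D(up)        # forward cycle back to EV i
    net_term = l1(up, y) + lam1 * gradient_difference(up, y)
    cyc_term = l1(cyc, x) + lam1 * gradient_difference(cyc, x)
    return net_term + lam2 * cyc_term
```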

4. Experimental Results

4.1. Experimental Setup

For training the EV up/down networks, we adopted the Fairchild dataset [21], an open dataset composed of multi-EV stack GTs with an EV gap of 1. The training-purpose Fairchild dataset has a total of 105 multi-EV stacks photographed with a Nikon D2x; i.e., it consists of 105 × 7 images. Each image in the dataset was cropped to 512 × 512 during training and used in patch units. The learning rate was set to 5 × 10−5, and the Adam optimizer [22] with β1 = 0.5 and β2 = 0.999 was used. The batch size was set to 2, and the number of training epochs was set to 10.
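For reference, the reported hyperparameters translate into a setup like the following, reusing the hypothetical `EVNetwork` sketch from Section 3.2:

```python
import torch

# Hypothetical training setup mirroring the reported hyperparameters.
ev_up, ev_down = EVNetwork(sign=+1.0), EVNetwork(sign=-1.0)
opt_up = torch.optim.Adam(ev_up.parameters(), lr=5e-5, betas=(0.5, 0.999))
opt_down = torch.optim.Adam(ev_down.parameters(), lr=5e-5, betas=(0.5, 0.999))
EPOCHS, BATCH_SIZE, PATCH = 10, 2, 512  # 512 x 512 crops used as training patches
```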
As the test dataset for the following experiments, we used 41 images from HDR-eye [23], which were mainly used by previous iTM methods [11,12,13]. The generated multi-EV stacks were merged by Debevec’s algorithm [17], commonly used in conventional indirect iTM methods [9,11,12,13] for HDR reconstruction. In the inference phase, an input image was also resized to 512 × 512.

4.2. Quantitative Results

First, let us quantitatively evaluate the HDR reconstruction performance of the proposed method. Among the direct iTM, indirect iTM, and cycle learning-based methods, we chose those that can be directly compared with the proposed method. Note that an ideal quantitative evaluation of multi-EV stack generation performance is impossible due to the limitations of the dataset. As the main metric, we employed the HDR-VDP Q score [14,24], a representative metric for evaluating HDR reconstruction performance. In addition, we compared the PSNRs and SSIMs of images tone mapped ($\gamma = 2.2$) by [25,26]. In general, the PSNR values tend to be low overall because tone mapping causes significant contrast changes; however, since PSNR improvement still reflects image quality improvement to some extent, PSNR is also adopted as a metric in this paper. Tone mapped HDR images are greatly affected by the TMO, which may also introduce distortion, so we adopted two TMOs for comparison with conventional methods. Results using additional TMOs are provided in Appendix A.
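As an illustration, the PSNR/SSIM part of such an evaluation could look as follows; the tone mapping itself ([25,26]) is assumed to be applied upstream, and the scikit-image calls assume version 0.19 or later.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(tm_pred, tm_gt, gamma=2.2):
    """PSNR/SSIM between tone mapped HDR images (float arrays in [0, 1], HxWx3),
    with the gamma = 2.2 correction applied before comparison."""
    p = np.clip(tm_pred, 0.0, 1.0) ** (1.0 / gamma)
    g = np.clip(tm_gt, 0.0, 1.0) ** (1.0 / gamma)
    return (peak_signal_noise_ratio(g, p, data_range=1.0),
            structural_similarity(g, p, data_range=1.0, channel_axis=-1))
```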
Table 1 shows the quantitative evaluation results. The results of deep chain HDRI [12] and deep cycle HDRI [13] were quoted directly from the corresponding papers, and the rest were obtained from openly available results. It is noteworthy that the proposed method shows the best HDR reconstruction performance in terms of the HDR-VDP Q score: its Q score is more than 0.7 higher than that of the best existing method. The proposed method is also the best in terms of PSNR and SSIM for both TMOs. Furthermore, the proposed method has a far smaller parameter size than the SOTA iTM methods [13,14]; for example, it is only about 1/100 the size of SingleHDR [14]. Therefore, the proposed method provides SOTA performance in terms of HDR reconstruction while keeping almost the minimum parameter size.

4.3. Qualitative Results

This section qualitatively evaluates the HDR reconstruction and multi-EV stack generation of the proposed method. We compared the tone mapped images generated by Reinhard’s method [25] in the HDR-Toolbox [27]. Let us take a look at the first example at the top of Figure 6, where magnified images are given for clear comparison. DrTMO [9] and deep recursive HDRI [11] failed to reconstruct the saturated region(s), causing artifacts overall. The other methods also showed halo artifacts at the boundary between the trees and the sky. On the other hand, the proposed method is robust to artifacts and also reconstructs the saturated region successfully.
While the first example showed reconstruction performance in a saturated region, the second example at the bottom of Figure 6 evaluates the reconstruction of dark regions and quantization artifacts. DrTMO [9] and HDRCNN [8] exhibited noticeable quantization artifacts. ExpandNet [10], deep recursive HDRI [11], and SingleHDR [14] showed relatively few quantization artifacts but were not satisfactory in reconstructing dark areas. In contrast, the proposed method reconstructed dark areas close to the GTs and hardly suffered from quantization artifacts. As mentioned earlier, the images in Figure 6 were tone mapped by Reinhard’s TMO [25] for demonstration purposes.
Next, let us evaluate the multi-EV stack generation performance of the proposed method. For this experiment, we compared the proposed method with deep recursive HDRI [11], the only one of the conventional indirect iTM methods [11,12,13] that is publicly available. For an input image (top of Figure 7), the EV up/down networks created images of EV−3, −2, −1, +1, +2, +3 from left to right. In the EV−3, −2, −1 images, [11] rendered the stairs as if they were sky and gradually shifted their color tone to sky blue, whereas the proposed method reconstructed the stairs while properly maintaining their texture. The proposed method also reconstructed the sculpture area more successfully than [11]. Even in the EV+1, +2, +3 images, the proposed method generated stable brightness without losing the colors of the walls and flowers.
Therefore, the proposed method shows comparable qualitative performance even with its significantly smaller network size. In particular, it provides outstanding artifact robustness in both HDR reconstruction and multi-EV stack generation. More qualitative results are given in Appendix A and Appendix B.

4.4. Ablation Study

4.4.1. Effect of Each Component of the Proposed Method

This section analyzes the effect of each technique applied to the proposed method on the overall performance. Table 2 quantitatively shows the effects of applying cyclic learning, removing WN from WARB, and adding the luminance compensation module, one by one. The baseline in this experiment is a model consisting of EV up/down networks with only five WARBs (see the first row of Table 2). Compared to the conventional methods in Table 1, it is worth noting that the proposed baseline is already competitive, which experimentally shows that WARB is a suitable building block for generating multi-EV stacks. First, when cyclic learning was applied to the baseline, the HDR reconstruction performance improved by about 0.09 (second row), because cycle consistency was well maintained during training. Next, when WN was removed from WARB, the performance improved by an additional 0.21 (third row), because the network could learn diverse luminance distributions. Finally, the luminance compensation module brought an additional improvement of 0.12 (last row). As a result, the proposed method achieved a total Q score improvement of 0.42 over the baseline.

4.4.2. Realization of Cycle Consistency

This section evaluates whether the proposed cyclic learning realizes cycle consistency more successfully than conventional cycle learning. In this experiment, the proposed cyclic learning was compared with cycle learning using cycle consistency as well as with the baseline that did not use cycle consistency. Figure 8 shows images of EV 0, +1, +2 for each method; they were regenerated by the EV down network from the EV +1, +2, +3 images generated by the EV up network. By comparing the images that sequentially passed through the EV up/down networks with the GTs, we can evaluate how well each method maintains cycle consistency. Consider EV 0 in Figure 8. The baseline differs significantly from the GT, especially in the sky area. Cycle learning reconstructed the sky region more similarly to the GT than the baseline did, but caused many artifacts. In contrast, the proposed cyclic learning was the most similar to the GT, and no artifacts were observed.

5. Conclusions

This paper proposed cycle consistency-based learning for inverse tone mapping. With the proposed cyclic learning, the EV up/down networks can be trained simultaneously without wasting memory during training. In addition, we presented a lightweight network requiring only 1/100 of the parameters of the existing SOTA network. Experimental results show that the proposed method provides quantitatively SOTA performance in terms of HDR reconstruction and is also competitive in terms of subjective visual quality.

6. Future Work

This paper proposed a method to generate a stable multi-EV stack with a small network for lightweight purposes. Expanding the information available from the limited LDR input is very important in the inverse tone-mapping process; as a tool to generate virtual LDR images, the single-image brightening method of Zheng et al. [28] could be employed. Meanwhile, the merging process still has the limitation that it uses a rule-based method, in the same way as existing methods. If the merging module could also be replaced with a deep learning-based network, such as that in [29], further technological advancement could be achieved.

Author Contributions

The work described in this article is the collaborative effort of all authors. All authors contributed to data processing and designed the algorithm. All authors made contributions to data measurement and analysis. All authors participated in the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an Inha University grant (65324).

Acknowledgments

The authors would like to thank the support of Inha University.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

For demonstration, all HDR images in the main body of this paper were tone mapped by Reinhard’s tone mapping operator (TMO). Since tone mapped HDR images differ depending on the TMO, comparison results for each TMO are required. Figure A1 and Figure A2 show three examples. For each method, a probability map (P map) and the results for three TMOs [25,26,30] are given. Each pixel of the P map encodes the probability of detecting a difference from the GT, i.e., the visibility metric of HDR-VDP 2.2.1 [24]. The larger the difference from the GT, the closer the P map is to red; the smaller the difference, the closer it is to blue.
Figure A1. HDR reconstruction performance comparison (a–g). Here, all images are tone mapped by three TMOs and gamma corrected for demonstration.
Figure A2. HDR reconstruction performance comparison (a–g). Here, all images are tone mapped by three TMOs and gamma corrected for demonstration.
In Figure A1 and Figure A2, tone mapped HDR images differ slightly according to TMO. Note that even though the tone mapped image is similar to the tone mapped GT, there may be many blue regions in the P map. From these three examples, we can find that the proposed method provides comparable performance from the perspective of tone mapped HDR.

Appendix B

This section evaluates multi-EV stack generation performance in terms of the perception index (PI) [31], a no-reference metric defined for subjective quality evaluation. In Table A1, the tendency of the proposed method differs from that of deep recursive HDRI [11] between bright and dark images. When evaluating multi-EV stack generation performance, no full-reference metric can be used, because GTs of the HDR-eye multi-EV stacks are not available; a no-reference metric such as PI can therefore provide additional information.
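For reference, the PI combines two no-reference measures; this definition is quoted from the PIRM 2018 challenge [31], not from the present paper:

$$\mathrm{PI} = \tfrac{1}{2}\big((10 - \mathrm{Ma}) + \mathrm{NIQE}\big)$$

where Ma and NIQE are learned and natural-scene-statistics-based quality measures, respectively, and a lower PI indicates better perceptual quality.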
Table A1. Comparison of perception index (PI) for multi-EV stack generation.

Method                      EV−3    EV−2    EV−1    EV+1    EV+2    EV+3
Deep recursive HDRI [11]    4.59    3.72    3.38    2.94    2.87    3.01
Ours                        4.17    3.63    3.33    3.19    3.30    3.52
Also, we present more comparison results for multi-EV stack generation. As in the main body, Figure A3 and Figure A4 show the EV−3, −2, −1, +1, +2, +3 images generated by each method, together with magnified ROIs. In all three examples, the proposed method generates a multi-EV stack more stably than deep recursive HDRI [11]. Here, stability has different meanings for the EV up and EV down networks. In the EV−3, −2, −1 images generated by the EV down network, stability means that no artifacts are generated in the saturated regions. In the EV+1, +2, +3 images generated by the EV up network, stability means that colors do not shift and no stains are generated. From this point of view, the proposed method outperforms deep recursive HDRI in terms of multi-EV stack generation performance.
Figure A3. Multi-EV stack generation performance comparison. The center image is an input, the left three images are the results from the EV down network, and the right three images are the results from the EV up network. EV increases from the far left (EV−3) to the right. Please refer to the magnified regions below each example for detail.
Figure A4. Multi-EV stack generation performance comparison. The center image is an input, the left three images are the results from the EV down network, and the right three images are the results from the EV up network. EV increases from the far left (EV−3) to the right. Please refer to the magnified regions below each example for detail.

References

  1. Liu, Z.; Lin, W.; Li, X.; Rao, Q.; Jiang, T.; Han, M.; Liu, S. ADNet: Attention-guided deformable convolutional network for high dynamic range imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 463–470. [Google Scholar]
  2. Chen, X.; Liu, Y.; Zhang, Z.; Qiao, Y.; Dong, C. HDRUnet: Single image hdr reconstruction with denoising and dequantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 354–363. [Google Scholar]
  3. Banterle, F.; Ledda, P.; Debattista, K.; Chalmers, A. Inverse tone mapping. In Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia 2006, Kuala Lumpur, Malaysia, 29 November–2 December 2006; ACM: New York, NY, USA, 2006; pp. 349–356. [Google Scholar]
  4. Banterle, F.; Debattista, K.; Artusi, A.; Pattanaik, S.N.; Myszkowski, K.; Ledda, P.; Chalmers, A. High Dynamic Range Imaging and Low Dynamic Range Expansion for Generating HDR Content. Comput. Graph. Forum 2009, 28, 2343–2367. [Google Scholar] [CrossRef] [Green Version]
  5. Yan, Q.; Zhang, L.; Liu, Y.; Zhu, Y.; Sun, J.; Shi, Q.; Zhang, Y. Deep hdr imaging via a non-local network. IEEE Trans. Image Process. 2020, 29, 4308–4322. [Google Scholar] [CrossRef] [PubMed]
  6. Kalantari, N.; Ramamoorthi, R. Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph. 2017, 36, 144. [Google Scholar] [CrossRef]
  7. Wu, S.; Xu, J.; Tai, Y.; Tang, C. Deep high dynamic range imaging with large foreground motions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 117–132. [Google Scholar]
  8. Eilertsen, G.; Kronander Denes, G.; Mantiuk, R.; Unger, J. HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph. 2017, 36, 1–15. [Google Scholar] [CrossRef]
  9. Endo, Y.; Kanamori, Y.; Mitani, J. Deep Reverse Tone Mapping. ACM Trans. Graph. 2017, 36, 177. [Google Scholar] [CrossRef]
  10. Marnerides, D.; Bashford-Rogers, T.; Hatchett, J.; Debattista, K. Expandnet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content. Comput. Graph. Forum. 2018, 37, 37–49. [Google Scholar] [CrossRef] [Green Version]
  11. Lee, S.; Hwan An, G.; Kang, S. Deep recursive hdri: Inverse tone mapping using generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 596–611. [Google Scholar]
  12. Lee, S.; An, G.; Kang, S. Deep chain hdri: Reconstructing a high dynamic range image from a single low dynamic range image. IEEE Access 2018, 6, 49913–49924. [Google Scholar] [CrossRef]
  13. Lee, S.; Jo, S.; An, G.; Kang, S. Learning to Generate Multi-Exposure Stacks with Cycle Consistency for High Dynamic Range Imaging. IEEE Trans. Multimed. 2020, 23, 2561–2574. [Google Scholar] [CrossRef]
  14. Liu, Y.; Lai, W.; Chen, Y.; Kao, Y.; Yang, M.; Chuang, Y.; Huang, J. Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1651–1660. [Google Scholar]
  15. Banterle, F.; Marnerides, D.; Debattista, K.; Bashford-Rogers, T. Unsupervised HDR Imaging: What Can Be Learned from a Single 8-bit Video? arXiv 2022, arXiv:2202.05522. [Google Scholar]
  16. Zhu, J.; Park, T.; Isola, P.; Efros, A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  17. Debevec, P.; Malik, J. Recovering high dynamic range radiance maps from photographs. In Proceedings of the SIGGRAPH, Los Angeles, CA, USA; 2008; pp. 1–10. [Google Scholar]
  18. Yu, J.; Fan, Y.; Yang, J.; Xu, N.; Wang, Z.; Wang, X.; Huang, T. Wide activation for efficient and accurate image super-resolution. arXiv 2018, arXiv:1808.08718. [Google Scholar]
  19. Salimans, T.; Kingma, D. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. Adv. Neural Inf. Process. Syst. 2016, 2016, 901–909. [Google Scholar]
  20. Mathieu, M.; Couprie, C.; LeCun, Y. Deep multi-scale video prediction beyond mean square error. arXiv 2015, arXiv:1511.05440. [Google Scholar]
  21. Fairchild, M. The HDR photographic survey. In Proceedings of the Color and Imaging Conference, Albuquerque, NM, USA, 5–9 November 2007; pp. 233–238. [Google Scholar]
  22. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  23. Nemoto, H.; Korshunov, P.; Hanhart, P.; Ebrahimi, T. Visual attention in LDR and HDR images. In Proceedings of the 9th International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), Chandler, AZ, USA, 5–6 February 2015. [Google Scholar]
  24. Mantiuk, R.; Kim, K.; Rempel, A.; Heidrich, W. HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Trans. Graph. 2011, 30, 1–14. [Google Scholar] [CrossRef]
  25. Reinhard, E.; Stark, M.; Shirley, P.; Ferwerda, J. Photographic tone reproduction for digital images. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, San Antonio, TX, USA; 2002; pp. 267–276. [Google Scholar]
  26. Kim, M.; Kautz, J. Others Consistent tone reproduction. In Proceedings of the Tenth IASTED International Conference on Computer Graphics and Imaging, Innsbruck, Austria, 13–15 February 2008; ACTA Press Anaheim: Calgary, AB, Canada, 2008; pp. 152–159. [Google Scholar]
  27. Banterle, F.; Artusi, A.; Debattista, K.; Chalmers, A. Advanced High Dynamic Range Imaging; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
  28. Zheng, C.; Li, Z.; Yang, Y.; Wu, S. Single image brightening via multi-scale exposure fusion with hybrid learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 1425–1435. [Google Scholar] [CrossRef]
  29. Yang, Y.; Cao, W.; Wu, S.; Li, Z. Multi-scale fusion of two large-exposure-ratio images. IEEE Signal Process. Lett. 2018, 25, 1885–1889. [Google Scholar] [CrossRef]
  30. Drago, F.; Myszkowski, K.; Annen, T.; Chiba, N. Adaptive logarithmic mapping for displaying high contrast scenes. Comput. Graph. Forum. 2003, 22, 419–426. [Google Scholar] [CrossRef]
  31. Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 pirm challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
Figure 1. Training and inference phases of cycle learning and cyclic learning. Blue and yellow blocks mean EV up and down networks, respectively. Here, the dotted block indicates an auxiliary network that is necessary only for learning. Blocks of the same color represent the same network. Also, both cycle learning and cyclic learning create a multi-EV stack through recursive use of the inference phase of (c). i indicates relative EV.
Figure 2. Overview of recent indirect iTM methods. Curved arrows indicate the recursive usage of EV up/down networks. Debevec’s algorithm [17] was used to merge a multi-EV stack for fair comparison with the previous works.
Figure 3. The pixel-wise saturation phenomenon occurring when EV up/down networks are simultaneously updated and trained according to Equation (1). (left) An LDR of EV−3, (right) Magnified regions of interest (ROIs).
Figure 4. Training process of the proposed networks. The networks (a,b) are alternately operated in a single iteration. As a result, the training process accomplishes forward and backward cycle consistency.
Figure 5. The architecture of the proposed EV up network. Note that the up/down networks have the equivalent architecture. The number above the convolution block indicates the channels of features.
Figure 6. HDR reconstruction performance comparison. Here, all HDRs except input LDR are tone mapped by Reinhard’s TMO and gamma-corrected for demonstration. Please refer to the magnified regions below each example for details.
Figure 7. Multi-EV stack generation performance comparison. The center image is an input, the left three images are the results from the EV down network, and the right three images are the results from the EV up network. That is, EV increases from the far left (EV−3) to the right (EV+3).
Figure 8. Cycle consistency comparison. Each image, excluding GT, is the result of passing all the EV up/down networks.
Table 1. Quantitative comparison of several iTM methods. For this experiment, pretrained weights were used, and * indicates citing numerical figures from the corresponding paper. Red (bold) and blue (underline) mean 1st place and 2nd place, respectively.

Method                      Params   HDR-VDP Q Score   Reinhard TMO                   Kim and Kautz TMO
                                     (m/σ)             PSNR (m/σ)     SSIM (m/σ)      PSNR (m/σ)     SSIM (m/σ)
HDRCNN [8]                  29 M     52.98/3.41        18.74/2.16     0.809/0.092     19.67/3.19     0.831/0.091
DrTMO [9]                   48 M     53.62/3.59        26.93/3.97     0.886/0.083     23.06/4.69     0.854/0.099
ExpandNet [10]              0.5 M    52.18/3.33        22.73/3.06     0.839/0.090     21.80/2.52     0.824/0.098
Deep chain HDRI [12]        0.4 M    49.80/5.97 *      25.77/2.44 *   -               22.62/3.39 *   -
Deep recursive HDRI [11]    24 M     53.05/3.41        26.54/3.18     0.879/0.077     25.75/2.89     0.864/0.085
Deep cycle HDRI [13]        29 M     54.57/3.68        26.65/3.11     0.890/0.077     26.94/3.28     0.880/0.086
SingleHDR [14]              74 M     54.88/1.87 *      27.07/1.73 *   -               20.48/2.91 *   -
Ours                        0.7 M    55.63/3.45        27.42/3.25     0.892/0.082     27.10/3.13     0.881/0.093
Table 2. The effect of each component of the proposed method on the overall performance.

Cyclic Learning   WN Removal   Luminance Compensation   HDR-VDP Q Score   Gain with Each Component
                                                        55.21             -
✓                                                       55.30             0.09
✓                 ✓                                     55.51             0.21
✓                 ✓            ✓                        55.63             0.12
