Two-Exposure Image Fusion Based on Optimized Adaptive Gamma Correction

In contrast to conventional digital images, high-dynamic-range (HDR) images have a broader range of intensity between the darkest and brightest regions to capture more details in a scene. Such images are produced by fusing images with different exposure values (EVs) for the same scene. Most existing multi-scale exposure fusion (MEF) algorithms assume that the input images are multi-exposed with small EV intervals. However, thanks to emerging spatially multiplexed exposure technology that can capture an image pair of short and long exposure simultaneously, it is essential to deal with two-exposure image fusion. To bring out more well-exposed contents, we generate a more helpful intermediate virtual image for fusion using the proposed Optimized Adaptive Gamma Correction (OAGC) to have better contrast, saturation, and well-exposedness. Fusing the input images with the enhanced virtual image works well even though both inputs are underexposed or overexposed, which other state-of-the-art fusion methods could not handle. The experimental results show that our method performs favorably against other state-of-the-art image fusion methods in generating high-quality fusion results.


Introduction
Image fusion has been a crucial low-level image processing task for various applications, such as multi-spectrum image fusion [1,2], multi-focus image fusion [3], multi-modal image fusion [4], and multi-exposure image fusion [5]. Among these applications, thanks to the prevalence of smartphones with built-in cameras, multi-exposure image fusion is one of the most common. Since most natural scenes have a larger ratio of light to dark than a single camera shot can capture, a single-shot image usually cannot present the details of a high dynamic range and thus contains under- or overexposed regions. When a camera captures an image, its sensors can only catch a limited luminance range during a specific exposure time, resulting in a so-called low-dynamic-range (LDR) image. An image taken with a short exposure tends to be dark, while one taken with a long exposure tends to be bright, as shown in Figure 1a. Fusing differently exposed LDR images to obtain a high-dynamic-range (HDR) image requires extracting the well-exposed (highlighted) regions from each LDR image, which remains very challenging.
Several research works have addressed Multi-scale Exposure Fusion (MEF) [6-8]. In general, it is common to fuse LDR images using a weighted sum, where the weight associated with each input LDR image is determined in a pixel-wise fashion [6-8]. Mertens et al. [6] proposed the fusion of images in a multi-scale manner based on pixel contrast, saturation, and well-exposedness to ease content inconsistency issues in the fused results. However, this often yields halo artifacts in the fusion results. In [7,8], the authors addressed the artifacts by applying modified guided image filtering to the weight maps to eliminate halos around edges. The abovementioned methods produce good results using a sequence of images exposed at small intervals of exposure value (EV). Thanks to advanced sensor technology, a camera with Binned Multiplexed Exposure High-Dynamic-Range (BME-HDR) or Spatially Multiplexed Exposure High-Dynamic-Range (SME-HDR) technology can simultaneously capture an image pair with short- and long-exposure image sensors. The captured pair has only a negligible difference, possibly caused by local motion blur between the two images. The existing MEF methods may not work well with two exposure images, since neither input may have well-exposed contents. In addition, weighted-sum fusion based on well-exposedness may not be able to deal with highlighted regions of a short-exposure image that are darker than the dark parts of a long-exposure image, so the method ignores the contents of the short-exposure image. Yang et al. [9] proposed the production of an intermediate virtual image with a medium exposure based on an image pair with two exposures to help generate better fusion results. Nevertheless, it does not work in situations where the highlighted regions of both input LDR images are not well exposed.
In recent years, deep convolutional neural networks (CNNs) have gained tremendous success in low-level image processing tasks. In MEF, CNN-based methods [10,11] can better learn features from multiple-exposure input images and fuse them into a visually pleasing image. However, the fused images often lack image details [12], since spatial information may be lost when features pass through deep layers. Xu et al. [13] proposed a unified unsupervised image fusion network trained based on the importance and information carried by the two input images to generate fusion results. However, these learning-based methods can only produce a fused image that is an interpolation of the two input images. They cannot deal with cases where neither of the input images has highlighted regions/contents. This paper presents a two-exposure fusion framework that generates a more helpful intermediate virtual image for fusion using the proposed Optimized Adaptive Gamma Correction (OAGC). The virtual image has better contrast, saturation, and well-exposedness, and it is not restricted to being an interpolated version of the two input images. Fusing the input images with their virtual image processed by OAGC works well even when neither input has well-exposed contents or regions. Figure 1b shows an example where the proposed framework can still generate a good fusion result when both of the input images lack highlighted regions (Figure 1a). Our primary contributions are three-fold:

• Our image fusion framework adopting the proposed OAGC can produce better fusion results for two input images with various exposure ratios, even when both of the input images lack well-exposed regions.
• The proposed framework with OAGC can also adapt to single-image enhancement.
• We conduct an extensive experiment using a public multi-exposure dataset [14] to demonstrate that the proposed fusion framework performs favorably against the state-of-the-art image fusion methods.

Related Work
MEF-based methods produce fusion results using a weighted combination of the input images based on each pixel's "well-exposedness". In [15], fusion weight maps were calculated based on the correlation-based match and salience measures of the input images. With the weight maps, one can fuse the input images into one by using the gradient pyramid.
Mertens et al. [6] constructed fusion weight maps based on contrast, saturation, and exposedness of the input images. Differently from [15], the fusion was performed with the Gaussian and Laplacian pyramids. The problem was that using the smoothed weight maps in fusion often causes halo artifacts, especially around the edges. The method proposed in [7] addressed this issue by applying an edge-preserving filter (weighted guided image filtering [16]) to fusion weight maps. Kou et al. [8] further proposed an edge-preserving gradient-domain guided image filter (GGIF) to avoid generating halo artifacts in the fused image. To extract image details, Li et al. [7] proposed a weighted structure tensor to manipulate details presented in a fused image. In general, MEF-based methods can generate decent fusion results.
General MEF algorithms [6,8] that require a sequence of images with different exposure ratios as the inputs may not work with only two input images. Yang et al. [9] proposed the use of the MEF algorithm for two-exposure-ratio image fusion, where an intermediate virtual image with a medium exposure is generated to help produce a better fusion result. However, the virtual image's intensity and exposedness are bounded by the two input images, so the method often fails in cases where the two images are both underexposed or both overexposed. Yang's method [9] can only generate intermediate and fusion results with approximately medium exposure between its two input images. The problem is that the medium exposure between the inputs may still be under- or overexposed. In such cases, image fusion will not improve visual quality. We will discuss this issue further in the next section.
In the following paragraphs, we introduce the techniques adopted in the work of Yang et al., including the generation of the virtual image and fusion weights and the multiscale image fusion. Before continuing, we define several notations that are used here. Let $I \in \mathbb{R}^{M \times N \times 3}$ be a color image. We denote $I^{(c)}$ as the color channel $c$, where $c \in \{R, G, B\}$ stands for the red, green, and blue channels. $I(m, n)$ represents the pixel located at $(m, n)$, where $0 \le m < M$ and $0 \le n < N$. $M$ and $N$ are the image width and height. Let $Y$ be the luminance component or the grayscale version of $I$. Note that the values of images in this paper are normalized to $[0, 1]$.

Quality Measures and Fusion Weight Maps
In HDR imaging, an image taken at a certain exposure may contain underexposed or overexposed regions, which are less informative and should be assigned smaller weights in multi-exposure fusion. The input's contrast, saturation, and well-exposedness determine a pixel's weight at $(m, n)$ [6]. The contrast of a pixel, denoted by $C(m, n)$, is obtained by applying a 3 × 3 Laplacian filter to a grayscale version of the image. Let $C = \{C(m, n)\}$ be the map of the contrast of $I$; therefore,

$$C = \left| Y_l + Y_r + Y_u + Y_d - 4Y \right|, \tag{1}$$

where the absolute value is taken element-wise and $Y_l$, $Y_r$, $Y_u$, and $Y_d$ are obtained from $I_l$, $I_r$, $I_u$, and $I_d$; i.e., shifting $I$ one pixel left, right, up, and down, respectively. The saturation of the pixel, denoted by $S(m, n)$, is obtained by computing the standard deviation across the red, green, and blue channels:

$$S(m, n) = \sqrt{\frac{1}{3} \sum_{c \in \{R,G,B\}} \left( I^{(c)}(m, n) - \bar{I}(m, n) \right)^2}, \tag{2}$$

where $\bar{I}(m, n) = \frac{1}{3} \sum_{c \in \{R,G,B\}} I^{(c)}(m, n)$. The well-exposedness of the pixel, $E(m, n)$, is defined as:

$$E(m, n) = \exp\left( -\frac{\left( Y(m, n) - \xi \right)^2}{2\sigma^2} \right), \tag{3}$$

where $\sigma = 0.2$ and $\xi = 0.5$. Essentially, $E$ is a Gaussian curve centered at 0.5 with a standard deviation of 0.2. The maps of saturation and well-exposedness of $I$ can, respectively, be represented as $S = \{S(m, n)\}$ and $E = \{E(m, n)\}$. Next, the weight of the pixel for fusion is computed using:

$$W(m, n) = C(m, n)^{\omega_c} \times S(m, n)^{\omega_s} \times E(m, n)^{\omega_e}, \tag{4}$$

where the exponents $\omega_c$, $\omega_s$, and $\omega_e$ can be adjusted to emphasize or ignore one or more measures. Considering a set of $P$ images $I_1, \ldots, I_P$ for image fusion, the weight of this pixel in the $p$th image is normalized by the sum of the weights across all the images at the same pixel:

$$W_p(m, n) \leftarrow \frac{W_p(m, n)}{\sum_{p'=1}^{P} W_{p'}(m, n)}. \tag{5}$$

The weight map of the image $I_p$ is represented as $W_p = \{W_p(m, n)\}$.
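The quality measures and normalized weights described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the replicated-border handling, and the small `eps` that guards against division by zero are assumptions, and the luminance is taken as the 0.299/0.587/0.114 weighted sum of the RGB channels.

```python
import numpy as np

def shift(img, dm, dn):
    """Shift a 2-D array by (dm, dn) pixels, replicating the border (assumed)."""
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    return padded[1 + dm:1 + dm + rows, 1 + dn:1 + dn + cols]

def luminance(I):
    """Grayscale version Y of an RGB image with values in [0, 1]."""
    return 0.299 * I[..., 0] + 0.587 * I[..., 1] + 0.114 * I[..., 2]

def quality_measures(I, sigma=0.2, xi=0.5):
    """Return the contrast, saturation, and well-exposedness maps of I."""
    Y = luminance(I)
    # Contrast: absolute response of a 4-neighbor Laplacian.
    C = np.abs(shift(Y, 0, -1) + shift(Y, 0, 1)
               + shift(Y, -1, 0) + shift(Y, 1, 0) - 4.0 * Y)
    # Saturation: standard deviation across the three channels (1/3 factor).
    S = I.std(axis=2)
    # Well-exposedness: Gaussian curve centered at xi.
    E = np.exp(-(Y - xi) ** 2 / (2.0 * sigma ** 2))
    return C, S, E

def fusion_weights(images, w_c=1.0, w_s=1.0, w_e=1.0, eps=1e-12):
    """Per-pixel weights of each image, normalized to sum to 1 across images."""
    raw = []
    for I in images:
        C, S, E = quality_measures(I)
        raw.append(C ** w_c * S ** w_s * E ** w_e + eps)
    raw = np.stack(raw)
    return raw / raw.sum(axis=0, keepdims=True)
```

Setting one of the exponents to 0 removes the corresponding measure from the product, which is how a measure can be ignored.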

Multi-Scale Fusion
In the MEF algorithm [6], a fusion image, $\hat{I}$, is obtained through multi-scale image fusion based on the standard Gaussian and Laplacian pyramids. For each input image $I_p$ in the set $\{I_p\}_{p=1}^{P}$, the Laplacian pyramid, $L^{(l)}\{I_p\}$, and the Gaussian pyramid of its weight map, $G^{(l)}\{W_p\}$, in the $l$th level are constructed by applying the Gaussian pyramid generation [17]. In this level, the Laplacian pyramid of the fused image is obtained by performing weighted averaging on the Laplacian pyramids from all of the input images in the set:

$$L^{(l)}\{\hat{I}\} = \sum_{p=1}^{P} G^{(l)}\{W_p\} \odot L^{(l)}\{I_p\}, \tag{6}$$

where $\odot$ denotes element-wise multiplication. Finally, the fusion image, $\hat{I}$, is reconstructed by collapsing the Laplacian pyramid $L^{(l)}\{\hat{I}\}$. Applying edge-preserving filtering to preserve edges in the weight maps before averaging the Laplacian pyramids in Equation (6) can reduce halo artifacts in fused images. In [9], the GGIF [18] was adopted to smooth the weight maps $W_p$ while preserving significant changes. Let $\Omega_\rho(m_0, n_0)$ be the square local patch with a radius of $\rho$ centered at $(m_0, n_0)$, and let $(m, n)$ be a pixel in the patch. In $\Omega_\rho(m_0, n_0)$, the weight map in the $l$th level of the $p$th image, $W_p^{(l)}$, is modeled as a linear transform of the luminance $Y_p$ of that image:

$$W_p^{(l)}(m, n) = a_{p,(m_0,n_0)} Y_p(m, n) + b_{p,(m_0,n_0)}, \quad \forall (m, n) \in \Omega_\rho(m_0, n_0), \tag{7}$$

where $a_{p,(m_0,n_0)}$ and $b_{p,(m_0,n_0)}$ can be obtained by minimizing the objective function:

$$\Lambda_G = \sum_{(m,n) \in \Omega_\rho(m_0,n_0)} \left[ \left( a_{p,(m_0,n_0)} Y_p(m, n) + b_{p,(m_0,n_0)} - W_p^{(l)}(m, n) \right)^2 + \epsilon\, a_{p,(m_0,n_0)}^2 \right], \tag{8}$$

where $\epsilon$ is a constant for regularization. The variance of the intensities within this local patch, $\sigma^2$, is computed when solving for the coefficients in Equation (8).
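In GGIF, a 3 × 3 local window, $\Psi$, is applied to the pixels within $\Omega_\rho(m_0, n_0)$ to capture the structure within the patch by computing the variance within $\Psi$. This local window makes GGIF a content-adaptive filter; thus, GGIF produces fewer halos and better preserves edges than the guided image filter (GIF). In GGIF, the regularization term is designed to yield:

$$\Lambda_{GG} = \sum_{(m,n) \in \Omega_\rho(m_0,n_0)} \left[ \left( a_{p,(m_0,n_0)} Y_p(m, n) + b_{p,(m_0,n_0)} - W_p^{(l)}(m, n) \right)^2 + \frac{\epsilon}{\Gamma_{Y_p}(m_0, n_0)} \left( a_{p,(m_0,n_0)} - \zeta_{(m_0,n_0)} \right)^2 \right], \tag{9}$$

where $Y_p$ is the luminance of the $p$th input image, $\Gamma_{Y_p}(m_0, n_0)$ and $\zeta_{(m_0,n_0)}$ are computed according to the product of the standard deviations within $\Omega_\rho(m_0, n_0)$ and $\Psi$, and the coefficients $a_{p,(m_0,n_0)}$ and $b_{p,(m_0,n_0)}$ can be solved by minimizing $\Lambda_{GG}$ in Equation (9).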
The fused image $\hat{I}$ can be obtained by fusing the Laplacian pyramids of the input images taken at different exposures using the weight maps retrieved from the Gaussian pyramids, $G^{(l)}\{W_p\}$. Note that the weight maps are filtered using GGIF, as described in Equation (9), to preserve edges.
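The pyramid-based blending of Equation (6) can be sketched as below. This is a simplified illustration under stated assumptions: it uses a 5-tap binomial kernel, replicated borders, and nearest-neighbor upsampling followed by a blur, and it omits the GGIF edge-preserving filtering of the weight maps. All function names are hypothetical.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed smoothing kernel

def blur(img):
    """Separable 5-tap blur along axes 0 and 1 with replicated borders."""
    out = img.astype(float)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        out = sum(k * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i, k in enumerate(KERNEL))
    return out

def down(img):
    """Blur, then keep every second row and column."""
    return blur(img)[::2, ::2]

def up(img, shape):
    """Nearest-neighbor upsampling to `shape`, followed by a blur."""
    rows = np.minimum(np.arange(shape[0]) // 2, img.shape[0] - 1)
    cols = np.minimum(np.arange(shape[1]) // 2, img.shape[1] - 1)
    return blur(img[rows][:, cols])

def gaussian_pyramid(img, levels):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    lap = [g[l] - up(g[l + 1], g[l].shape[:2]) for l in range(levels - 1)]
    return lap + [g[-1]]  # the coarsest level keeps the low-pass residual

def fuse(images, weights, levels=3):
    """Blend M x N x 3 images with M x N weight maps, level by level."""
    fused = None
    for I, W in zip(images, weights):
        lap = laplacian_pyramid(I, levels)
        gw = gaussian_pyramid(W, levels)
        terms = [g[..., None] * lp for g, lp in zip(gw, lap)]
        fused = terms if fused is None else [f + t for f, t in zip(fused, terms)]
    out = fused[-1]
    for l in range(levels - 2, -1, -1):  # collapse from coarse to fine
        out = fused[l] + up(out, fused[l].shape[:2])
    return np.clip(out, 0.0, 1.0)
```

A quick sanity check of this sketch is that fusing two copies of the same image, with any weights that sum to 1 at every pixel, returns the image itself.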

Virtual Image Generation
In [9], Yang et al. proposed the modification of two differently exposed images to have the same medium exposure using the intensity mapping function based on the cross-histogram between two images, called the comparagram [19], and fused them to produce an intermediate virtual image. Let $I_1$ and $I_2$ be the two input images and let $F_{12}$ and $F_{21}$ be the intensity mapping functions (IMFs) that map $I_1$ to $I_2$ and $I_2$ to $I_1$. Based on [19], the IMFs that map the two images to the same exposure, denoted as $F_{13}$ and $F_{23}$, are computed as $F_{13}(z) = (z F_{12}(z))^{0.5}$ and $F_{23}(z) = (z F_{21}(z))^{0.5}$, where $z$ is a pixel intensity. The two modified images with the same exposure are $F_{13}(I_1)$ and $F_{23}(I_2)$. The desired virtual image $I_v$ is computed by fusing these two modified images using the weighting functions adopted in [9]. The two-exposure-fusion image in [9] is obtained by fusing $I_1$, $I_2$, and $I_v$ based on the MEF algorithm [8].
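As a worked illustration of the geometric-mean mappings, the sketch below assumes hypothetical analytic IMFs in which the long exposure is a gamma-brightened copy of the short one, $I_2 = I_1^{0.5}$. In practice the IMFs would be estimated from the comparagram [19]; the analytic forms and the helper name `to_common_exposure` are assumptions made only to keep the example self-contained.

```python
import numpy as np

def to_common_exposure(F_ab):
    """Build z -> (z * F_ab(z)) ** 0.5, the mapping to the medium exposure."""
    def mapping(z):
        return np.sqrt(z * F_ab(z))
    return mapping

# Hypothetical IMFs for the assumed relation I2 = I1 ** 0.5.
def F12(z):
    return z ** 0.5   # maps I1 intensities to I2 intensities

def F21(z):
    return z ** 2.0   # maps I2 intensities back to I1 intensities

F13 = to_common_exposure(F12)
F23 = to_common_exposure(F21)

I1 = np.linspace(0.0, 1.0, 11)   # short-exposure intensities
I2 = F12(I1)                     # corresponding long-exposure intensities
I1_mod = F13(I1)                 # equals I1 ** 0.75
I2_mod = F23(I2)                 # equals I2 ** 1.5 == I1 ** 0.75
```

Under this assumption both modified images coincide at $I_1^{0.75}$, an exposure that lies between the two inputs, which is why the resulting virtual image is bounded by them.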
As described previously, Yang's method often fails to produce a satisfying fusion result when the medium exposure between the inputs is still under- or overexposed. The proposed method addresses this issue by improving the contrast, saturation, and well-exposedness of the intermediate virtual image to generate better fusion results under different input conditions.

Proposed Method
The algorithm in [9] can work for two images with a large difference between their exposure ratios. In this case, the intermediate virtual image with medium exposure helps bridge the dynamic range gap between the two inputs. Thus, it can improve the quality of the fusion result. However, if both inputs are underexposed or both are overexposed, the generated virtual image would not help the fusion, and the quality of the fused image is not improved much.
For example, to fuse Figure 2a,b, both of which look overexposed, the virtual image $I_v$ (Figure 2c) generated by [9] with medium exposure between the inputs is still overexposed and, thus, not helpful for the fusion result (Figure 2e). We propose Optimized Adaptive Gamma Correction (OAGC) to enhance the intermediate virtual image to have better contrast, saturation, and well-exposedness (Figure 2d) so that it can improve the fusion quality and produce a better result (Figure 2f).

In OAGC, we derive an optimal $\gamma$ based on the input's contrast, saturation, and well-exposedness by formulating an objective function based on these image quality metrics and apply it to the input image using gamma correction. Let $Y(m, n)$ be the luminance of a pixel. One can gamma-correct the image $Y$ to alter its luminance through the power function as follows:

$$Y_\gamma(m, n) = \eta\, Y(m, n)^{\gamma},$$

where $Y_\gamma$ is the corrected image, $\eta$ and $\gamma$ are positive scalars, and $\eta$ is usually set to 1 [20].
Here, the notation Y γ in bold represents the entire image, while Y γ (m, n) stands for the pixel located at (m, n). If γ < 1, it stretches the contrast of shadow regions (pixel intensities less than the mid-tone of 0.5), and features in these regions become discernible, whereas if γ > 1, it stretches the contrast of bright regions (intensities larger than 0.5), and features in the regions become perceptible. For γ = 1, it is linear mapping.
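The effect of $\gamma$ on shadows and highlights can be checked numerically with a few sample luminance values. The snippet is only an illustration; the function name and the sample values are arbitrary choices.

```python
import numpy as np

def gamma_correct(Y, gamma, eta=1.0):
    """Power-law mapping eta * Y ** gamma for luminance values in [0, 1]."""
    return eta * np.power(Y, gamma)

Y = np.array([0.04, 0.25, 0.5, 0.81])

brighter = gamma_correct(Y, 0.5)  # approx. [0.2, 0.5, 0.7071, 0.9]
darker = gamma_correct(Y, 2.0)    # approx. [0.0016, 0.0625, 0.25, 0.6561]

# Gap between the two shadow samples (0.04 and 0.25) and the two bright
# samples (0.5 and 0.81), before and after correction.
shadow_gap = Y[1] - Y[0], brighter[1] - brighter[0], darker[1] - darker[0]
bright_gap = Y[3] - Y[2], brighter[3] - brighter[2], darker[3] - darker[2]
```

With $\gamma = 0.5$ the shadow gap grows from 0.21 to 0.30 while the bright gap shrinks from 0.31 to about 0.19; with $\gamma = 2$ the opposite happens, matching the description above.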
To derive the optimal gamma, we design an objective function $f(\gamma)$ (Equation (12)), where $C_\gamma$, $S_\gamma$, and $E_\gamma$ are the maps of quality measures computed based on the gamma-corrected version of the input image, denoted as $I_\gamma$. Here, the virtual image $I_v$ is used as the input, that is, $I_\gamma := I_v^{\gamma}$. We set $k_c$, $k_s$, and $k_e$ to 4, 0.5, and 1 according to the upper bounds of the corresponding quality measures (contrast, saturation, and well-exposedness; refer to Appendix A for the derivation). The term with $\hat{r}(\gamma)$ in the objective function prevents the corrected image from deviating too much from the input. Hence, minimizing the objective function $f(\gamma)$ maximizes all three quality measures: the contrast, saturation, and well-exposedness. $q_1$, $q_2$, and $q_3$ are the weighting factors for the contributions from the different quality measures (independent of $\omega_c$, $\omega_s$, and $\omega_e$ in Equation (4)) and are all set to $\frac{1}{3}$. $\delta$ is a small, fixed scalar and is set to 0.1 in the present study. $\mathbf{1}$ is the vector of 1s, $\mathrm{vec}(\cdot)$ is the vectorization of a matrix, and $\|\cdot\|$ represents the 2-norm of a vector. The regularization term is added to avoid possible color distortion caused by gamma correction.
The optimal gamma, $\gamma^*$, which aims to increase contrast, saturation, and well-exposedness simultaneously, can be obtained by minimizing the objective function $f(\gamma)$:

$$\gamma^* = \arg\min_{\gamma} f(\gamma). \tag{13}$$

Since there is no closed-form solution for Equation (13), we apply gradient descent to iteratively approximate it:

$$\gamma^{(k+1)} = \gamma^{(k)} - \alpha^{(k)} f'\left(\gamma^{(k)}\right),$$

where the derivative $f'(\gamma)$ is evaluated with $i_{v,l}$, $i_{v,r}$, $i_{v,u}$, and $i_{v,d}$ being the vectorizations of $I_{v,l}$, $I_{v,r}$, $I_{v,u}$, and $I_{v,d}$, with $t^{(R)}$, $t^{(G)}$, and $t^{(B)}$ being 0.299, 0.587, and 0.114, respectively, and with element-wise division between vectors; $\alpha^{(k)}$ is the adjustable learning rate.

Figure 3 shows the flowchart of the presented two-exposure image fusion framework, where the two inputs are taken in the same scene at different exposure ratios. The virtual image is first generated using the intensity mapping function [9]. Next, we minimize the objective in Equation (12) to find the optimal gamma value $\gamma^*$ for the virtual image, which enhances the contrast, saturation, and well-exposedness of $I_v$. The final fused image, $\hat{I}$, is obtained by applying the MEF algorithm [8,9] to the two input images and $I_\gamma$.
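The optimization loop can be sketched as follows. Note the assumptions: the `objective` below is a stand-in, not the paper's Equation (12). It penalizes the mean shortfall of each measure from its upper bound ($k_c = 4$, $k_s = 0.5$, $k_e = 1$) plus $\delta$ times the mean squared deviation from the input. The gradient is approximated by central differences instead of the analytic expression, gamma is applied per channel, and the step-halving rule is just one possible way to adjust the learning rate.

```python
import numpy as np

K_C, K_S, K_E = 4.0, 0.5, 1.0        # upper bounds of the three measures
Q = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)  # q1, q2, q3
DELTA = 0.1                          # weight of the regularization term

def measures(I):
    """Contrast, saturation, and well-exposedness maps of an RGB image."""
    Y = I @ np.array([0.299, 0.587, 0.114])
    P = np.pad(Y, 1, mode="edge")
    C = np.abs(P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:] - 4.0 * Y)
    S = I.std(axis=2)
    E = np.exp(-(Y - 0.5) ** 2 / (2.0 * 0.2 ** 2))
    return C, S, E

def objective(gamma, I_v):
    """ASSUMED stand-in for f(gamma); smaller means better quality measures."""
    I_g = I_v ** gamma
    C, S, E = measures(I_g)
    shortfall = (Q[0] * np.mean(1.0 - C / K_C)
                 + Q[1] * np.mean(1.0 - S / K_S)
                 + Q[2] * np.mean(1.0 - E / K_E))
    return shortfall + DELTA * np.mean((I_g - I_v) ** 2)

def oagc_gamma(I_v, gamma0=1.0, lr=1.0, steps=200, h=1e-3):
    """Gradient descent on gamma with a numerical derivative."""
    gamma, f_cur = gamma0, objective(gamma0, I_v)
    for _ in range(steps):
        grad = (objective(gamma + h, I_v) - objective(gamma - h, I_v)) / (2.0 * h)
        cand = float(np.clip(gamma - lr * grad, 0.05, 5.0))
        f_cand = objective(cand, I_v)
        if f_cand < f_cur:       # accept only steps that lower the objective
            gamma, f_cur = cand, f_cand
        else:                    # otherwise shrink the learning rate
            lr *= 0.5
            if lr < 1e-6:
                break
    return gamma
```

For a dark input, this sketch is expected to return a value below 1, since brightening moves the luminance toward 0.5 and raises the well-exposedness term.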

Experimental Results
In the experiment, we compared the proposed method against state-of-the-art image fusion methods, which included Kou's method [8], DeepFuse [10], Yang's method [9], and U2Fusion [13]. We adopted the SICE dataset [14] and collected 116 image pairs that consisted of various scenes to evaluate the performance. The presented algorithm was implemented using MATLAB R2019b on a MacBook Pro with an Intel i5 dual-core processor at 2.7 GHz and 8 GB 1867 MHz DDR3 RAM. In the following, we present a performance evaluation with a qualitative visual comparison and a quantitative objective assessment.

Qualitative Assessment
We compared different fusion results under various input conditions. First, Figure 4 shows the fusion results of the compared image fusion algorithms [8-10,13] and our presented framework. As can be seen, one input image is underexposed and the other is overexposed in the two cases, where the fusion results should have middle exposure between the two inputs. All of the compared methods worked fine in such cases, although U2Fusion's [13] fusion results were a little darker than the others' results.

Figure 4. Comparison of the results obtained using different fusion methods with an underexposed and an overexposed input. (a,b) show the input images squared in red. The fusion results were obtained using (c) Kou's method [8], (d) DeepFuse [10], (e) Yang's method [9], (f) U2Fusion [13], and (g) the proposed method.

Figure 5 shows image fusion cases where the difference between the two input images' EVs was not large. Thus, fusion methods that can only produce results with medium exposure between the inputs do not work. As shown, all of the compared methods except for the proposed framework output fusion results similar to the input images, and were thus unable to reveal more details than the inputs. In contrast, the proposed framework produced an intermediate virtual image enhanced by OAGC with additional well-exposed highlighted contents and generated better fusion results. Therefore, we can further improve the overall image visibility by revealing details in regions that are too dark or bright.

Figure 5. Comparisons of the fusion results using different algorithms, where the two input images had smaller exposure differences. (a,b) show the input images squared in red. The fusion results were obtained using (c) Kou's method [8], (d) DeepFuse [10], (e) Yang's method [9], (f) U2Fusion [13], and (g) the proposed method.

Quantitative Assessment
Objectively, we compare the performance of our presented framework against other image fusion methods using four benchmark metrics: the Naturalness Image Quality Evaluator (NIQE) [21], Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [22], No-Reference Image Quality Assessment (NR-IQA) [23], and discrete entropy (DE) [24]. The NIQE [21] is a no-reference image quality metric that is trained on pristine images without subjective scores from humans. Therefore, it can measure image quality degradation if any distortions exist, but it correlates only weakly with human perception. A smaller value means better quality. BRISQUE [22] is a natural scene statistics-based distortion-generic no-reference image quality assessment model that is trained on images with known distortions and subjective quality scores. It can evaluate losses of naturalness of an image caused by possible distortions. A BRISQUE value ranges between 0 and 100. A smaller value means better visual quality. NR-IQA [23] is another no-reference image quality metric for HDR images; it uses deep CNNs, while considering image saliency, to extract quality features across the HDR and LDR domains. DE [24] can represent the information contained in an information source, i.e., if an image has higher entropy, it contains more information. Consequently, it is often used to measure the richness of image details. It is defined as:

$$\mathrm{DE}(I) = -\sum_{l=0}^{L} p_I(l) \log_2 p_I(l),$$

where $I$ is a grayscale image, $L$ represents the largest pixel intensity value, and $p_I(l)$ is the probability of a given grayscale intensity $l$. Table 1 shows the quantitative performance of different fusion methods, where the scores are averaged over all of the test images.
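Of the four metrics, discrete entropy is simple enough to sketch directly. The snippet below assumes an integer-valued grayscale image with intensities in $[0, L]$ and $L = 255$, and it measures entropy in bits (base-2 logarithm); the function name is hypothetical.

```python
import numpy as np

def discrete_entropy(gray, levels=256):
    """Entropy in bits of an integer grayscale image with values in [0, levels - 1]."""
    counts = np.bincount(np.asarray(gray).ravel(), minlength=levels)
    p = counts[counts > 0] / counts.sum()  # empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))

flat = np.zeros((4, 4), dtype=np.uint8)                 # one intensity only
two_tone = np.array([[0, 255], [255, 0]], dtype=np.uint8)
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)   # every intensity once
```

A constant image has zero entropy, an image split evenly between two intensities has 1 bit, and an 8-bit image that uses all 256 intensities equally often reaches the maximum of 8 bits.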
As can be seen, the results demonstrate that the presented framework achieved the best scores in all four categories, meaning that our fusion results looked natural with the fewest distortions (having the lowest NIQE [21] and lowest BRISQUE values [22]). In assessing the HDR image quality (NR-IQA [23]), our method performed favorably against other fusion methods. Our method could also preserve the most image details in the fusion results (largest DE value [24]).

Extension to Single Image Enhancement
As stated previously, the proposed framework works well for two-exposure image fusion in cases where the difference between the two input images' EVs varies. In recent years, fusion-based single image enhancement methods have attracted much attention [14]. We can also extend our framework to single image enhancement by applying OAGC to the input image, $I$, to yield a quality-improved image $I_\gamma$. Then, $I$ and $I_\gamma$ are fused to obtain an enhancement result. Figure 6 compares the results obtained using various single image enhancement methods, including global histogram equalization (HE) [25], CVC [26], AGCWD [27], EPMP [28], SICE [14], and the proposed method. As shown, the conventional HE tended to over-enhance the processed images and introduce noise (Figure 6b), since the input images had over- and underexposed regions. SICE [14] only performed well for the second row of Figure 6, where the input image was underexposed. For the other cases, it tended to overexpose the input images. The other methods [26-28] could only enhance the contrast of the input images, while the proposed framework not only did that, but also revealed unseen details from the input and increased the color vividness (Figure 6g).

Figure 6. Results of single image enhancement methods: (b) HE [25], (c) CVC [26], (d) AGCWD [27], (e) EPMP [28], (f) SICE [14], and (g) the proposed method (fusing $I$ and $I_\gamma$).

Analysis of OAGC
Convergence of Gradient Descent: To further analyze the process of attaining the target gamma value $\gamma^*$ in OAGC, we take the case in the top row of Figure 5 as an example to show the iterative steps of finding $\gamma^*$ for the intermediate virtual image $I_v$. Figure 7 shows that it takes about 66 steps for the objective function $f(\gamma)$ to converge with gradient descent, and it attains $\gamma^* = 0.434$. The value of the objective function changes from the initial 2.690 to 2.688. At $\gamma = \gamma^*$, $\hat{e}(\gamma)$ reaches its minimum, and $\hat{c}(\gamma)$ is close to its minimum while $\hat{s}(\gamma)$ is at its maximum, indicating that contrast, saturation, and well-exposedness are all maximized. This also shows that solving the objective function strikes a balance among these three measures. To further attest to the effectiveness of OAGC, Figure 8 shows the trend of values of the objective function and its quality measure terms using the grid-search method on $\gamma$. As seen, the minimum of the objective function is 2.688 when $\gamma = 0.434$, consistent with the $\gamma^*$ obtained using gradient descent.

Limitation of OAGC: Using OAGC, we can attain a gamma coefficient $\gamma^*$ from the input image by optimizing the objective function in Equation (12), and we can then apply gamma correction to the input to generate a corrected image whose contrast, saturation, and well-exposedness are improved. However, if both of the input images have no content at all in the same regions due to extremely low or high exposure, even OAGC cannot generate or restore those regions from nothing. Figure 9 shows a failure case of OAGC, where both of the input images are very underexposed and bear little content. Figure 9c shows the intermediate virtual image obtained using the intensity mapping algorithm described in [9], which is similar to an interpolation of the inputs, and the result is still very dark and lacks content. After applying OAGC to it ($\gamma^* = 0.2015$), the processed virtual image (Figure 9d) presents more details than before. Still, it inevitably has noise in some regions (such as the door and the banner on the façade).
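The grid-search cross-check can be illustrated with a deliberately simplified criterion. The sketch below assumes that only the mean well-exposedness is scored (not the full objective of Equation (12)) and uses a synthetic constant-luminance image, so the best gamma is known in closed form; the function names and the grid are arbitrary choices.

```python
import numpy as np

def mean_well_exposedness(Y, gamma, sigma=0.2, xi=0.5):
    """Mean well-exposedness of the gamma-corrected luminance Y ** gamma."""
    return float(np.mean(np.exp(-(Y ** gamma - xi) ** 2 / (2.0 * sigma ** 2))))

def grid_search_gamma(Y, gammas):
    """Return the gamma on the grid that maximizes the mean well-exposedness."""
    scores = [mean_well_exposedness(Y, g) for g in gammas]
    return float(gammas[int(np.argmax(scores))])

Y = np.full((8, 8), 0.25)              # uniformly dark luminance
gammas = np.linspace(0.1, 3.0, 291)    # grid with a step of 0.01
best = grid_search_gamma(Y, gammas)    # 0.25 ** 0.5 == 0.5, so best is 0.5
```

Because the search is exhaustive over the grid, it does not depend on a learning rate or an initial value, which makes it a useful independent check on a gradient-descent result.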

Conclusions
This paper presented a two-exposure image fusion framework that utilizes the proposed OAGC to bring out additional well-exposed contents from an intermediate virtual image derived from the two inputs. It can work better for the input images with various combinations of exposure ratios and can produce more well-exposed fusion results. In addition, the proposed framework with OAGC can easily adapt to single image enhancement. The experimental results have demonstrated that the proposed method performs favorably against the state-of-the-art image fusion methods.
The saturation $S(m, n)$ is defined as in Equation (2). Let $\bar{I}(m, n)$ be the mean of all the channels of this pixel; i.e., Ī