An Adaptive Exposure Fusion Method Using Fuzzy Logic and Multivariate Normal Conditional Random Fields

High dynamic range (HDR) imaging has wide applications in intelligent vision sensing, including enhanced electronic imaging, smart surveillance, self-driving cars, and intelligent medical diagnosis. Exposure fusion is an essential HDR technique that fuses different exposures of the same scene into an HDR-like image. However, determining the appropriate fusion weights is difficult because each differently exposed image contains only a subset of the scene’s details. When blending, the problem of local color inconsistency is even more challenging; thus, manual tuning is often required to avoid image artifacts. To address this problem, we present an adaptive coarse-to-fine searching approach to find the optimal fusion weights. In the coarse-tuning stage, fuzzy logic is used to efficiently decide the initial weights. In the fine-tuning stage, the multivariate normal conditional random field (MNCRF) model is used to adjust the fuzzy-based initial weights, which allows us to consider both intra- and inter-image information in the data. Moreover, a multiscale enhanced fusion scheme is proposed to blend the input images while maintaining the details at each scale level. The proposed fuzzy-based MNCRF fusion method provided a smoother blending result and a more natural look. Meanwhile, the details in the highlighted and dark regions were preserved simultaneously. The experimental results demonstrated that our work outperformed the state-of-the-art methods not only in several objective quality measures but also in a user study analysis.


Introduction
All surroundings have a large dynamic range: the luminance of the highlight region might be over one hundred thousand times larger than that of the dark region. However, common cameras can capture only a small portion of this dynamic range. If the exposure time is long, the details of a dark region can be captured, but the content in the highlight region is lost because of over-saturation (over-exposure). By contrast, if the exposure time is short, the details in the dark region are lost because of under-exposure. Both are unacceptable. In addition, most traditional display devices only support 24-bit RGB (red, green, and blue) color images, so representing all details of natural scenes on displays is a challenge. Displaying natural scenes as perceived through the human visual system is therefore a difficult task, and high dynamic range (HDR) techniques play a crucial role. [...] image processing areas, such as image de-hazing [14], image de-noising [15], image decomposition [16], and contrast enhancement [17]. Applying WGIF in the proposed enhanced fusion allows users to manipulate the degree of sharpness in a more appropriate way, and the details in the highlight/dark regions are better preserved.
After determining the optimal weights with the fuzzy-MNCRF model, this paper adopts the pyramid decomposition scheme for multi-scale fusion of the differently exposed images. The concept of pyramid-based fusion is to first smooth and sub-sample all the input images repeatedly (according to the desired number of levels) and then fuse them at the individual levels of the image pyramid (spatial scales). Applying the pyramid decomposition scheme creates a set of cascading versions of the input image, which is useful for extracting structures or features at multiple scales. In addition to image fusion, the pyramid decomposition scheme has also been applied to other topics, such as image filtering [18], dehazing [19], and image decolorization [20]. Compared to single-scale weighted averaging, multi-scale fusion provides more seamless and pleasant results. In view of the advantages of multi-scale decomposition, several representative multi-scale exposure fusion methods have been proposed recently. In Reference [21], the first stage is similar to an extension of Reference [4] that integrates the weighted guided filter, and the second stage uses the structure tensor to preserve the details in the bright/dark regions. In Reference [22], an edge-preserving smoothing pyramid, which is based on the gradient domain-guided image filter (GGIF) [23], is proposed to preserve the details in the brightest or darkest regions for multi-scale exposure fusion. In Reference [24], a multi-scale exposure fusion in the YUV (indicating luminance, chrominance, and chroma) color space is proposed, which addresses the computational complexity of edge-preserving smoothing. Like the above methods, this work presents a detail preservation scheme; in addition, we utilize the MNCRF model to fine-tune the weight maps (before the multi-scale fusion stage) for pleasing image quality.
The rest of this paper is organized as follows. In Section 2, we briefly explain the motivation for combining fuzzy logic and the MNCRF model in fusion weighting. In Section 3, we present the proposed approach. In Section 4, we provide the experimental results and compare them with the existing state-of-the-art methods. Finally, we conclude the paper in Section 5.

Motivation of Integrating Fuzzy Logic with MNCRF Model
Because of its applicability and capability of handling non-numerical information, fuzzy logic has been applied to many image processing topics, such as fuzzy filtering [25], fuzzy segmentation [26], and fuzzy contrast enhancement [27]. Fuzzy logic has also demonstrated its effectiveness in some recently proposed image fusion methods. Celebi et al. [28] applied fuzzy logic to determine fusion weights; only one input image was required, and the other exposed images were generated from the input image by histogram separation and histogram equalization techniques. Rahman et al. [29] proposed a multifocal image fusion method, where fuzzy logic is used to determine the degree of focus for in-focus and out-of-focus data.
In References [28,29], the fusion weights were determined by unidirectional analysis using fuzzy logic. We observed some artifacts in the fused images output by their methods, especially local color inconsistency. Chen et al. [30] presented an exposure fusion method that uses a fuzzy-feedback loop to control the sharpness of fused images in a more appropriate way. The image quality was considerably improved using this method. However, the number of loops might increase the computational complexity. To address this difficulty, we propose a two-step sequence-based weighting procedure that uses fuzzy logic to determine the initial fusion weights and uses the multivariate normal conditional random fields (MNCRF) model [31] to fine-tune the weights. The undirected graph of the MNCRF model is illustrated in Figure 1, where the linkages between nodes indicate the associated conditional dependency. The MNCRF model is a scheme based on the stochastic process of multivariate vectors, which can encode contextual relationships among different random variables. It is widely applied in areas that require excellent image quality or fine and precise details, such as image denoising [32], HDR map estimation [33], saliency detection [34], and object detection [35]. Therefore, this work utilized the MNCRF model to fine-tune the weights. The proposed two-step weighting was based on our observation that a successful exposure fusion involves not only determining the weight according to individual pixel importance (i.e., the weighting results of the fuzzy inference system, FIS) but also considering the intra- and inter-image information simultaneously to maintain smoothness.

Proposed Approach
Throughout this paper, we use superscript χ = {u, n, o} to denote different exposure levels: u, n, and o, respectively, indicate under-exposure, normal-exposure, and over-exposure. We use subscript i to denote the pixel position. Figure 2 describes the overall framework of the proposed approach. For simplicity but without loss of generality, we assumed that there were three differently exposed input images $I_i^{\chi}$.

Fuzzy-Based Pixel Weights Initialization
One of the most typical exposure fusion methods is the method proposed in Reference [4], which determines the pixel weights by considering different properties at the same time. However, we found some artifacts in the results of Reference [4], such as local hue inconsistency and slight seam effects, which probably come from the imbalance among those properties. The fuzzy inference system (FIS) provides a straightforward and efficient method for modeling complex systems through fuzzy variables. Because exposure fusion involves searching for portions with details from the input images $I_i^{\chi}$ and blending them to construct an HDR-like scene, quality metrics are excellent indicators for determining the fusion weights. To measure quality, the color space was converted from RGB to YUV (indicating luminance, chrominance, and chroma).
The proposed FIS was based on our observation that regions that are well exposed or contain large gradients play an essential role in the fusion stage. In this study, two quality metrics were used as inputs to the FIS: well-exposedness (τ) and local pixel-visibility (∇). In their definitions, Y denotes the luminance value, and $N_4(\cdot)$ denotes the 4-connected neighboring pixels. Normally, if the luminance value is closer to 128, the image has a more pleasant visual appearance and deserves a higher weight; τ models this property with a Gaussian curve. Moreover, ∇ approximates the gradient magnitude through a directional derivative, where a maximum operation over the intensity differences is used to reduce the computation cost. Table 1 gives the fuzzy rule base for the FIS, which was specified by observing a large number of images. After the defuzzification process, the initial pixel weight $B_i^{\chi}$ is obtained from $\mathit{fuzzy}_i^{\chi}$, χ = {u, n, o}, the crisp output of the FIS.
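The two FIS inputs described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian width `sigma` is an assumed value, the function names are hypothetical, and the pixel-visibility is taken here to be the maximum absolute luminance difference to the 4-connected neighbors, which is one plausible reading of the description.

```python
import numpy as np

def well_exposedness(Y, sigma=51.0):
    """Gaussian curve centred at mid-grey (128); sigma is an assumed width."""
    Y = np.asarray(Y, dtype=np.float64)
    return np.exp(-((Y - 128.0) ** 2) / (2.0 * sigma ** 2))

def pixel_visibility(Y):
    """Maximum absolute luminance difference to the 4-connected neighbours."""
    Y = np.asarray(Y, dtype=np.float64)
    P = np.pad(Y, 1, mode="edge")  # replicate borders so every pixel has 4 neighbours
    c = P[1:-1, 1:-1]
    diffs = np.stack([
        np.abs(c - P[:-2, 1:-1]),  # neighbour above
        np.abs(c - P[2:, 1:-1]),   # neighbour below
        np.abs(c - P[1:-1, :-2]),  # neighbour to the left
        np.abs(c - P[1:-1, 2:]),   # neighbour to the right
    ])
    return diffs.max(axis=0)

# Toy 2 x 2 luminance patch: two mid-grey pixels, one black, one white.
Y = np.array([[128, 128],
              [0, 255]])
tau = well_exposedness(Y)
vis = pixel_visibility(Y)
```

In this toy patch the mid-grey pixels receive the highest well-exposedness, while the black and white pixels, which differ most from their neighbors, receive the highest pixel-visibility. Both maps would then be fuzzified and passed through the rule base of Table 1, which is not reproduced here.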

Weight Fine-Tuned Using the MNCRF Model
Fuzzy weighting allows us to efficiently extract both well-exposed regions and pixels with strong local pixel-visibility. Nevertheless, the color inconsistency problem, such as local hue inconsistency and seam effects, is not yet well solved, mainly for two reasons: (1) the information in the UV channels is not yet considered; and (2) the inter- and intra-image relationships are not considered properly. Generating a high-quality HDR-like image requires more than weighting each pixel by its importance separately. The initial weights from the FIS are somewhat unbalanced among the current properties and do not analyze the mutual relationships among the different input images simultaneously. To address this problem, this study applied the MNCRF model to formulate the abovementioned information by treating $B_i^{\chi}$ as the naïve weight. Modeling the weight determination in MEF (multiple exposure fusion) is sensitive. To avoid over-adjustment, the relationship between the naïve weight and its corresponding desired pixel weight was assumed to follow a zero-mean Gaussian distribution. Moreover, to take spatial coherence into account, the relationship among the desired neighboring pixel weights in a local region was also assumed to follow another zero-mean Gaussian distribution. In the MNCRF model, two matrices are defined: B is the naïve weight matrix and W is the corresponding MNCRF weight matrix, with N denoting the total number of pixels in an input image. This work adopts the maximum-a-posteriori (MAP) procedure to find the optimal W.

Inter-Image Relationships
An N × N precision matrix Λ was designed to represent the inter-image relationships of B and W, which can be expressed as Λ = U + V. The matrix U is a diagonal matrix that accounts for the inter-image exposure correlation (i.e., the same pixel position in the differently exposed images). If the exposedness values of a pixel position are similar across the three differently exposed images, this position does not belong to an exposure-sensitive region and, thus, a more flexible modification of the pixel weight can be allowed at this position. In the definition of U, (i, j) is the element position of the matrix, and $\sigma_1 = 1$ in this work. The matrix U is further normalized so that its largest entry is equal to one. The matrix V is a symmetric matrix that considers the accumulated local hue continuity from the three input images. Because spatially neighboring pixels usually have a high probability of belonging to the same object, they are likely to have similar exposedness, hue, and pixel weight. The MNCRF model should build a link between neighboring hue/luminance similarity and the output weights to alleviate the interference from noise and luminance variation. In the definition of V, $\sigma_2$ is set to 1, and $\Delta UV_{ij}^{\chi}$ is the chrominance difference between pixels i and j defined in the UV color plane.

Intra-Image Relationships
An N × N precision matrix was designed to represent the intra-image relationships of B and W, which can be expressed as the sum P + Q. The matrix Q is a symmetric matrix that takes neighboring color similarity into account through the color similarity (CS) index of the adjacent pixel pair (i, j), in which γ is set within the range [0.4, 0.6]. Similar to Equation (7), $\Delta YUV_{ij}^{\chi}$ is the color difference defined in the YUV color space. The CS index is constructed based on a Cauchy function, which, like the Gaussian function, is bell-shaped. However, as the color difference increases, the Cauchy function decreases more dramatically than the Gaussian function, which matches our observation on the weight adjustment. If the neighboring pixels have high color coherence, their linkage in the MNCRF model should be strong, and the matrix Q is defined accordingly. The matrix P is a diagonal matrix that takes the intra-image correlation into account to maintain the regional smoothness in the final fused image. If a pixel position has high color similarity to its four neighboring pixels in all three input images, then these pixels have a high possibility of belonging to the same object. Accordingly, the definition of P accumulates both the CS and well-exposedness values.
Derived from Reference [36], searching for the optimal fusion weights can be viewed as solving a MAP problem in which Tr(·) denotes the trace operator. In the resulting optimal W of the MNCRF model, each column is the 1D representation of a weight map. As depicted in the enlarged region of the cloud in Figure 3, the local hues from the differently exposed images and the dark-to-bright gradients were transferred more smoothly in the fused image using the proposed method (compare Figure 3a-c). Meanwhile, the details were preserved more completely because of the combination of FIS with the MNCRF model.
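To make the MAP step concrete, the following sketch solves a generic Gaussian CRF problem of the kind described above. It rests on assumptions rather than on the paper's exact equations: the objective is taken to be Tr((W − B)ᵀ Λ (W − B)) + Tr(Wᵀ Σ W), where Λ ties W to the naïve weights and Σ (a placeholder name for the intra-image precision matrix) penalizes differences between neighboring weights. Setting the gradient to zero gives the linear system (Λ + Σ) W = Λ B. The toy matrices below are hypothetical and are not built from image data.

```python
import numpy as np

def map_weights(B, Lam, Sig):
    """Minimise Tr((W-B)^T Lam (W-B)) + Tr(W^T Sig W) by solving (Lam + Sig) W = Lam B."""
    B = np.asarray(B, dtype=np.float64)
    return np.linalg.solve(Lam + Sig, Lam @ B)

def chain_laplacian(n, strength=1.0):
    """Graph Laplacian of a 1D chain of n pixels: a toy smoothness precision matrix."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += strength
        L[i + 1, i + 1] += strength
        L[i, i + 1] -= strength
        L[i + 1, i] -= strength
    return L

# Toy example: 5 pixels and 3 exposures; each column of B is one flattened weight map.
B = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.8, 0.1, 0.1],
              [0.0, 0.2, 0.8],
              [0.1, 0.1, 0.8]])
Lam = np.eye(5)                  # data-fidelity precision (assumed identity here)
Sig = chain_laplacian(5, 0.5)    # neighbour-smoothness precision (assumed)
W = map_weights(B, Lam, Sig)
```

With these toy matrices, each row of W still sums to one, while the weights vary more smoothly from pixel to pixel than in B. For real images the matrices are N × N and sparse, so a sparse solver would be the natural choice.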

Enhanced Multiscale Fusion with Region-Selective WGIF-Based Sharpening
Because each differently exposed image only contains a portion of dynamic range, there are three common major challenges in the fusion stage: edge-preserving, halo effects, and gradient reversal.
To address these problems, we propose an enhanced multiscale fusion that utilizes the weighted guided image filter (WGIF) technique as follows.
For the edge-preserving problem, considering only the image gradients cannot completely represent the structural edges because these problems are scale-variant: a large gradient might not be an essential edge of the entire image, whereas a small gradient might be essential to a local region. In Reference [4], it was shown that the pyramid representation is excellent at handling the edge-preserving decomposition problem with multiscale differences. Unlike the study in Reference [4], we used WGIF in two separate places to enhance the fine details. First, with regard to the structure-transferring property of WGIF, we added a preprocessing step in generating the guided images. Normally, the guided image is the input image itself. However, because WGIF can transfer the structural edges from the guided image to the input image, a region-selective sharpening (RSS) scheme was used to enhance the details of the guided image. The base plane $BP_i^{\chi}$ is the WGIF result, which has mostly homogeneous regions with edges inherited from $I_i^{\chi}$. The detail-enhanced plane $DP_i^{\chi}$ denotes the RSS result, which has the same homogeneous regions as $I_i^{\chi}$ but more enhanced details in texture regions. In some works, $BP_i^{\chi}$ and $(I_i^{\chi} - BP_i^{\chi})$ are referred to as the base layer (containing large-scale variations) and the detail layer (containing small-scale details), respectively. Parameter n is the boosting coefficient (n is suggested to range from five to ten), and $\eta_i$ is an edge-aware function adopted from Reference [13] that is used to distinguish the flat region from the texture region.
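The base/detail decomposition behind region-selective sharpening can be sketched as follows. Several parts are assumptions made only for illustration: a plain box filter stands in for WGIF, the edge-aware function is replaced by a hypothetical variance-based mask `eta`, and the sharpened plane is taken to be I + n · η · (I − BP), which keeps flat regions equal to the input. The function names and the parameter `eps` are likewise hypothetical.

```python
import numpy as np

def box_filter(img, radius=2):
    """Mean filter with replicated borders; a simple stand-in for WGIF."""
    img = np.asarray(img, dtype=np.float64)
    k = 2 * radius + 1
    P = np.pad(img, radius, mode="edge")
    H, Wd = img.shape
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += P[dy:dy + H, dx:dx + Wd]
    return out / (k * k)

def region_selective_sharpen(I, n=5.0, radius=2, eps=1e-3):
    """Return (base plane, detail-enhanced plane) for an image scaled to [0, 1]."""
    I = np.asarray(I, dtype=np.float64)
    BP = box_filter(I, radius)                      # base layer (large-scale variations)
    detail = I - BP                                 # detail layer (small-scale details)
    var = np.maximum(box_filter(I * I, radius) - BP * BP, 0.0)
    eta = var / (var + eps)                         # ~0 in flat regions, ~1 in textured ones
    DP = I + n * eta * detail                       # boost details only where eta is large
    return BP, DP

# A vertical step edge: dark left half, bright right half.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
BP_step, DP_step = region_selective_sharpen(step, n=5.0)

# A perfectly flat image is left unchanged.
flat = np.full((8, 8), 0.5)
BP_flat, DP_flat = region_selective_sharpen(flat, n=5.0)
```

The boosting coefficient n controls how strongly the detail layer is amplified, which corresponds to the user-controllable sharpness described in the text.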
The WGIF is a local linear filter. Compared to other edge-preserving filters, such as the bilateral filter, WGIF offers better protection against halo and gradient reversal artifacts. Here, $\hat{W}_i^{\chi}\{l\}$ and $DP_i^{\chi}\{l\}$ denote the Gaussian pyramids of the fuzzy-MNCRF weight map and the sharpened image, respectively, where l denotes the pyramid level. According to the property of WGIF, the primary details of $DP_i^{\chi}$ are transferred to $\hat{W}_i^{\chi}$ at the different pyramid levels through WGIF filtering. Then, the detail-enhanced weight pyramid $W_i^{\chi}\{l\}$ is fused with the Laplacian pyramid of the differently exposed images, $L\{I_i^{\chi}\}^{l}$, at the individual pyramid levels. The final synthesized image is reconstructed by collapsing the resulting pyramid.
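The level-by-level blending described above can be sketched with plain NumPy. This is a simplified illustration under stated assumptions: it works on single-channel images, uses a 5-tap binomial blur with nearest-neighbor upsampling, and omits the WGIF detail-transfer step, so the weight pyramid here is an ordinary Gaussian pyramid. All function names are hypothetical.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable 5-tap binomial blur with replicated borders."""
    H, W = img.shape
    P = np.pad(img, 2, mode="edge")
    tmp = sum(KERNEL[k] * P[k:k + H, :] for k in range(5))     # vertical pass
    return sum(KERNEL[k] * tmp[:, k:k + W] for k in range(5))  # horizontal pass

def down(img):
    """Smooth, then keep every second row and column."""
    return blur(img)[::2, ::2]

def up(img, shape):
    """Nearest-neighbour upsampling to `shape`, followed by a blur."""
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(big[:shape[0], :shape[1]])

def gaussian_pyramid(img, levels):
    pyr = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyr.append(down(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    lap = [g[l] - up(g[l + 1], g[l].shape) for l in range(levels - 1)]
    lap.append(g[-1])  # the coarsest level keeps the low-pass residual
    return lap

def collapse(lap):
    """Reconstruct an image by upsampling and adding from coarse to fine."""
    img = lap[-1]
    for l in range(len(lap) - 2, -1, -1):
        img = lap[l] + up(img, lap[l].shape)
    return img

def fuse(images, weights, levels=3):
    """Blend images with per-pixel weight maps at every pyramid level."""
    w = np.stack(weights).astype(np.float64)
    w /= w.sum(axis=0, keepdims=True) + 1e-12  # normalise weights per pixel
    fused = None
    for img, wk in zip(images, w):
        terms = [g * l for g, l in zip(gaussian_pyramid(wk, levels),
                                       laplacian_pyramid(img, levels))]
        fused = terms if fused is None else [f + t for f, t in zip(fused, terms)]
    return collapse(fused)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
w1, w2 = rng.random((16, 16)) + 0.1, rng.random((16, 16)) + 0.1
roundtrip = collapse(laplacian_pyramid(img, 3))
same = fuse([img, img], [w1, w2], levels=3)
```

Because the same upsampling operator is used when building and collapsing the pyramid, an unmodified Laplacian pyramid reconstructs its input, and fusing two identical images returns that image regardless of the weights. Both properties are useful sanity checks before plugging in real weight maps.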

Experimental Results and Discussions
To evaluate the performance of the proposed method, it was compared with the four recent methods in Reference [5] (2013), Reference [6] (2017), Reference [28] (2015), and Reference [23] (2015). Eight test image sequences were selected from public databases [37,38], and each of them contained three exposure levels, as shown in Figure 4. Quality measures are the objective tools which help us to quantitatively evaluate the performance among different methods. In this paper, we selected five image quality measures described as follows.
Sensors 2019, 18, x FOR PEER REVIEW 9 of 24

Comparison of the Objective Quality Measures
The first quality measure is the Contrast and Sharpness Measurement Index (CSMI) introduced in [39]. The human visual system (HVS) captures a wider dynamic range than a camera, which allows people to perceive details in every part of a real-world scene. In exposure fusion methods, by contrast, the details in highlight and shadow regions are normally difficult to preserve because of the limited dynamic range of a single shot. In CSMI, the contrast degree is evaluated by considering the difference between foreground and background using the logarithmic image processing operator, and the sharpness degree is evaluated by considering the boundaries between different zones using the wavelet decomposition. Therefore, the CSMI value is closely correlated with the HVS property that reflects people's perceptions.
The second quality measure is the image entropy, which is computed from $P_x^{\rho}(i)$, the probability of a pixel having intensity i, where i ranges over the intensity levels of each color channel and ρ indicates one of the RGB channels. Entropy is a no-reference image quality assessment scheme, and its value indicates the richness of the information content shown in a fused image. Therefore, in some works such as [28] and [29], entropy is adopted to represent the level of detail-preserving ability. Normally, in the highlight region of an over-exposed image and the shadow region of an under-exposed image, the detailed information is almost lost, which leads to a low entropy value. However, a successful exposure fusion method should be able to extract the fine details from several differently exposed images and to present sufficient and high-quality details in all regions of the output image. Table 3 lists the resulting entropy values of the five methods; red numbers indicate the best entropy value of each row. As shown in the bottom row of Table 3, the average entropy values achieved by the five methods were, respectively, 7.4047 [5], 7.5391 [6], 7.4229 [28], 7.4140 [23], and 7.6088 (our proposed method). Although the proposed method does not obtain the highest entropy value in every test image sequence (e.g., the test images Masked [...]
The third quality measure, the multi-exposure fusion structural similarity (MEF-SSIM) index [40], is specifically designed for exposure fusion methods. Unlike the original SSIM index, which requires only a single reference image, the MEF-SSIM index aims to evaluate the ability to preserve information from the multiple input images at each pixel position. Moreover, the contrast and structure components of local image patches are also analyzed and taken into account when formulating the MEF-SSIM index. Table 4 presents the MEF-SSIM values of the five methods.
Promisingly, the results in Table 4 show that the proposed method has a superior ability to maintain perception-based structural similarity. Among the eight test images, the MEF-SSIM scores of our approach were all higher than 0.9 except for the image Studio (where our score was still the highest of the five methods). In addition, the proposed method outperformed the other methods in every test image sequence. The average MEF-SSIM values achieved by the five methods were, respectively, 0.8344 [5], 0.8914 [6], 0.8500 [28], 0.829 [23], and 0.9415 (proposed method).
Table 4. Comparison of the five methods in terms of MEF-SSIM (multi-exposure fusion structural similarity) [40].
In addition, Tables 5 and 6 show the comparison results of two other objective metrics: a feature-enriched blind image quality evaluator called IL-NIQE [41] and a no-reference quality metric called NIQMC [42]. IL-NIQE is an opinion-unaware blind image quality assessment that integrates several image statistics such as texture, color, and contrast. The IL-NIQE value reflects the naturalness of the fused image, and a lower IL-NIQE value indicates a more natural look. NIQMC is a no-reference, blind image quality assessment of contrast distortion, which is based on calculating the entropy of particular regions with maximum information. The NIQMC value reflects the contrast distortion of the fused image, and a higher NIQMC value indicates a more pleasing visual quality with better clarity. The average IL-NIQE values achieved by the five methods were 19.3959 [5], 18.7395 [6], 18.4621 [28], 19.6196 [23], and 17.8119 (proposed method). The average NIQMC values achieved by the five methods were, respectively, 4.9102 [5], 5.2867 [6], 5.0640 [28], 5.3400 [23], and 5.4606 (proposed method).
As shown in Tables 5 and 6, owing to the combination of the MNCRF model, fuzzy logic, and WGIF-based enhancement, this work achieved the best average scores in both the IL-NIQE and NIQMC metrics. In Table 5, red numbers indicate the best IL-NIQE value for each row. Furthermore, regarding computational performance, the average processing times required to produce an image with a size of 870 × 578 were 7.1421 s [5], 1.9803 s [6], 5.8957 s [28], 1.0311 s [23], and 6.3402 s (proposed method). All methods were written in MATLAB and run on the Windows 7 operating system with a 3.2 GHz CPU. The method in Reference [23] required the least processing time because it is a single-image enhancement method (we applied it to the normal-exposed image). For the proposed method, although combining the MNCRF model and the fuzzy-based weight initialization increased the computation cost, this work demonstrated superior image quality in the output fused images.
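Of the quality measures discussed in this subsection, the entropy is simple enough to sketch directly. The code below computes the Shannon entropy of each 8-bit channel, −Σ p(i) log₂ p(i), from its intensity histogram. Averaging the three channel entropies into one score is an assumption made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def channel_entropy(channel):
    """Shannon entropy (in bits) of one 8-bit channel, from its intensity histogram."""
    values = np.asarray(channel, dtype=np.uint8).ravel()
    hist = np.bincount(values, minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                       # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

def image_entropy(rgb):
    """Mean of the per-channel entropies (the averaging is an assumption)."""
    rgb = np.asarray(rgb)
    return float(np.mean([channel_entropy(rgb[..., c]) for c in range(rgb.shape[-1])]))

# A flat image carries no information; a uniform spread over all 256 levels is maximal.
flat = np.full((16, 16, 3), 100, dtype=np.uint8)
ramp = np.stack([np.arange(256, dtype=np.uint8).reshape(16, 16)] * 3, axis=-1)
e_flat = image_entropy(flat)
e_ramp = image_entropy(ramp)
```

For 8-bit channels the entropy is bounded above by 8 bits, which puts the reported averages of roughly 7.4 to 7.6 in context.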

Visual Comparison and User Study Analysis
In addition to the objective quality measures, Figures 5-7 provide qualitative visual comparisons among the five methods. Placing the fused images from the different methods side by side reveals the subtle but essential differences between the proposed strategy and the other exposure fusion methods.
Figures 5-7. Exposure fusion results for the test images Cottage, Masked Lady, and Laurentian Library, respectively. (a) Results from the method in Reference [5]. (b) Results from the method in Reference [6]. (c) Results from the method in Reference [28]. (d) Results from the method in Reference [23]. (e) Results from the proposed method. The enlarged versions of the red rectangles are provided to illustrate the subtle differences.
To demonstrate the effectiveness of the proposed enhanced multiscale fusion, Figure 9 illustrates an example for visual comparison. There are two motivations for the proposed enhanced multiscale fusion. First, in exposure fusion, the extracted details need to be enhanced to increase detail clarity. Second, in many computational photography applications, it is usually desirable to freely manipulate the sharpness level of the details in the fused image. As depicted in the enlarged region of the building in Figure 9b,c, the proposed enhanced multiscale fusion effectively improves the sharpness and preserves the structural edges. By integrating WGIF in the weight pyramid and using the controllable boosting coefficient in Equation (15), detail manipulation is achieved without visual artifacts. Figure 10 shows the results of the proposed method using the remaining test images.
Figure 5 shows the exposure fusion results using the test image Cottage. For the result of the method in Reference [5] (Figure 5a), the overall chrominance was somewhat faded and lacked contrast. Moreover, detailed textures, e.g., the details in the grass area, were lost. This is consistent with the results shown in Table 2, where the CSMI value of this fused image (7.1133 in Figure 5a) was much lower than those of the other four images (e.g., 9.3121 in Figure 5b). For the result of the method in Reference [6] (Figure 5b), although the dynamic contrast was stretched, the color vividness was lost during the fusion process. For the result of the method in Reference [28] (Figure 5c), the top-left corner of the fused image was visibly over-exposed and its details were not preserved. This was because the pixel weights of each input image were determined by analyzing that image alone, without considering the relationships among the input images. In this example, a comparison of the sky regions produced by the different methods underlines the value of our strategy of integrating the MNCRF model with fuzzy logic. In the sky region of the result of the proposed method (Figure 5e), high-luminance, middle-luminance, and low-luminance pixels all appeared with very smooth gradients, and the WGIF-based enhanced fusion preserved the details. Therefore, a visually pleasing HDR-like image was generated.
Figure 6 shows the exposure fusion results using the test image Masked Lady. For the result of the method in Reference [5] (Figure 6a), the overall brightness was insufficient. For example, the reflected light on the stone floor (the left enlarged image patch) was not as clear as in the results shown in Figure 6c,d, and the texture of the wall (the center enlarged image patch) was vague. Similar phenomena occurred in the result of the method in Reference [6] (Figure 6b).
In both Figure 6a,b, the dynamic ranges of the fused images were not well stretched and the images were dim, so the details in the shadow regions of the scene were hardly preserved. For the result of the method in Reference [28] (Figure 6c), the entire dynamic range was broadened by fusing the input images. For example, each window along the first-floor corridor can be seen. However, the overall chrominance was somewhat greenish, as shown in the clothes of the lady and the first-floor corridor. Moreover, the color of the lamp post (the right enlarged image patch) was unnatural. This reflects the difficulty of determining appropriate pixel weights that generate accurate colors and natural-looking images at the same time. For the result of the method in Reference [23] (Figure 6d), some white noise-like dots can be seen on the floor. The result of the proposed method (Figure 6e) outperformed the other methods in that not only was the relative contrast well preserved, but the global chrominance was also pleasing and presented a more natural illumination of the real scene. Consistently, in the MEF-SSIM results shown in Table 4 (0.7878 in Figure 5a, 0.8628 in Figure 5b, 0.8467 in Figure 5c, 0.9245 in Figure 5d, and 0.9345 in Figure 5e), our method achieved the highest score.
Figure 7 shows the exposure fusion results using the test image Laurentian Library. For the result of the method in Reference [5] (Figure 7a), the weighting process did not extract sufficient information from the normal-exposed image and the over-exposed image. Therefore, the highlight region such as the sky was not bright enough, and the details of the shadow region such as the grass (the right enlarged image patch) were sacrificed. For the result of the method in Reference [6] (Figure 7b), the overall luminance was brighter than that of the result of the method in Reference [5]; however, the contrast was not stretched and the details of the grass region were still unclear.
For the result of the method in Reference [28] (Figure 7c), the pixels of the input over-exposed image seemed to dominate the final fused image. Therefore, the details of the sky region were lost, and the color gamut was not wide. Moreover, the boundary between the sky and the tower (the center enlarged image patch) was unnatural and not smooth. For the result of the method in Reference [23] (Figure 7d), although the details were enhanced, the output image still lacked detail information from the other differently exposed images. Moreover, while the details were enhanced, the noise was also amplified, which led to artifacts in the form of unnatural color gradients in the sky. For the result of our work (Figure 7e), because the enhanced multiscale fusion with region-selective sharpening was utilized, the details of both the highlights (e.g., sky and tower) and the shadows (e.g., grass) were well preserved. Simply determining pixel weights by analyzing each image separately (by fuzzy logic) was not enough to generate a high-quality HDR image. Refining the fuzzy-based weights with the MNCRF model modified the weights significantly. Furthermore, applying WGIF in the multiscale fusion enhanced the details in the bright/dark regions while avoiding over-amplifying the noise. According to the comparison results shown in Tables 2-4, for this test image, the proposed method outperformed the other four methods in terms of CSMI, entropy, and MEF-SSIM.
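As a point of reference for the entropy measure mentioned above, the following is a minimal sketch of how image entropy is commonly computed. It assumes the standard Shannon entropy of the 8-bit luminance histogram; the exact definition used for the reported tables is not restated in this section, and the function name `image_entropy` is illustrative only.

```python
import numpy as np

def image_entropy(gray, bins=256):
    """Shannon entropy (in bits) of an 8-bit grayscale image's histogram.

    Assumption: intensities lie in [0, 255]; this is the textbook
    definition, not necessarily the exact variant used in the paper.
    """
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist.astype(np.float64) / hist.sum()  # empirical probabilities
    p = p[p > 0]                              # skip empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```

Under this definition, a constant image has entropy 0 bits and an image that uses all 256 levels equally often reaches the maximum of 8 bits, so a higher value indicates that the fused image spreads its content over more intensity levels.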
For the subjective evaluation, we invited 30 participants (15 male and 15 female) to conduct a visual quality test. The participants were asked to rate the visual pleasantness and the contrast/sharpness of each image. The visual pleasantness score indicates the participants' preference. The contrast/sharpness score indicates whether the output fused image preserved clear details and edge information without being unnaturally sharp. The scores ranged from 1 to 7, where a score of 1 indicated "unsatisfactory" and a score of 7 indicated "excellent." Applying the MNCRF model to fine-tune the weight maps enabled local color consistency and a wider range of color detail with more contrast because both intra- and inter-image information was considered. Applying WGIF in the multiscale fusion ensured detail preservation while avoiding unpleasant noise. According to the subjective user study results (summarized in Figure 8), the proposed method significantly outperformed the other four methods, especially in terms of visual pleasantness.
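Ratings of this kind are typically summarized per method by a mean score and its spread. The sketch below shows one way to do so, assuming the ratings are arranged as a participants-by-methods array. The function name `summarize_ratings` and the array layout are assumptions for illustration and do not reproduce the study's actual analysis or data.

```python
import numpy as np

def summarize_ratings(ratings, low=1, high=7):
    """Per-method mean and sample standard deviation of opinion scores.

    ratings: array-like of shape (participants, methods), each entry a
    score on the 1-7 scale (hypothetical layout, chosen for this sketch).
    """
    ratings = np.asarray(ratings, dtype=np.float64)
    if ratings.ndim != 2:
        raise ValueError("expected a (participants, methods) array")
    if ratings.min() < low or ratings.max() > high:
        raise ValueError("scores must lie within the rating scale")
    mean = ratings.mean(axis=0)
    std = ratings.std(axis=0, ddof=1)  # ddof=1 gives the sample standard deviation
    return mean, std
```

The same function can be applied separately to the visual pleasantness scores and to the contrast/sharpness scores, giving one mean and one spread per compared method for each criterion.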
To demonstrate the effectiveness of the proposed enhanced multiscale fusion, Figure 9 provides an example for visual comparison. The proposed enhanced multiscale fusion has two merits. First, in exposure fusion, the extracted details need to be enhanced to increase detail clarity. Second, in many computational photography applications, it is usually desirable to freely manipulate the sharpness level of the details in the fused image. As depicted in the enlarged region of the building in Figure 9b,c, the proposed enhanced multiscale fusion effectively improves the sharpness and preserves the structural edges. By integrating WGIF in the weight pyramid and using the controllable boosting coefficient shown in Equation (15), detail manipulation is achieved without visual artifacts. Figure 10 shows the results of the proposed method using the remaining test images.
To enrich the experimental results, we also tested the performance of the proposed method when fusing more than three images, as shown in Figure 11. For the case of fusing four images (Figure 11c), four initial weight maps were generated by the fuzzy weighting process. Then, both the naïve weight matrix (the matrix B) and the MNCRF weight matrix (the matrix W) became N × 4 matrices, and the maximum-a-posteriori procedure in Reference [36] was still able to find the optimal W. The case of fusing five images (Figure 11d) is similar to the case of fusing four images.
Figure 11. Results of the proposed method using different numbers of differently exposed images. (a) Sequence of differently exposed images (from Reference [23]). (b) Results of using the first three differently exposed images in (a). (c) Results of using the first four differently exposed images in (a).
(d) Result of using all the differently exposed images in (a).
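The way the weight matrices scale with the number of exposures can be illustrated with a short sketch. It assumes N pixels and K exposures, an N × K weight matrix whose rows are normalized to sum to one, and a plain single-scale weighted average of grayscale images. The MNCRF refinement and the WGIF-based multiscale pyramid are deliberately omitted, and the names `normalize_weights` and `fuse_exposures` are hypothetical.

```python
import numpy as np

def normalize_weights(B, eps=1e-12):
    """Row-normalize an N x K weight matrix so each pixel's weights sum to 1.

    Assumption: per-pixel normalization, as is common in exposure fusion;
    eps guards against pixels whose weights are all zero.
    """
    B = np.asarray(B, dtype=np.float64)
    return B / (B.sum(axis=1, keepdims=True) + eps)

def fuse_exposures(images, W):
    """Single-scale blend of K same-sized grayscale images with N x K weights."""
    images = [np.asarray(im, dtype=np.float64) for im in images]
    shape = images[0].shape
    # Column k holds the flattened pixels of exposure k, giving an N x K stack.
    stack = np.stack([im.ravel() for im in images], axis=1)
    W = np.asarray(W, dtype=np.float64)
    if stack.shape != W.shape:
        raise ValueError("W must have one row per pixel and one column per image")
    return (stack * W).sum(axis=1).reshape(shape)
```

Nothing in this sketch depends on K being three: adding a fourth or fifth exposure only adds a column to the matrices, which mirrors the observation that the four-image and five-image cases are handled in the same way.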

Conclusions
In this paper, we presented a novel exposure fusion method which integrates fuzzy logic and the MNCRF model to achieve an adaptive coarse-to-fine weight determination process. Determining optimal pixel weights from individual bracketed images is a primary challenge for exposure fusion. The highlights in an over-exposed image tend to be blown out and almost white; conversely, the shadows in an under-exposed image tend to be flat and almost black. In both cases, the detail and color information is lost. However, simply determining pixel weights by analyzing each image separately is not enough to generate a high-quality HDR image. To address this difficulty, in addition to the coarse initial weighting conducted by applying fuzzy logic, this work incorporated the MNCRF model into the fine-tuning stage to take inter-image information into account. Moreover, a multiscale enhanced fusion scheme was proposed to blend images while preserving and even enhancing edges. Exposure fusion methods are essential to applications involving human-computer interaction and intelligent vision sensing because the human visual system has a much wider dynamic range than a common optical sensor. The experimental results validated the superiority of the proposed method over the state-of-the-art methods in terms of objective quality measures (CSMI, entropy, MEF-SSIM, IL-NIQE, and NIQMC) and subjective user evaluation. For future work, we plan to investigate the possibility of fusing large-exposure-ratio images using the proposed method, especially when only two images are to be fused. Fusing large-exposure-ratio images is an interesting problem mentioned in Reference [43] because, in this case, the highlight regions in the under-exposed image might be darker than the shadow regions in the over-exposed image.