To evaluate the performance of the proposed method, it was compared with four recent methods: Reference [5] (2013), Reference [6] (2017), Reference [28] (2015), and Reference [23] (2015). Eight test image sequences were selected from public databases [37,38], and each of them contained three exposure levels, as shown in Figure 4. Quality measures are objective tools that allow us to quantitatively compare the performance of different methods. In this paper, we selected the five image quality measures described below.
4.1. Comparison of the Objective Quality Measures
The first quality measure is the Contrast and Sharpness Measurement Index (CSMI) introduced in [39]. The human visual system (HVS) captures a wider dynamic range than a camera, which allows people to perceive details in every part of a real-world scene. In exposure fusion methods, by contrast, details in the highlight and shadow regions are typically difficult to preserve because a single shot covers only a limited dynamic range. In CSMI, the degree of contrast is evaluated by considering the difference between foreground and background using the logarithmic image processing (LIP) operator, and the degree of sharpness is evaluated by considering the boundaries between different zones using wavelet decomposition. The CSMI value is therefore closely correlated with HVS properties and reflects human perception.
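The exact formulation is given in [39]; purely to illustrate its two ingredients, a toy proxy for the contrast and sharpness terms might look as follows (this is not the published CSMI: the Haar wavelet, the local-mean background estimate, and all function names are our assumptions):

```python
import numpy as np
import pywt                                  # PyWavelets
from scipy.ndimage import uniform_filter

M = 256.0  # upper bound of the LIP gray-tone range for 8-bit images

def lip_subtract(f, g):
    """Logarithmic image processing (LIP) difference f (-) g on [0, M)."""
    return M * (f - g) / (M - g)

def toy_contrast(gray, k=15):
    """Foreground/background contrast: LIP difference against a local mean."""
    gray = gray.astype(float)
    background = uniform_filter(gray, size=k)    # crude "background" estimate
    return float(np.mean(np.abs(lip_subtract(gray, background))))

def toy_sharpness(gray):
    """Sharpness: energy of the first-level wavelet detail subbands."""
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(float), 'haar')
    return float(np.mean(cH**2 + cV**2 + cD**2))
```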
Table 2 lists the resulting CSMI values of the five methods. As shown in the bottom row of Table 2, the average CSMI values achieved by the five methods were, respectively, 5.3916 (method in Reference [5]), 8.3436 (method in Reference [6]), 8.2355 (method in Reference [28]), 8.5081 (method in Reference [23]), and 8.6860 (proposed method). Although the proposed method did not obtain the highest CSMI value in every test image sequence (e.g., the test images Mountains and Arno River), the comparison of the average CSMI values validated that the proposed method can effectively maintain detail sharpness and strong contrast.
The second quality measure is the image entropy, which can be expressed as:

$$E = -\frac{1}{3} \sum_{c \in \{R,G,B\}} \sum_{i=0}^{L-1} p_c(i) \log_2 p_c(i),$$

where $L$ is the number of intensity levels of each color channel, $p_c(i)$ is the probability of a pixel with intensity $i$, and $c$ indicates one of the RGB channels. Entropy is a no-reference image quality assessment scheme, and its value indicates the richness of the information content in a fused image. Therefore, in some works such as [28] and [29], entropy is adopted to represent the level of detail-preserving ability. In the highlight region of an over-exposed image and the shadow region of an under-exposed image, the detailed information is almost lost, which leads to a low entropy value. A successful exposure fusion method should therefore be able to extract the fine details from several differently exposed images and to present sufficient, high-quality details in all regions of the output image.
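The entropy above can be computed directly from the channel histograms; below is a minimal sketch for an 8-bit RGB image (the function and variable names are ours):

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy of an 8-bit RGB image, averaged over the R, G, B channels.

    img: uint8 array of shape (H, W, 3).
    """
    entropies = []
    for c in range(3):
        hist = np.bincount(img[..., c].ravel(), minlength=levels)
        p = hist / hist.sum()        # probability of each intensity level
        p = p[p > 0]                 # empty bins contribute 0 (0 * log 0 := 0)
        entropies.append(-np.sum(p * np.log2(p)))
    return float(np.mean(entropies))
```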
Table 3 lists the resulting entropy values of the five methods. As shown in the bottom row of Table 3, the average entropy values achieved by the five methods were, respectively, 7.4047 [5], 7.5391 [6], 7.4229 [28], 7.4140 [23], and 7.6088 (proposed method). Although the proposed method did not obtain the highest entropy value in every test image sequence (e.g., the test images Masked Lady, Grand Canal, Mountains, Arno River, and Studio), the comparison of the average image entropy values demonstrated that, on average, our approach best preserves the details of a natural scene.
The third quality measure, the multi-exposure fusion structural similarity (MEF-SSIM) index [40], is specifically designed for exposure fusion methods. Unlike the original SSIM index, which requires a single reference image, the MEF-SSIM index evaluates the ability to preserve information from the multiple input images at each pixel position. Moreover, the contrast and structure components of local image patches are analyzed and taken into account when formulating the MEF-SSIM index.
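In brief, MEF-SSIM [40] decomposes each mean-removed local patch $\tilde{\mathbf{x}}_k = \mathbf{x}_k - \mu_{\mathbf{x}_k}$ of the $k$-th input image into three components:

$$\mathbf{x}_k = \underbrace{\lVert \tilde{\mathbf{x}}_k \rVert}_{\text{contrast } c_k} \cdot \underbrace{\frac{\tilde{\mathbf{x}}_k}{\lVert \tilde{\mathbf{x}}_k \rVert}}_{\text{structure } \mathbf{s}_k} + \underbrace{\mu_{\mathbf{x}_k}}_{\text{mean intensity}}.$$

The desired contrast of the fused patch is taken as the highest contrast among the input patches, and the desired structure as a weighted combination of the input structures; the fused image is then scored against this implicitly constructed reference.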
Table 4 presents the MEF-SSIM values of the five methods. Promisingly, the results in Table 4 demonstrate the superior ability of the proposed method to maintain perception-based structural similarity. Among the eight test images, the MEF-SSIM scores of our approach were all higher than 0.9 except for the image Studio (and even for this image, our score was still the highest among the five methods). In addition, the proposed method outperformed the other comparative methods in every test image sequence. The average MEF-SSIM values achieved by the five methods were, respectively, 0.8344 [5], 0.8914 [6], 0.8500 [28], 0.8290 [23], and 0.9415 (proposed method).
In addition, Table 5 and Table 6 show the comparison results of two other objective metrics: a feature-enriched blind image quality evaluator called IL-NIQE [41] and a no-reference quality metric called NIQMC [42]. IL-NIQE is an opinion-unaware blind image quality assessment model that integrates several image statistics such as texture, color, and contrast. The IL-NIQE value reflects the naturalness of the fused image, and a lower IL-NIQE value indicates a more natural look. NIQMC is a no-reference, blind image quality assessment of contrast distortion based on calculating the entropy of the regions carrying maximum information. The NIQMC value reflects the contrast quality of the fused image, and a higher NIQMC value indicates a more pleasing visual quality with better clarity. The average IL-NIQE values achieved by the five methods were, respectively, 19.3959 [5], 18.7395 [6], 18.4621 [28], 19.6196 [23], and 17.8119 (proposed method). The average NIQMC values achieved by the five methods were, respectively, 4.9102 [5], 5.2867 [6], 5.0640 [28], 5.3400 [23], and 5.4606 (proposed method). As shown in Table 5 and Table 6, owing to the combination of the MNCRF model, fuzzy weighting, and WGIF-based enhancement, this work achieved the best average scores on both the IL-NIQE and NIQMC metrics.
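As a rough illustration of the idea behind NIQMC (this is not the actual metric of [42]; the window radius and the top-fraction threshold are arbitrary choices of ours), one can average local entropy over only the most informative regions:

```python
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk

def toy_max_info_score(gray_u8, top_frac=0.2):
    """Average local entropy over the most informative regions of a uint8 image."""
    ent_map = entropy(gray_u8, disk(5))            # local Shannon entropy map
    thresh = np.quantile(ent_map, 1.0 - top_frac)  # keep the top 20% of regions
    return float(ent_map[ent_map >= thresh].mean())
```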
Furthermore, regarding computational performance, the average processing times required to produce an image with a size of 870 × 578 were 7.1421 s [5], 1.9803 s [6], 5.8957 s [28], 1.0311 s [23], and 6.3402 s (proposed method). All methods were written in MATLAB and run on the Windows 7 operating system with a 3.2 GHz CPU. The method in Reference [23] required the least processing time because it is a single-image enhancement method (we applied it to the normal-exposure image). For the proposed method, although combining the MNCRF model and the fuzzy-based weight initialization increased the computational cost, this work demonstrated superior image quality in the output fused images.
4.2. Visual Comparison and User Study Analysis
In addition to the objective quality measures, Figure 5, Figure 6 and Figure 7 provide qualitative visual comparisons among the five methods. Putting the output fused images from the different methods side by side reveals the subtle but essential differences between our proposed strategy and the other exposure fusion methods.
Figure 5 shows the exposure fusion results using the test image Cottage. In the result of the method in Reference [5] (Figure 5a), the overall chrominance was somewhat faded and lacked contrast. Moreover, detailed textures, e.g., in the grass area, were lost. This is consistent with the results shown in Table 2, where the CSMI value of this fused image (7.1133 in Figure 5a) was much lower than those of the other four images (9.3121 in Figure 5b, 9.4548 in Figure 5c, 9.2748 in Figure 5d, and 9.4681 in Figure 5e). In the result of the method in Reference [6] (Figure 5b), although the dynamic contrast was stretched, the color vividness was lost during the fusion process. In the result of the method in Reference [28] (Figure 5c), the top-left corner of the fused image was clearly over-exposed, without preserved details. This was because the pixel weights of each input image were determined by analyzing each image individually, without considering the inter-image relationships. In this example, comparing the sky regions produced by the different methods underlines the benefit of our strategy of integrating the MNCRF model with fuzzy logic. In the sky region of the proposed method (Figure 5e), high-, middle-, and low-luminance pixels all appeared with very smooth gradients, and the WGIF-based enhanced fusion preserved the details. Therefore, a visually pleasing HDR-like image was generated.
Figure 6 shows the exposure fusion results using the test image Masked Lady. In the result of the method in Reference [5] (Figure 6a), the overall brightness was insufficient. For example, the reflected light on the stone floor (the left enlarged image patch) was not as clear as in the results shown in Figure 6c,d, and the texture of the wall (the center enlarged image patch) was vague. Similar phenomena occurred in the result of the method in Reference [6] (Figure 6b). In both Figure 6a,b, the dynamic ranges of the fused images were not well stretched and remained dim, so the details in the shadow regions of the scene were hardly preserved. In the result of the method in Reference [28] (Figure 6c), the entire dynamic range was broadened by fusing the input images; for example, each window along the first-floor corridor can be seen. However, the overall chrominance was somewhat greenish, as shown in the clothes of the lady and the first-floor corridor. Moreover, the color of the lamp post (the right enlarged image patch) was unnatural. This reflects the difficulty of determining pixel weights that produce accurate colors and a natural-looking image at the same time. In the result of the method in Reference [23] (Figure 6d), some white noise-like dots can be seen on the floor. The result of the proposed method (Figure 6e) outperformed the other methods in that not only was the relative contrast well preserved, but the global chrominance was also pleasing and presented a more natural illumination of the real scene. Consistently, the MEF-SSIM results shown in Table 4 (0.7878 in Figure 6a, 0.8628 in Figure 6b, 0.8467 in Figure 6c, 0.9245 in Figure 6d, and 0.9345 in Figure 6e) confirm that our method clearly outperformed the others.
Figure 7 shows the exposure fusion results using the test image Laurentian Library. In the result of the method in Reference [5] (Figure 7a), the weighting process did not extract sufficient information from the normal-exposure and over-exposed images. Therefore, the highlight region, such as the sky, was not bright enough, and the details of the shadow region, such as the grass (the right enlarged image patch), were sacrificed. In the result of the method in Reference [6] (Figure 7b), the overall luminance was brighter than in the result of the method in Reference [5]; however, the contrast was not stretched, and the details of the grass region were still unclear. In the result of the method in Reference [28] (Figure 7c), the pixels of the input over-exposed image seemed to dominate the final fused image. Therefore, the details of the sky region were lost, and the color gamut was not wide. Moreover, the boundary between the sky and the tower (the center enlarged image patch) was unnatural and not smooth. In the result of the method in Reference [23] (Figure 7d), although the details were enhanced, the output image still lacked detail information from the other differently exposed images. Moreover, while the details were enhanced, the noise was also amplified, which led to artifacts of unnatural color gradients in the sky. In the result of our work (Figure 7e), because the enhanced multiscale fusion with region-selective sharpening was utilized, the details of both the highlights (e.g., sky and tower) and the shadows (e.g., grass) were well preserved. Simply determining pixel weights by analyzing each image separately (by the fuzzy logic alone) was not enough to generate a high-quality HDR image; combining the MNCRF model and fuzzy logic refined the weights significantly. Furthermore, applying WGIF in the multiscale fusion enhanced the details in the bright/dark regions while avoiding over-amplification of noise. From the comparison results shown in Table 2, Table 3 and Table 4, for this test image, the proposed method outperformed the other four methods in all three metrics: CSMI, entropy, and MEF-SSIM.
For the subjective evaluation, we invited 30 participants (15 male and 15 female) to conduct a visual quality test. The participants were asked to rate the visual pleasantness and the contrast/sharpness of each image. The visual pleasantness score indicates the participants' preference. The contrast/sharpness score indicates whether the output fused image preserved clear details and edge information without appearing unnaturally sharp. The scores ranged from 1 to 7, where 1 indicated "unsatisfactory" and 7 indicated "excellent." Applying the MNCRF model to fine-tune the weight maps enabled local color consistency and a wider range of color detail with more contrast, because both intra- and inter-image information could be considered. Applying WGIF in the multiscale fusion ensured detail preservation while avoiding unpleasant noise. As the subjective user study results (summarized in Figure 8) show, the proposed method significantly outperformed the other four methods, especially in terms of visual pleasantness.
To demonstrate the effectiveness of the proposed enhanced multiscale fusion, Figure 9 illustrates an example for visual comparison. The proposed enhanced multiscale fusion has two merits. First, in exposure fusion, the extracted details need to be enhanced to increase detail clarity. Second, in many computational photography applications, it is desirable to freely manipulate the sharpness level of the details in the fused image. As depicted in the enlarged region of the building in Figure 9b,c, the proposed enhanced multiscale fusion effectively improves the sharpness and preserves the structural edges. By integrating WGIF in the weight pyramid and using the controllable boosting coefficient in Equation (15), detail manipulation is achieved without visual artifacts, as illustrated by the sketch below.
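As a rough sketch of this detail-manipulation idea, the following uses a plain (unweighted) guided filter as a stand-in for WGIF on a grayscale image in [0, 1], with `beta` playing the role of the controllable boosting coefficient of Equation (15); both simplifications are ours:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r=8, eps=1e-3):
    """Plain guided filter (He et al.): edge-preserving smoothing of p guided by I."""
    box = lambda x: uniform_filter(x, size=2 * r + 1)
    mean_I, mean_p = box(I), box(p)
    cov_Ip = box(I * p) - mean_I * mean_p
    var_I = box(I * I) - mean_I ** 2
    a = cov_Ip / (var_I + eps)          # per-pixel linear coefficients
    b = mean_p - a * mean_I
    return box(a) * I + box(b)

def boost_details(img, beta=1.5):
    """Split img into base + detail layers and re-amplify the detail layer."""
    base = guided_filter(img, img)      # self-guided: edge-preserving base layer
    detail = img - base                 # detail layer; guidance suppresses halos
    return np.clip(base + beta * detail, 0.0, 1.0)
```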
Figure 10 shows the results of the proposed method using the remaining test images. To enrich the experimental results, we also tested the performance of the proposed method when fusing more than three images, as shown in Figure 11. For the case of fusing four images (Figure 11c), four initial weight maps were generated by the fuzzy weighting process. Both the naïve weight matrix and the MNCRF weight matrix then grew accordingly (one weight map per input image), and the maximum-a-posteriori procedure in Reference [36] was still able to find the optimal MNCRF weights. The case of fusing five images (Figure 11d) is analogous.
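For completeness, here is a minimal sketch of how the weighting stage generalizes to N inputs; it omits the MNCRF refinement and the multiscale pyramid blending of the actual pipeline and simply normalizes the fuzzy weight maps per pixel before a naive weighted fusion:

```python
import numpy as np

def fuse_n_images(images, weight_maps):
    """Naive weighted fusion of N differently exposed images.

    images: list of N float arrays of shape (H, W, 3) in [0, 1].
    weight_maps: list of N arrays of shape (H, W), e.g., from the fuzzy stage.
    """
    W = np.stack(weight_maps).astype(float)        # (N, H, W)
    W /= W.sum(axis=0, keepdims=True) + 1e-12      # weights sum to 1 per pixel
    fused = sum(w[..., None] * img for w, img in zip(W, images))
    return np.clip(fused, 0.0, 1.0)
```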