Halo-Free Multi-Exposure Image Fusion Based on Sparse Representation of Gradient Features

Abstract: Due to sharp changes in local brightness in high dynamic range scenes, fused images obtained by the traditional multi-exposure fusion methods usually have an unnatural appearance resulting from halo artifacts. In this paper, we propose a halo-free multi-exposure fusion method based on sparse representation of gradient features for high dynamic range imaging. First, we analyze the cause of halo artifacts. Since the range of local brightness changes in high dynamic range scenes may be far wider than the dynamic range of an ordinary camera, there are some invalid, large-amplitude gradients in the multi-exposure source images, so halo artifacts are produced in the fused image. Subsequently, by analyzing the significance of the local sparse coefficient in a luminance gradient map, we construct a local gradient sparse descriptor to extract local details of source images. Then, as an activity level measurement in the fusion method, the local gradient sparse descriptor is used to extract image features and remove halo artifacts when the source images have sharp local changes in brightness. Experimental results show that the proposed method obtains state-of-the-art performance in subjective and objective evaluation, particularly in terms of effectively eliminating halo artifacts.


Introduction
There is abundant information about brightness and color in real natural scenes, and the dynamic range is very wide. Therefore, high dynamic range (HDR) images, which can truly represent natural scenes, are widely used in digital photography, aerospace, satellite meteorology, medical treatment, and military industries [1][2][3].
HDR imaging technology can be divided into two categories. The first is the hardware-based method [4,5], which utilizes HDR devices to capture real natural scenes. However, these devices are too expensive to be popular with the public. The second uses software-based technology to obtain HDR images [6]. The notion of this category is that real natural scenes can be captured with a stack of low dynamic range (LDR) images with different exposures, and an HDR image can be obtained by using the camera response function [7,8]. Then, tone mapping technology [9][10][11] is used to reduce the dynamic range of the HDR image for display on common devices. However, the quality of the reconstructed HDR image depends heavily on the related exposure settings (e.g., exposure time and exposure value), which are often unknown to the user. Therefore, software-based HDR imaging technology through multi-exposure fusion [12][13][14] has attracted increasing attention. This technology can directly obtain more information and produce a more perceptually appealing composite LDR image, reflecting natural scenes more closely. Furthermore, the result of a multi-exposure fusion can be directly presented on LDR displays.
In recent years, many scholars have performed in-depth studies on multi-exposure fusion. Mertens et al. [15] proposed a multi-exposure fusion method based on weight maps, which combine three quality measures (i.e., contrast, saturation, and exposure quality at the pixel level). Moreover, the multi-resolution approach, based on Laplacian pyramids, was also applied to pursue better human visual perception. However, this method placed too much emphasis on contrast. Thus, the details of brighter and darker regions were lost in the fused image. Vonikakis et al. [16] evaluated the exposure quality of pixels in the source images via illumination estimation filtering and then realized multi-exposure image fusion. Zhang et al. [17] obtained the multi-exposure fused image with gradient information, which can effectively preserve texture details. Li et al. [18] decomposed a number of multi-exposure source images into a detail layer, containing small-scale details, and a base layer, containing large-scale intensity variations. Then, they used spatial consistency to fuse the base and detail layers. Shen et al. [19] proposed a multi-exposure fusion method, based on an improved pyramid, to preserve detail information. Liu et al. [12] introduced a dense scale invariant feature transform (SIFT) descriptor into image fusion to extract local details from multi-exposure source images. Ma et al. [13] proposed a multi-exposure fusion method, based on an effective structural patch decomposition approach, which took account of the local and global characteristics of the source images and obtained better fused image quality. Prabhakar et al. [20] proposed an unsupervised deep learning framework for multi-exposure fusion (MEF), utilizing a no-reference quality metric as the loss function, which obtained excellent fusion results. On the basis of Ma et al. [13], Ma et al. [14] then proposed a multi-exposure fusion method by optimizing the color multi-exposure fusion structural similarity (MEF-SSIMc) index. However, the above fusion methods did not consider the halo artifacts in the multi-exposure fused image. These halo artifacts always occur in regions of sharp local brightness changes, severely reducing the perceptual appeal.
Additionally, the theory of sparse representation shows that the linear combination of a "few" atoms from an over-complete dictionary can effectively describe the salient features of objects [21][22][23]. Thus, it has been widely used for image processing. Yang et al. [24] introduced sparse representation into image fusion. With their method, source images were divided into small patches and transformed into vectors via lexicographic ordering. They then realized sparse representation using an over-complete dictionary. Liu et al. [25] combined multi-scale transformation and sparse representation to constitute a general image fusion framework, which presented more details in the fused images. However, these works mainly focused on multi-focus and multi-modal fusion, and because they did not consider the characteristics of multi-exposure source images, they are still not appropriate for multi-exposure image fusion.
In this paper, we propose a halo-free multi-exposure fusion method based on sparse representation of gradient features. In our method, the local gradient sparse descriptor, obtained from a gradient map through sparse representation, is used for both local detail extraction and halo artifact removal. The main contributions of this paper can be summarized in the following three points:
(1) By analyzing characteristics of the luminance gradient in multi-exposure source images, the cause of halo artifacts in the fused image is discussed. Because the local brightness changes in HDR scenes are sharp, they can far exceed the dynamic range of a single photograph. Thus, the halo artifacts in the fused image are generated by the invalid large-amplitude gradients in the source images.
(2) By calculating the $l_1$-norm of the sparse coefficients in the luminance gradient map, we obtain the local gradient sparse descriptor as the feature of the images. This descriptor can effectively extract local details of source images.
(3) A new multi-exposure fusion method based on the local gradient sparse descriptor is proposed, which can achieve a halo-free fused image by restraining the invalid large-amplitude gradients. Meanwhile, the fused image of the proposed method is not reconstructed from the over-complete dictionary, which effectively improves the fusion efficiency.
The remainder of this paper is organized as follows: In Section 2, we analyze the cause of halo artifacts in the multi-exposure fused images; Section 3 presents the proposed halo-free fusion method; the experimental results and discussions are described in Section 4; and finally, Section 5 concludes the paper.

Appl. Sci. 2018, 8, x FOR PEER REVIEW

Cause of Halo Artifacts in Fused Images
In HDR natural scenes, there is not only abundant information about brightness and color, but also plenty of sharp local brightness changes. If those sharp brightness changes are not fully considered during multi-exposure fusion, halo artifacts will appear in the fused images. As seen in Figure 1a, which shows the fused result of Li et al. [18], halo artifacts occur in the sky outside the windows. By contrast, the sharp local brightness changes are fully considered in the proposed method. Thus, the fused image is more consistent with human visual perception through the removal of halo artifacts, as shown in Figure 1b.
The cause of these halo artifacts is revealed in Figure 2. In the multi-exposure source images of Figure 2a-c, the exposure of the scenery outside the window increases until over-exposure, while the brightness contrast between indoors and outdoors increases step by step. In the local luminance gradient amplitude of Figure 2k-o, the maximal gradient is found in Figure 2m. However, the region around the maximal gradient in the corresponding source image in Figure 2c is over-exposed and unsuitable for the fused image. In the same position of Figure 2a,b, well-exposed regions can be found, which are more suitable for the fused image than that of Figure 2c. Therefore, the maximal gradient existing in Figure 2m is an invalid gradient and the primary cause of halo artifacts in the fused image. The reason for this phenomenon is that a single photograph has a very limited dynamic range compared to real HDR scenes. Thus, invalid large-amplitude gradients occur in the multi-exposure source images. If these invalid gradients cannot be suppressed, halo artifacts are inevitable in the fused image.
In most multi-exposure image fusion methods, image features other than the gradient, such as contrast and saliency, are used as weight coefficients for image fusion. However, these image features are more or less directly connected with the gradient, which leads to halo artifacts of varying degrees in the fusion results. In the method by Mertens et al. [15], a Laplacian filter was applied to the grayscale version of each multi-exposure image to yield a simple indicator, C, for contrast. This indicator C was then used as one of the weight coefficients for image fusion. Generally, pixels with larger contrast have larger gradients, so the pixels with invalid large gradients inevitably take part in the image fusion process with larger weights, generating halo artifacts. As in Mertens et al.'s method [15], a Laplacian filter was applied to each source image in Li et al.'s method [18] to obtain a high-pass image. Then, a Gaussian filter was applied to the high-pass image to obtain the saliency map of the source image, which was used as the winner-take-all weight coefficient for image fusion. Pixels with larger gradients generally have greater saliency, so the pixels with invalid large gradients are also involved in image fusion with larger weighting factors. In the methods of Ma et al. [13,14], signal contrast was used as one of the benchmarks for image fusion, and the desired contrast of the fused image patch was determined by the highest contrast of all the source image patches, which would lead to the appearance of halo artifacts in the fused images.
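The link between a Laplacian contrast indicator and invalid gradients can be illustrated with a small sketch. The scanline values, the 4-neighbour Laplacian kernel, and the function name `laplacian_contrast` are assumptions chosen for illustration; they are not taken from the implementations of [15] or [18].

```python
import numpy as np

def laplacian_contrast(gray):
    """Absolute response of a 4-neighbour Laplacian filter (edge-replicated borders)."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return np.abs(lap)

# Hypothetical scanline crossing a window frame (wall on the left, sky on the right).
# Over-exposed source: the sky is clipped to 1.0 and carries no detail.
over = np.array([[0.1, 0.1, 0.1, 0.1, 1.0, 1.0, 1.0, 1.0]])
# Well-exposed source: the sky keeps a mild texture.
well = np.array([[0.05, 0.05, 0.05, 0.05, 0.50, 0.55, 0.50, 0.55]])

c_over = laplacian_contrast(over)
c_well = laplacian_contrast(well)
print("over-exposed contrast:", np.round(c_over, 2))
print("well-exposed contrast:", np.round(c_well, 2))
```

In this toy case the contrast indicator at the wall/sky boundary is larger in the over-exposed scanline than in the well-exposed one, while it is zero inside the clipped sky. A contrast-driven weight would therefore favor the over-exposed source exactly at the boundary, which is the situation described above as an invalid large-amplitude gradient.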

Multi-Exposure Fusion via Sparse Representation of Gradient Features
Through further analysis of the local luminance gradient in Figure 2k-o, it can be seen that there is a maximal gradient in Figure 2m, whereas the gradients around this maximal gradient are almost zero and do not change. However, in Figure 2k,l, there are discernible gradient changes in those regions and most of the values are not zero. Considering the salience extraction capability of sparse representation, especially for subtle features, the local gradient sparse descriptor is proposed in our method, which can not only extract the local details, but can also effectively constrain the invalid gradients of large amplitude.
The schematic diagram of the proposed multi-exposure image fusion method, based on a sparse representation of gradient features, is shown in Figure 3. There are three steps in the proposed method: image feature extraction, weight map calculation, and weight term-based fusion. The image features are composed of a local gradient sparse descriptor and exposure quality.


Local Gradient Sparse Descriptor Extraction
The local gradient sparse descriptor is obtained by the sparse representation model. Since a digital signal can be represented with a linear combination of a "few" atoms from an over-complete dictionary [26,27], the inherent feature of the signal can be extracted through the sparse coefficient combination of the corresponding atoms. Mathematically, the digital signal $y$ can be represented by the following model:
$$y = D\alpha \tag{1}$$
where each column of $D$ is an atom of the over-complete dictionary, and $\alpha$ indicates the sparse coefficient vector, which can be obtained by solving the following optimization problem:
$$\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad y = D\alpha \tag{2}$$
where $\|\cdot\|_0$ denotes the $l_0$-norm, i.e., the number of nonzero components. Generally, there is noise in the original digital signal. Thus, the sparse representation model, including noise, can be expressed as
$$\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad \|y - D\alpha\|_2 \le \varepsilon \tag{3}$$
where $\varepsilon > 0$ is the error tolerance.
In the proposed method, the process of sparse representation is realized in two steps, namely dictionary learning and feature extraction.
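The noisy sparse representation model above is commonly approximated with a greedy solver. The sketch below uses orthogonal matching pursuit (OMP) as one such solver; the source does not state which solver the authors used, so OMP, the function name `omp`, and the random dictionary in the usage lines are assumptions for illustration only.

```python
import numpy as np

def omp(D, y, eps=0.1, max_atoms=None):
    """Greedy approximation of  min ||a||_0  subject to  ||y - D a||_2 <= eps.

    D: (n, K) dictionary whose columns (atoms) have unit norm.
    y: (n,) signal. Returns the (K,) sparse coefficient vector.
    """
    y = np.asarray(y, dtype=float)
    n, K = D.shape
    max_atoms = n if max_atoms is None else max_atoms
    a = np.zeros(K)
    support = []
    residual = y.copy()
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # no further progress is possible
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        a[support] = coef
    return a

# Usage with a hypothetical random dictionary (not a learned one).
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 40))
D /= np.linalg.norm(D, axis=0)
y = 1.5 * D[:, 3] - 0.8 * D[:, 17]
a = omp(D, y, eps=1e-6)
print("nonzero coefficients:", np.count_nonzero(a))
```

OMP stops as soon as the residual falls below the tolerance, so the number of selected atoms adapts to the signal instead of being fixed in advance.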

Dictionary Learning in the Gradient Domain
During the process of sparse representation, it is important to obtain a suitable over-complete dictionary [28]. Therefore, such a dictionary in the gradient domain should be obtained.
A dictionary learned from a large number of example image patches with a training algorithm, such as K-means or K-SVD [29], has better representative ability. Firstly, the luminance gradient map of the training image, $I$, is calculated:
$$I_{L\nabla} = \nabla(L(I)) \tag{4}$$
where $L$ is the operation for obtaining image luminance, and $\nabla$ is the operation for obtaining the image gradient.
Then, the K-SVD algorithm is used to learn the gradient-domain dictionary from clear indoor and outdoor images by solving the corresponding minimization problem. The dictionary learning framework in the gradient domain is shown in Figure 4. First, luminance gradient maps are obtained from a large number of clear indoor and outdoor images. To construct the training set for dictionary learning from those luminance gradient maps, numerous image patches of size n × n are randomly sampled in the same manner as Reference [30], except that the patch size is odd, n = 2r + 1. Then, all of the image patches are transformed into vectors via column-by-column extraction. Finally, the over-complete dictionary of the gradient domain is obtained by the K-SVD algorithm.
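The construction of the training set can be sketched as follows. The Rec. 601 luma weights, the central-difference gradient magnitude, and the function names are assumptions; the source only states that luminance and gradient operators are applied and that odd-sized patches are sampled randomly and vectorized column by column. The K-SVD step itself is not shown.

```python
import numpy as np

def luminance_gradient_map(rgb):
    """Gradient magnitude of the luminance of an RGB image with values in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float)
    lum = rgb @ np.array([0.299, 0.587, 0.114])  # assumed Rec. 601 luma weights
    gy, gx = np.gradient(lum)                    # central differences along rows, columns
    return np.hypot(gx, gy)

def sample_patches(grad_map, r, count, rng):
    """Sample `count` random patches of odd size n = 2r + 1 and return them
    as the columns of an (n*n, count) matrix, each flattened column by column."""
    n = 2 * r + 1
    h, w = grad_map.shape
    ys = rng.integers(0, h - n + 1, size=count)
    xs = rng.integers(0, w - n + 1, size=count)
    cols = [grad_map[y:y + n, x:x + n].flatten(order="F")
            for y, x in zip(ys, xs)]
    return np.stack(cols, axis=1)

# Usage on a hypothetical random image standing in for a clear training image.
rng = np.random.default_rng(0)
g = luminance_gradient_map(rng.random((32, 40, 3)))
V = sample_patches(g, r=4, count=50, rng=rng)    # 9 x 9 patches
print(V.shape)
```

The resulting matrix has one vectorized patch per column, which is the layout expected by typical dictionary learning routines.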

Local Gradient Sparse Descriptor Extraction
During the process of local gradient sparse descriptor extraction, luminance gradient maps of the multi-exposure source images are obtained by Equation (4), as shown in Figure 5a,b. Then, the luminance gradient map, $S_{L\nabla}$, is divided into overlapping image patches, as in Yang's method [24]. However, the patch size is odd, n = 2r + 1, and equal to the patch size used in the dictionary learning. The next step is to shape each image patch into a vector through column-by-column extraction, as shown in Figure 5c,d. It is noteworthy that the small red piece in this figure represents the center of the image patch. Finally, each vector is represented in the gradient domain using the over-complete dictionary obtained in the previous section, as shown in Figure 5d, wherein the black pieces represent the nonzero sparse coefficients. Thus, the image patches can be expressed as:
$$\min_{s^j} \|s^j\|_0 \quad \text{subject to} \quad \|v^j - D s^j\|_2 \le \varepsilon, \quad j = 1, \ldots, J \tag{6}$$
where $v^j$ is the $j$th column of $V_{L\nabla} \in R^{n^2 \times J}$, the matrix that is lexicographically ordered from the luminance gradient maps, $s^j$ is the sparse coefficient vector of each image patch, $J$ is the number of image patches, and $\varepsilon > 0$ is the error tolerance. If $S = [s^1, s^2, \ldots, s^J]$, which is a sparse matrix, Equation (6) can be expressed as:
$$V_{L\nabla} \approx DS \tag{7}$$
Then, the gradient sparse descriptor of the central pixel in the image patch, $A_i(x, y)$, can be calculated as
$$A_i(x, y) = \|s_i^j\|_1 \tag{8}$$
where $\|\cdot\|_1$ denotes the $l_1$-norm, and $s_i^j$ is the $j$th sparse coefficient vector of the $i$th source image, i.e., that of the patch centered at $(x, y)$.
Generally, the larger the gradient value of the corresponding patch in multi-exposure source images, the better the exposure result of these regions, and the gradient sparse descriptor $A_i(x, y)$ is also larger. However, in HDR scenes, an invalid large-amplitude gradient may be found in a particular exposure source image, which does not represent a good exposure result. Fortunately, the gradients in the adjacent region of these invalid gradients are almost zero. So, the local gradient sparse descriptor $A_i(x, y)$ of these regions is much smaller than that of the corresponding well-exposed image regions in other source images. Therefore, $A_i(x, y)$ can effectively suppress the invalid large-amplitude gradients and remove halo artifacts from the fused image.
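The descriptor extraction described above can be sketched per pixel: take the odd-sized patch centered on the pixel, vectorize it column by column, sparse-code it, and keep the $l_1$-norm of the coefficients. The OMP solver, the reflective border padding, the random stand-in dictionary, and all function names are assumptions; a dictionary learned with K-SVD would replace `D` in practice.

```python
import numpy as np

def omp(D, y, eps):
    """Greedy sparse coding: min ||s||_0 subject to ||y - D s||_2 <= eps (assumed solver)."""
    s = np.zeros(D.shape[1])
    support, residual = [], y.copy()
    while np.linalg.norm(residual) > eps and len(support) < D.shape[0]:
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        s[support] = coef
    return s

def gradient_sparse_descriptor(grad_map, D, r, eps=0.1):
    """A(x, y) = l1-norm of the sparse code of the (2r+1) x (2r+1) patch centered at (x, y)."""
    n = 2 * r + 1
    grad_map = np.asarray(grad_map, dtype=float)
    padded = np.pad(grad_map, r, mode="reflect")   # border handling is an assumption
    h, w = grad_map.shape
    A = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            v = padded[y:y + n, x:x + n].flatten(order="F")  # column-by-column
            A[y, x] = np.abs(omp(D, v, eps)).sum()
    return A

# Usage with a hypothetical random dictionary and a tiny gradient map.
rng = np.random.default_rng(1)
D = rng.standard_normal((9, 20))
D /= np.linalg.norm(D, axis=0)
grad = np.zeros((6, 6))
grad[3, 3] = 1.0
A = gradient_sparse_descriptor(grad, D, r=1)
print(np.round(A, 2))
```

Patches whose norm is already below the tolerance receive an all-zero code and therefore a zero descriptor. The double loop is written for clarity rather than speed.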

Exposure Quality Extraction
In the multi-exposure source images, there are more local details in the well-exposed regions. In the over-exposed or under-exposed regions, the luminance values are too extreme to present valid information. Therefore, the pixel luminance value is used to measure the local exposure quality. First, the luminance value of the images is normalized. Then, considering that there is more noise in regions under low illumination conditions, the exposure quality map, $B_i(x, y)$, is obtained by thresholding the luminance, where $\eta$ ($0 \le \eta \le 0.45$) and $\alpha$ ($0 \le \alpha \le 0.5 - \eta$) are the thresholds, $I_i(x, y)$ is the luminance map of the source image, and $B_i(x, y) = 1$ denotes that the related pixel is in the well-exposed region. Thus, the influence of over-exposed or under-exposed regions on the fused image can be effectively removed.
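A binary exposure quality map of this kind can be sketched as below. The exact threshold form is an assumption: the sketch treats a pixel as well exposed when its normalized luminance lies strictly between $\eta + \alpha$ and $1 - \eta$, which is consistent with the stated ranges of the two thresholds and with the stricter treatment of dark, noisy regions, but it is not confirmed by the source. The function name is hypothetical.

```python
import numpy as np

def exposure_quality_map(lum, eta=0.15, alpha=0.05):
    """Binary map B: 1 for well-exposed pixels, 0 otherwise.

    lum: luminance normalized to [0, 1].
    ASSUMED rule: eta + alpha < lum < 1 - eta.
    """
    lum = np.asarray(lum, dtype=float)
    low, high = eta + alpha, 1.0 - eta
    return ((lum > low) & (lum < high)).astype(float)

# Usage on a hypothetical luminance ramp.
lum = np.array([[0.0, 0.1, 0.25, 0.5, 0.8, 0.9, 1.0]])
B = exposure_quality_map(lum)
print(B)
```

With the parameter values reported in the experiments (η = 0.15, α = 0.05), this assumed rule keeps luminance values between 0.2 and 0.85.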

Initial Weight Map Estimation
After obtaining the local gradient sparse descriptor and exposure quality feature, the initial weight maps can be estimated by multiplying these two terms pixel by pixel:
$$T_i(x, y) = A_i(x, y) \times B_i(x, y)$$
Then, the weight maps $T_i(x, y)$ are normalized to obtain $\hat{T}_i(x, y)$, where $\tau$ denotes a small positive number (e.g., $10^{-25}$), which is used to avoid singularities in the weight value $\hat{T}_i(x, y)$.
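The two steps above can be sketched in a few lines. Placing τ in the denominator of the normalization is an assumption, since the normalization formula itself is not recoverable from the text; the function name and the array layout (one map per source image along the first axis) are also hypothetical.

```python
import numpy as np

def normalized_initial_weights(A, B, tau=1e-25):
    """A, B: arrays of shape (N, H, W) holding the descriptor and exposure
    quality maps of N source images. Returns weights normalized over N."""
    T = np.asarray(A, dtype=float) * np.asarray(B, dtype=float)
    # ASSUMED placement of tau: it keeps the division finite where every T is zero.
    return T / (T.sum(axis=0, keepdims=True) + tau)

# Usage: two source images, one row of three pixels each (hypothetical values).
A = np.array([[[2.0, 1.0, 5.0]], [[2.0, 3.0, 4.0]]])
B = np.array([[[1.0, 1.0, 0.0]], [[1.0, 0.0, 0.0]]])
T_hat = normalized_initial_weights(A, B)
print(T_hat)
```

At the third pixel neither source is well exposed, so all weights are zero there and τ prevents a division by zero.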

Final Weight Map Estimation and Fusion
Since noise in the initial weight maps has a great influence on the fused image, the maps must be refined further. Considering availability and real-time capability, the recursive filter [31], which can perform high-quality edge-preserving filtering of images in real time, is used to refine the initial weight maps:
$$\hat{W}_i(x, y) = RF(W_i(x, y), I_i(x, y))$$
where $RF(\cdot)$ represents the recursive filter operation, $W_i$ denotes the initial weight map, and $I_i$ are the multi-exposure source images, which can intensify the edge information during filtering.
After the final weight map has been normalized, the fused image, $I_F$, can be obtained by:
$$I_F(x, y) = \sum_{i=1}^{N} \bar{W}_i(x, y) I_i(x, y)$$
and
$$\bar{W}_i(x, y) = \frac{\hat{W}_i(x, y)}{\sum_{k=1}^{N} \hat{W}_k(x, y)}$$
where $N$ is the number of source images.
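The final normalization and weighted fusion can be sketched as follows. The recursive filter refinement of [31] is not implemented here; the sketch assumes the refined weight maps are already available. The function name `fuse`, the array layout, and the small constant guarding the division are assumptions.

```python
import numpy as np

def fuse(images, weights, tiny=1e-12):
    """Per-pixel weighted average of N source images.

    images:  (N, H, W, 3) source images.
    weights: (N, H, W) refined weight maps (refinement step assumed done).
    """
    images = np.asarray(images, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalize so that the weights of the N sources sum to one at each pixel.
    w = weights / (weights.sum(axis=0, keepdims=True) + tiny)
    return (w[..., None] * images).sum(axis=0)

# Usage: two hypothetical constant images with weights 1 and 3.
images = np.stack([np.full((2, 2, 3), 0.2), np.full((2, 2, 3), 0.8)])
weights = np.stack([np.ones((2, 2)), 3.0 * np.ones((2, 2))])
fused = fuse(images, weights)
print(fused[0, 0])
```

Because the fused image is a direct weighted sum of the source pixels, no reconstruction from the dictionary is needed at this stage, in line with the efficiency argument made in the introduction.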

Experiment and Analyses
To evaluate the performance of the proposed multi-exposure fusion method, 12 multi-exposure source image sequences, shot in different scenes with rich details and high brightness contrast, were used. Figure 6 shows two images of different exposures in each image sequence, as used in Ma et al.'s method [14], Mertens et al.'s method [15], and other databases. Additionally, more details of each multi-exposure source image sequence are described in Figure 6, including name, spatial resolution and number of source images. The experiment was run on a PC platform with an Intel® Core™ i5 processor and 4 GB of memory.
In all of the following experiments, unless otherwise indicated, the parameters of the proposed method were as follows: the patch size, n × n, was fixed to 9 × 9; the global error was ε = 0.1; the threshold of exposure quality, η, was set to 0.15; another threshold of exposure quality, α, was set to 0.05; and the parameter of the recursive filter was set to the optimal value, as in Liu's method [12]. Five representative multi-exposure fusion methods (i.e., Liu15 method [12], Ma18 method [14], Mertens10 method [15], Vonikakis11 method [16], Li13 method [18]) were employed to make comparisons via subjective and objective evaluation methods. The code of these fusion methods is available on the corresponding websites [32][33][34][35][36], and default parameters were used in the comparisons.

Subjective Evaluation
The subjective evaluation of the multi-exposure fusion methods is shown in Figures 7-10. In Figure 7a, the results of Mertens10 show nice global contrast and color information, but the details in the brighter regions are lost in the clouds around the sun. Halo artifacts are produced at the boundary between the balloon and the blue sky. Vonikakis11 preserves the proper details in the brighter regions, but the details of the darker regions are lost at the lawn below the balloon in Figure 7b, and halo artifacts are present. The best global contrast is obtained with Li13 and Ma18, but the halo artifacts in those methods are more prominent. There are almost no halo artifacts in Liu15, but a shadow can be observed around the sun. The fused image of the proposed method presents better visual perception and preserves the proper global contrast and color information (e.g., clouds around the sun and the lawn below the balloon), while simultaneously removing the halo artifacts.
Figure 8 gives the results of an indoor scene. The results of Mertens10 show nice global contrast, but the details, such as colors in the brighter regions, are lost (e.g., the details of the loudspeaker box on the left side). There are halo artifacts on the right side of the flowerpot. Additionally, the color is distorted, as shown in Figure 8e. Vonikakis11 also preserves the rich details in the brighter regions, but loses details in the darker regions (e.g., details of the outdoor scene in Figure 8b). Halo artifacts are also observed on the right side of the flowerpot. Halo artifacts in Li13, Liu15 and Ma18 are evident too. The proposed method has better global contrast and color information (e.g., vivid color in the loudspeaker box in Figure 8f), and overcomes the halo artifacts present in the images produced via other methods.
The dynamic range of the scene in Figure 9 is not as wide as those in Figures 7 and 8. Nevertheless, in the images produced by Vonikakis11, Li13 and Ma18, halo artifacts are observed near the doorframes in the middle of the fused images. The Mertens10 image also shows good global contrast, but details such as the background in the brighter regions are lost (e.g., the details around the tree in the middle of the fused image). Liu15 preserves the details in the brighter regions, but its global contrast is very limited. By contrast, the proposed method removes the halo artifacts and achieves proper global contrast and better visual perception.
As can be clearly seen in Figure 10, there are halo artifacts on the right side of the palace and around the head of the performer in the results of all methods except Liu15 and the proposed method. Although Liu15 removes the halo artifacts, the color of the sky is distorted and becomes pale. The proposed method, however, both maintains better color information in the sky and removes the halo artifacts in the fused image.
Apart from the four multi-exposure fused images in Figures 7-10, the fused images of the other eight multi-exposure source image sequences in Figure 6, obtained with the proposed method, are shown in Figure 11 owing to space limitations.

Objective Evaluation
In this subsection, three objective metrics are applied to evaluate the performance of the different fusion methods: $Q^{AB/F}$ [37], MEF-SSIMc [38], and the distortion identification-based image verity and integrity evaluation (DIIVINE) [39], which is a no-reference image evaluation metric.

Evaluation Using $Q^{AB/F}$
The fusion metric $Q^{AB/F}$, which is widely used in fused image evaluation [12,24], can be extended to evaluate an image fused from three or more source images. This metric mainly analyzes the edge information of the fused image: the more edge information is retained, the larger the $Q^{AB/F}$ value. The extended version of $Q^{AB/F}$ for multi-exposure fusion is defined as:

$$Q^{AB/F} = \frac{\sum_{x}\sum_{y}\sum_{i} Q^{I_i I_F}(x, y)\, w^{I_i}(x, y)}{\sum_{x}\sum_{y}\sum_{i} w^{I_i}(x, y)}, \qquad Q^{I_i I_F}(x, y) = Q_g^{I_i I_F}(x, y)\, Q_a^{I_i I_F}(x, y),$$

where $I_i$ denotes the $i$-th source image, $I_F$ denotes the fused image to be evaluated, $Q_g^{I_i I_F}(x, y)$ and $Q_a^{I_i I_F}(x, y)$ are the edge intensity and orientation preservation values of the pixel at position $(x, y)$, respectively, and $w^{I_i}(x, y)$ is the gradient magnitude of $I_i(x, y)$, which is used as the weighting factor of $Q^{I_i I_F}(x, y)$.
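As a rough illustration of how an edge-preservation score of this kind could be computed for several source images, the following sketch takes a gradient-magnitude-weighted average of per-pixel edge preservation values over all sources. It is not the authors' implementation. The Sobel operator, the sigmoid constants (values commonly quoted for the original two-image metric), the treatment of border pixels, and the helper names `sobel`, `edge_preservation`, and `q_abf` are all assumptions made for this example, and images are assumed to be small grayscale arrays stored as nested lists.

```python
import math

# Sigmoid constants commonly quoted for the two-image metric (assumed here).
GAMMA_G, KAPPA_G, SIGMA_G = 0.9994, -15.0, 0.5
GAMMA_A, KAPPA_A, SIGMA_A = 0.9879, -22.0, 0.8


def sobel(img):
    """Return (magnitude, orientation) maps; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            sy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = math.hypot(sx, sy)
            if sx != 0:
                ang[y][x] = math.atan(sy / sx)
            elif sy != 0:
                ang[y][x] = math.pi / 2
    return mag, ang


def edge_preservation(g_src, a_src, g_fus, a_fus):
    """Product of edge-strength and orientation preservation at one pixel."""
    hi = max(g_src, g_fus)
    strength = min(g_src, g_fus) / hi if hi > 0 else 0.0
    orient = abs(abs(a_src - a_fus) - math.pi / 2) / (math.pi / 2)
    q_g = GAMMA_G / (1 + math.exp(KAPPA_G * (strength - SIGMA_G)))
    q_a = GAMMA_A / (1 + math.exp(KAPPA_A * (orient - SIGMA_A)))
    return q_g * q_a


def q_abf(sources, fused):
    """Weighted average of edge preservation over all sources and pixels."""
    g_f, a_f = sobel(fused)
    num = den = 0.0
    for src in sources:
        g_s, a_s = sobel(src)
        for y in range(len(fused)):
            for x in range(len(fused[0])):
                weight = g_s[y][x]  # source gradient magnitude as the weight
                q = edge_preservation(g_s[y][x], a_s[y][x], g_f[y][x], a_f[y][x])
                num += q * weight
                den += weight
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: a vertical step edge at two "exposures".
    bright = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    dark = [[0.5 * v for v in row] for row in bright]
    flat = [[0.5] * 5 for _ in range(5)]
    print("edges kept:", q_abf([bright, dark], bright))
    print("edges lost:", q_abf([bright, dark], flat))
```

In this toy setting, a fused image that keeps the step edge scores much higher than a flat fused image, which matches the intended reading of the metric: larger values indicate that more edge information from the sources is retained.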
The performance of the different methods on the 12 image sequences using $Q^{AB/F}$ is listed in Table 1, in which the top two values are shown in bold. Table 1 shows that, in terms of $Q^{AB/F}$, the proposed method and Liu15 clearly outperform the other methods.
As shown in Figure 12, as n increased, the values of $Q^{AB/F}$ and MEF-SSIMc increased slowly, and the value of DIIVINE showed almost no fluctuation. Furthermore, as ε decreased, the values of $Q^{AB/F}$ and MEF-SSIMc changed little, while the value of DIIVINE improved. For η and α, the values of $Q^{AB/F}$ and MEF-SSIMc became worse when η and α were too large or too small, and the value of DIIVINE showed only a slight concave change.
To better understand the proposed method, a set of comparative experiments was carried out to analyze the effect of the recursive filter in the fusion process. The experimental results show that the recursive filter can effectively remove the noise of the initial weight map and optimize the fusion result, as shown in Figure 13.
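To make the role of weight-map filtering concrete, the sketch below smooths noisy initial weight maps with a simple first-order recursive filter before a normalized weighted-sum fusion. This is only an illustrative stand-in. The filter used in the paper may be edge-aware, whereas this one is a plain, non-edge-aware exponential smoother, and the function names `recursive_smooth_1d`, `recursive_smooth_2d`, and `fuse`, as well as the feedback coefficient `a`, are hypothetical choices for this example.

```python
def recursive_smooth_1d(values, a):
    """Forward then backward first-order recursive pass (feedback coefficient a)."""
    out = list(values)
    for i in range(1, len(out)):
        out[i] = (1 - a) * out[i] + a * out[i - 1]
    for i in range(len(out) - 2, -1, -1):
        out[i] = (1 - a) * out[i] + a * out[i + 1]
    return out


def recursive_smooth_2d(weight, a=0.6):
    """Apply the 1-D recursive filter along rows, then along columns."""
    rows = [recursive_smooth_1d(r, a) for r in weight]
    cols = [recursive_smooth_1d(c, a) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def fuse(images, weights, a=0.6, eps=1e-12):
    """Weighted-sum fusion using smoothed, per-pixel normalized weight maps."""
    smoothed = [recursive_smooth_2d(w, a) for w in weights]
    h, w = len(images[0]), len(images[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = sum(s[y][x] for s in smoothed) + eps
            fused[y][x] = sum(img[y][x] * s[y][x]
                              for img, s in zip(images, smoothed)) / total
    return fused


if __name__ == "__main__":
    # Hypothetical toy data: one isolated "noisy" pixel in the weight maps.
    noise = [[0.0] * 5 for _ in range(5)]
    noise[2][2] = 1.0
    main = [[1.0 - v for v in row] for row in noise]
    img_a = [[0.2] * 5 for _ in range(5)]
    img_b = [[0.8] * 5 for _ in range(5)]
    fused = fuse([img_a, img_b], [main, noise])
    print("center pixel after smoothing the weights:", fused[2][2])
```

Without smoothing, the isolated weight would copy a single pixel from the second image into the result as a visible speck. After the recursive passes, the spike is spread out and attenuated, so the fused value at that pixel stays close to its neighbors.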

Computational Efficiency
In this subsection, the computational efficiency of the six fusion methods above, that is, their execution time on the computer, is compared, as listed in Table 4. Because of the high computational complexity of the sparse representation model, the proposed method spent most of its time calculating the local gradient sparse descriptor of the source images. Nevertheless, among the six fusion methods, the proposed method was considerably more efficient than Ma18.

Conclusions
In this paper, we first analyzed the cause of halo artifacts in multi-exposure fused images. Then, we proposed a new multi-exposure fusion method based on sparse representation of gradient features. The sparse representation model was used to obtain the local gradient sparse descriptor, which not only removes the halo artifacts but also extracts image details effectively. The method was assessed through both subjective and objective evaluations on 12 multi-exposure image sequences. Experimental results show that the proposed method was able to remove the halo artifacts and obtain a higher-quality fused image, presenting a more vivid natural scene. In the future, we plan to assess whether the proposed method can be applied to multi-exposure image fusion in dynamic scenes.

Figure 1. An example of multi-exposure image fusion: (a) the fused image by Li et al. [18], and (b) the fused image by the proposed method.

Figure 2. Local luminance gradient comparison of multi-exposure source images: (a-e) multi-exposure source images; (f-j) luminance gradient maps; and (k-o) luminance gradient amplitude along the red lines in (f-j).

Figure 3. Schematic diagram of the proposed multi-exposure fusion method.

Figure 4. Dictionary learning in the gradient domain: (a) clear indoor and outdoor images; (b) luminance gradient map; (c) lexicographical vector; (d) over-complete dictionary.

Figure 13. Comparison of fusion results without and with the recursive filter: (a-c) multi-exposure source images; (d-f) weight maps without the recursive filter; (g-i) weight maps with the recursive filter; (j) fused image without the recursive filter; and (k) fused image with the recursive filter.

Table 3. Performance of different methods on the 12 image sequences, using the distortion identification-based image verity and integrity evaluation (DIIVINE) [39]. Top two values are shown in bold.

Table 4. Computational efficiency (execution time, in seconds) of different fusion methods on the 12 image sequences.