Human Visual Perception-Based Multi-Exposure Fusion Image Quality Assessment

Abstract: Compared with ordinary single-exposure images, multi-exposure fusion (MEF) images are prone to color imbalance, loss of detail information and abnormal exposure in the process of combining multiple images with different exposure levels. In this paper, we propose a human visual perception-based MEF image quality assessment method that measures the quality degradation accurately by considering the related perceptual features (i.e., color, dense scale invariant feature transform (DSIFT) and exposure), which are closely related to the symmetry principle in human eyes. Firstly, the L1 norm of the chrominance components between fused images and designed pseudo images with the most severe color attenuation is calculated to measure the global color degradation, and the color saturation similarity is added to eliminate the influence of color over-saturation. Secondly, a set of distorted images under different exposure levels carrying the strong edge information of the fused image is constructed through structural transfer; thus, DSIFT similarity and DSIFT saturation are computed to measure the local detail loss and enhancement, respectively. Thirdly, a Gauss exposure function is used to detect over-exposed or under-exposed areas, and the above perceptual features are aggregated with a random forest to predict the final quality of the fused image. Experimental results on a public MEF subjective assessment database show the superiority of the proposed method over state-of-the-art image quality assessment models.


Introduction
Natural scenes usually have a wide brightness range, from 10^−5 cd/m^2 to 10^8 cd/m^2, but it is difficult for existing imaging devices to acquire all parts of the scene information in a single exposure due to the limitation of their own dynamic range [1]. Multi-exposure fusion (MEF), as an effective quality enhancement technology, is able to integrate multiple low dynamic range (LDR) images under different exposure levels captured by normal cameras into a perceptually attractive image, and it has been successfully applied in various multimedia fields, such as remote sensing, medical imaging, panoramic imaging and HDTV [2,3].
Generally, the performance differences between MEF algorithms are mainly reflected in how the fusion weights are solved. The simplest local and global energy weighting algorithms obtain weights by measuring the local and global energy among source images. Mertens et al. [4] constructed the weights by considering contrast, saturation and good exposure, and fused multiple images with a multi-scale pyramid model. On this basis, Li et al. [5] made the fused image more realistic subjectively by solving a quadratic optimization problem to enhance the detail information. Gu et al. [6] extracted gradient information from the structural tensor of the source images to design the initial weights and smoothed them with an edge-preserving filter to prevent artifacts. Remarkably, different kinds of distortion may be introduced during fusion, which motivates the proposed quality assessment method. The main contributions of this paper are as follows: (1) The difference of chrominance components between fused images and the defined pseudo images with the most severe color attenuation is calculated to measure the global color degradation, and the color saturation similarity is added to eliminate the influence of over-saturated color. (2) A set of distorted source images with the strong edge information of the fused image is constructed by the structural transfer characteristic of guided filter; thus, structure similarity and structure saturation are computed to measure the local detail loss and enhancement, respectively.
(3) The Gauss function is designed to accurately detect the over-exposed or under-exposed areas of images; then, the local luminance of each source image and the global luminance of the fused image are used to measure the luminance consistency between them.
The remainder of this paper is organized as follows: the proposed MEF-IQA method is described in Section 2; the performance comparison between the proposed method and state-of-the-art ones is presented in Section 3; finally, conclusions and future work are drawn in Section 4.


Proposed Human Visual Perception-Based MEF-IQA Method
It is generally acknowledged that MEF images with ideal quality are rich in color, detail information and exposure, so a single perceptual feature is inadequate for accurately evaluating MEF images. In this paper, a human visual perception-based MEF-IQA method is proposed, which mainly consists of color, detail and good exposure metrics; Figure 1 shows its flowchart. In the first stage, a local saturation similarity and a global color distortion metric are designed for detecting the unbalanced chrominance of fused images. In the second stage, the dense scale invariant feature transform (DSIFT) descriptor is adopted to obtain the local structure information along different directions for each pixel, and the DSIFT similarity and DSIFT saturation between the source images and the pseudo reference images carrying the fused image's strong edge information are calculated at different scales to measure distortions such as detail loss and detail enhancement. In the third stage, a local exposure similarity and a global exposure metric are presented by combining the luminance of the source images and the fused image with a Gauss function, respectively. The specific implementation details of the proposed MEF-IQA method are stated in the following four subsections.

Local and Global Color Metrics
Since the HVS is more sensitive to luminance than chrominance, most existing GF-IQA and MEF-IQA models focus on grayscale information while ignoring the importance of color. Unfortunately, there are distinct color differences between MEF images formed by different algorithms; Figure 2 depicts an example of MEF images of the sequence "Lamp1" from the MEF database. From Figure 2, it can be seen that the image in Figure 2a has a bright and vivid color, while the MEF images in Figure 2b,c are very dim, causing a poor visual experience. Therefore, quality assessment that accounts for color will be more reliable for MEF images, and we adopt global and local ways to evaluate the color distortion of MEF images in this section.



Global Color Distortion Metric
To extract the chrominance components of MEF images, the RGB color space is first transformed into the YCbCr color space, which is more in line with human visual characteristics; using the standard full-range BT.601 conversion, this is expressed as

Y_f = 0.299 R_f + 0.587 G_f + 0.114 B_f
Cb_f = 128 − 0.169 R_f − 0.331 G_f + 0.500 B_f      (1)
Cr_f = 128 + 0.500 R_f − 0.419 G_f − 0.081 B_f

where Y_f is the luminance information, and Cb_f and Cr_f are the two chrominance components.
Evidently, the inverse process from the YCbCr color space to the RGB space according to Equation (1) can be deduced as

R_f = Y_f + 1.402 (Cr_f − 128)
G_f = Y_f − 0.344 (Cb_f − 128) − 0.714 (Cr_f − 128)      (2)
B_f = Y_f + 1.772 (Cb_f − 128)

where R_f, G_f and B_f are the R, G and B channels of the fused image, respectively. From Equation (2), it can be found that the farther the values of Cb_f and Cr_f are from 128, the more colorful the fused image will appear subjectively. Hence, we utilize the L1 norm between the chrominance components and 128 to approximately measure the global color distortion of the fused image, which is calculated by

C_G^Cb = (1/N) ||Cb_f − 128||_1,  C_G^Cr = (1/N) ||Cr_f − 128||_1      (3)

where N is the number of pixels of each chrominance component, ||·||_1 is the L1 norm operator, and C_G^Cb and C_G^Cr are the two global color distortion metrics for Cb_f and Cr_f, respectively. However, the presented C_G^Cb and C_G^Cr will be worthless in the case of over-saturated color, so it is necessary to eliminate this negative effect.
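The global color distortion metric above can be sketched in a few lines. The conversion uses the standard full-range BT.601 coefficients; the function name and the uint8 input convention are illustrative choices, not part of the paper.

```python
import numpy as np

def global_color_distortion(rgb):
    """Global color distortion metrics C_G^Cb and C_G^Cr (a sketch).

    Convert a fused RGB image (uint8, HxWx3) to YCbCr with the
    full-range BT.601 matrix, then take the mean L1 distance of each
    chrominance plane from the neutral value 128 -- the farther from
    128, the more colorful the image.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.169 * r - 0.331 * g + 0.500 * b
    cr = 128.0 + 0.500 * r - 0.419 * g - 0.081 * b
    n = r.size
    c_g_cb = np.abs(cb - 128.0).sum() / n   # (1/N) * ||Cb_f - 128||_1
    c_g_cr = np.abs(cr - 128.0).sum() / n   # (1/N) * ||Cr_f - 128||_1
    return c_g_cb, c_g_cr

# A pure gray image has Cb = Cr = 128 everywhere, so both metrics vanish.
gray = np.full((4, 4, 3), 100, dtype=np.uint8)
print(global_color_distortion(gray))  # -> (0.0, 0.0)
```

As a sanity check, a saturated red image yields a large C_G^Cr, matching the intuition that strong chrominance deviates far from 128.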

Local Saturation Similarity
According to related visual psychology research, color saturation can accurately indicate the natural response of the HVS to color information. Generally, pixels with high saturation have more vivid color, while pixels with low saturation look dim. Moreover, color saturation S can be simply measured by calculating the standard deviation in the RGB space, which is defined as

S = sqrt( ((R − µ)^2 + (G − µ)^2 + (B − µ)^2) / 3 )      (4)

where µ is the mean value of the R, G and B channels. Thus, the color saturation maps for each source image I_k and the fused image I_f can be computed by Equation (4), denoted as {S_k} (k = 1, 2, ..., K) and S_f, respectively. Then, a maximum color saturation map S_max is calculated from {S_k} as the pseudo reference image with the optimal chrominance, expressed as

S_max(p) = max( S_1(p), S_2(p), ..., S_K(p) )      (5)

where S_max(p) is the maximum saturation at position p over the source images, max(·) is the "select max" operation, and K is the number of source images. Finally, the local color distortion is evaluated by calculating the similarity between S_max and S_f, thereby eliminating the impact caused by over-saturated color, and the local saturation similarity C_L^SIM is defined as

C_L^SIM = mean( (2 S_max S_f + c_1) / (S_max^2 + S_f^2 + c_1) )      (6)

where mean(·) is the mean operator, and c_1 is a constant that keeps the denominator from being zero.
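The saturation maps and the local similarity can be sketched as follows. The SSIM-style ratio and the value of c1 are assumptions consistent with the text (the excerpt does not reproduce the exact equation), and the function names are illustrative.

```python
import numpy as np

def saturation_map(rgb):
    # Color saturation as the per-pixel standard deviation over R, G, B.
    rgb = rgb.astype(np.float64)
    mu = rgb.mean(axis=-1, keepdims=True)
    return np.sqrt(((rgb - mu) ** 2).mean(axis=-1))

def local_saturation_similarity(sources, fused, c1=1e-4):
    """Local saturation similarity C_L^SIM (a sketch).

    S_max is the pixel-wise maximum saturation over the K source images
    (the pseudo reference with optimal chrominance), S_f is the fused
    image's saturation, and c1 keeps the denominator away from zero.
    The SSIM-like ratio form is an assumed reconstruction.
    """
    s_max = np.max([saturation_map(I) for I in sources], axis=0)
    s_f = saturation_map(fused)
    sim = (2.0 * s_max * s_f + c1) / (s_max ** 2 + s_f ** 2 + c1)
    return sim.mean()
```

When the fused image matches the most saturated source exactly, the similarity reaches its maximum of 1.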

Structure Similarity and Saturation Metric
The structural information of images usually carries the essential visual content of scenes, and the HVS is highly adept at extracting structures for visual perception. Moreover, the DSIFT descriptor [20], as an effective means of obtaining the local gradient information of each pixel in eight directions, has been successfully applied in the field of computer vision, such as image registration and image fusion. Compared with gradient magnitude, DSIFT is more accurate and robust for extracting the structural information of images. Figure 3a-c shows three fused images of the sequence "Tower" created by Mertens' algorithm [4], local energy weighting and Li's algorithm [5], respectively. From Figure 3, we can make the following observations: Figure 3a cannot preserve the fine details in the center of the tower and the brightest cloud region; Figure 3b produces unnatural artifacts near the edges of the sky and tower, also known as pseudo contour; Figure 3c can be regarded as the result of performing an edge enhancement operation on Figure 3a. Generally, a detail enhancement algorithm may create perceptually more appealing results, but it also introduces some unnatural shadows into the fused image. Therefore, in this section, DSIFT similarity and DSIFT saturation are designed for precisely detecting the edge distortion in the fused image, which is mainly reflected in three aspects, i.e., detail loss, pseudo contour and detail enhancement.


DSIFT Similarity
As an effective edge-preserving filter, the guided filter f_G(·) is determined by the input image I_i and the guided image I_g, and the specific filtering process is defined as I_o = f_G(I_i, I_g, r, ε), where I_o is the filtering output, and r and ε are the filtering radius and regularization parameter, respectively. In addition, when I_g is different from I_i, the guided filter is equivalent to a structure transfer filter; that is, I_o will retain the strong edge information of I_g, and the sizes of r and ε limit the strength of the retained edge information.
According to this characteristic, we first choose the multi-exposure source images {I_k} (k = 1, 2, ..., K) as the filtering input, and the fused image I_f is used as the guided image. Thus, a set of pseudo multi-exposure images with the strong edge information of the fused image is constructed and denoted as I_k^{d,s} (k = 1, 2, ..., K). At the same time, in order to eliminate the influence caused by filtering itself, I_k is selected as both the input image and the guided image to generate a set of filtered multi-exposure source images I_k^{r,s} (k = 1, 2, ..., K). The above filtering process can be expressed as I_k^{d,s} = f_G(I_k, I_f, r_s, ε_s) and I_k^{r,s} = f_G(I_k, I_k, r_s, ε_s), where r_s and ε_s are set to small values to guarantee that the constructed pseudo multi-exposure images retain all the edge information of the fused image. Then, the DSIFT descriptor is applied to each pixel in I_k^{r,s} and I_k^{d,s}, and the related DSIFT features with dimension M are extracted as D_{k,1:M}^{r,s} = f_D(I_k^{r,s}) and D_{k,1:M}^{d,s} = f_D(I_k^{d,s}), where D_{k,1:M}^{r,s} and D_{k,1:M}^{d,s} are the obtained DSIFT features of the k-th source image and pseudo image after filtering, respectively, f_D(·) is the operator for calculating the DSIFT feature, and M is the feature dimension.
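The structure-transfer step can be sketched with the classical guided filter of He et al.; this is a minimal grayscale numpy implementation with a box mean filter, not the paper's exact code, and the commented calls show how the pseudo and self-filtered images above would be built.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I_i, I_g, r, eps):
    """He et al.'s guided filter f_G(I_i, I_g, r, eps) -- a sketch for
    grayscale images in [0, 1], using a box mean of radius r.

    With I_g != I_i it acts as a structure-transfer filter: the output
    keeps the strong edges of the guide I_g, and r and eps control how
    much edge strength is transferred.
    """
    size = 2 * r + 1
    mean = lambda x: uniform_filter(x, size=size, mode='reflect')
    mean_g, mean_i = mean(I_g), mean(I_i)
    cov_gi = mean(I_g * I_i) - mean_g * mean_i    # local covariance
    var_g = mean(I_g * I_g) - mean_g * mean_g     # local variance of guide
    a = cov_gi / (var_g + eps)
    b = mean_i - a * mean_g
    return mean(a) * I_g + mean(b)

# As in the text (r_s, eps_s small):
#   I_k_ds = guided_filter(I_k, I_f, r_s, eps_s)   # pseudo image
#   I_k_rs = guided_filter(I_k, I_k, r_s, eps_s)   # self-filtered source
```

A constant input passes through unchanged, which is a quick correctness check for the local linear model.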
Finally, the DSIFT feature similarity between I_k^{r,s} and I_k^{d,s} is calculated to measure the detail loss or pseudo contour in the fused image, which is expressed as

D_{SIM}^{k,1:M} = (2 D_{k,1:M}^{r,s} D_{k,1:M}^{d,s} + c_2) / ((D_{k,1:M}^{r,s})^2 + (D_{k,1:M}^{d,s})^2 + c_2)

where D_{SIM}^{k,1:M} is the obtained k-th DSIFT similarity map, and c_2 is a constant that keeps the denominator from being zero.
Since D_{SIM}^{k,1:M} indicates the edge distortion areas of the fused image in different gradient orientations, the final k-th DSIFT similarity map D_{SIM}^k is computed by simple averaging, D_{SIM}^k = (1/M) Σ_{m=1}^{M} D_{SIM}^{k,m}, where m is the dimension index of the k-th DSIFT similarity map. Furthermore, we define the Gauss exposure weighting function f_E^s(·) to integrate all DSIFT similarity maps under different exposure levels, which is expressed as

D_SIM = Σ_{k=1}^{K} f_E^s(L_k^s) D_{SIM}^k / Σ_{k=1}^{K} f_E^s(L_k^s), with f_E^s(L) = exp(−(L − 0.5)^2 / (2σ^2))

where L_k^s is a set of images obtained by smoothing the source images I_k with a mean filter of window size 7 × 7, which is mainly based on the spatial consistency principle. Specifically, the DSIFT similarity map is calculated from I_k^{r,s} and I_k^{d,s} at a smaller smoothness level, so the corresponding weight should also be smooth in space. D_SIM is the aggregated DSIFT similarity map, and the final DSIFT similarity feature is simply calculated as the mean of D_SIM. Figure 4a-c depicts the corresponding DSIFT similarity maps of Figure 3a-c, respectively. From Figure 4, it can be found that the edge distortion areas in Figure 3a,b (e.g., the center of the tower, the brightest cloud and the sky) are well reflected in the calculated quality maps. However, Figure 4c shows false results at the positions of enhanced pixels in Figure 3c, because detail enhancement usually makes the edge information of images change strongly. Actually, an appropriate detail enhancement algorithm will produce more attractive results subjectively, which is just the opposite of the impact measured by DSIFT similarity. Therefore, it is necessary to add the DSIFT saturation on the basis of the DSIFT similarity to eliminate this effect.
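The exposure-weighted aggregation step can be sketched as below. The value of σ and the normalized weighted-sum form are assumptions consistent with the described Gauss exposure weighting; the excerpt does not give the exact constants.

```python
import numpy as np

SIGMA = 0.2  # spread of the Gauss exposure weight (an assumed value)

def gauss_exposure_weight(L):
    # Pixels with (smoothed) luminance near mid-intensity 0.5 are
    # considered well exposed and receive the largest weight.
    return np.exp(-((L - 0.5) ** 2) / (2 * SIGMA ** 2))

def aggregate_similarity(sim_maps, smoothed_lum):
    """Aggregate the K per-exposure DSIFT similarity maps D_SIM^k with
    the Gauss exposure weighting function f_E (a sketch).

    sim_maps:     list of K HxW similarity maps
    smoothed_lum: list of K HxW mean-filtered luminance maps L_k^s
    """
    w = np.stack([gauss_exposure_weight(L) for L in smoothed_lum])
    d = np.stack(sim_maps)
    d_sim = (w * d).sum(axis=0) / (w.sum(axis=0) + 1e-12)
    return d_sim.mean()  # final DSIFT similarity feature
```

If every per-exposure map equals 1 (no distortion anywhere), the aggregated feature is 1 regardless of the weights, as expected of a normalized weighted mean.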

DSIFT Saturation
Similar to the calculation of the DSIFT similarity, a set of filtered pseudo images I_k^{d,b} and source images I_k^{r,b} can be generated by the guided filter, i.e., I_k^{d,b} = f_G(I_k, I_f, r_b, ε_b) and I_k^{r,b} = f_G(I_k, I_k, r_b, ε_b), where r_b and ε_b are set to large values to guarantee that the constructed pseudo multi-exposure images only retain the large-intensity edge information, i.e., the detail enhancement areas. Then, the DSIFT descriptor is also applied to each pixel in I_k^{r,b} and I_k^{d,b}, and the related DSIFT features with dimension M are extracted as D_{k,1:M}^{r,b} = f_D(I_k^{r,b}) and D_{k,1:M}^{d,b} = f_D(I_k^{d,b}), where D_{k,1:M}^{r,b} and D_{k,1:M}^{d,b} are the obtained DSIFT features of the k-th source image and pseudo image after filtering, respectively.
Finally, the DSIFT saturation between I_k^{r,b} and I_k^{d,b} is calculated to measure the detail enhancement, where D_{SA}^{k,1:M} is the obtained k-th DSIFT saturation map, atan(·) is the inverse tangent function, and c_3 is a constant that keeps the denominator from being zero. Evidently, when D_{k,1:M}^{d,b} is greater than D_{k,1:M}^{r,b}, the saturation is greater than 1, indicating the strong visual response of the HVS to detail enhancement areas. Similarly, the k-th DSIFT saturation map D_{SA}^k is obtained by averaging the feature over each dimension. Moreover, we also use the Gauss exposure weighting function f_E^b(·) to integrate all DSIFT saturation maps under different exposure levels, where L_k^b is a set of images obtained by smoothing the source images I_k with a mean filter of window size 15 × 15 according to the spatial consistency principle. D_SA is the aggregated DSIFT saturation map, and the final DSIFT saturation feature is simply calculated as the mean of D_SA. Figure 5a-c shows the corresponding DSIFT saturation maps of Figure 3a-c, respectively. From Figure 5, it can be seen that Figure 5c has higher saturation at the positions of pixels with enhanced detail information in Figure 3c, which eliminates the undesired effect caused by the DSIFT similarity.
Fortunately, the detail loss areas in Figure 3c are also reflected in Figure 5c, demonstrating the validity of the proposed DSIFT saturation metric.
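The per-dimension saturation can be sketched as follows. The excerpt does not reproduce the exact formula, so the normalized arctangent ratio below is an assumption chosen to match the described behavior (value 1 when the responses agree, greater than 1 when the pseudo image's response is stronger); c3 is likewise an assumed small constant.

```python
import numpy as np

def dsift_saturation(d_ref, d_dist, c3=1e-4):
    """DSIFT saturation for one feature dimension (a sketch under
    assumptions). d_ref is the self-filtered source response D^{r,b},
    d_dist the pseudo-image response D^{d,b}; c3 keeps the ratio
    well defined where responses vanish."""
    ratio = (d_dist + c3) / (d_ref + c3)
    # Normalize by atan(1) so equal responses map exactly to 1 and
    # enhancement (d_dist > d_ref) maps above 1.
    return np.arctan(ratio) / np.arctan(1.0)

print(dsift_saturation(np.array(0.5), np.array(0.5)))  # -> 1.0
```

The bounded arctangent keeps extreme enhancement from dominating the map, which is one plausible reason for the inverse-tangent form mentioned in the text.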

Local and Global Exposure Metrics
There are usually under-exposed or over-exposed areas in MEF images, mainly caused by luminance inconsistency among adjacent pixels, which results in a poor visual experience. In this section, we construct local and global exposure metrics to evaluate these luminance distortion phenomena.


Local Exposure Similarity
Similar to the definition of the Gauss exposure weighting function, the local exposure map of an image can be defined by measuring the distance between the normalized pixel intensity and 0.5, i.e., when the pixel intensity is close to 0 or 1, the pixel is considered under-exposed or over-exposed. Therefore, the local exposure maps E_k^b for each source image are calculated by E_k^b = exp(−(I_k − 0.5)^2 / (2σ^2)). Then, the best exposure areas in each source image are selected to form a good exposure reference image I_r^b, which takes, at each pixel, the value of the source image with the maximum local exposure. Finally, the local exposure distortion areas can be detected by calculating the similarity between I_r^b and I_f,

E_SIM^b = (2 I_r^b I_f + c_4) / ((I_r^b)^2 + (I_f)^2 + c_4)
where c_4 is a constant that keeps the denominator from being zero, and the final local exposure similarity feature is simply calculated as the mean of E_SIM^b. Figure 6a-c depicts three fused images of the sequence "Candle" created by different algorithms, and Figure 6d-f shows the local exposure similarity maps of Figure 6a-c, respectively. Obviously, there are several under-exposed areas (e.g., the teacup in Figure 6a and the shadows in Figure 6c) that are inconsistent with their surrounding areas in terms of luminance. Such exposure distortion is clearly indicated in the corresponding quality maps. Moreover, Figure 6b has a uniform luminance distribution, but it is darker over the whole space than Figure 6a,c, which still results in a bad visual perception. Therefore, it is essential to consider the impact of the overall luminance.
Figure 6. (a) Fused image created by the algorithm in [5]. (b) Fused image created by Raman's algorithm [7]. (c) Fused image created by Li's algorithm [9]. (d-f) Corresponding local exposure similarity maps of the images in the first row.
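The construction of the good exposure reference can be sketched as below; σ is an assumed spread for the Gauss exposure function, and the per-pixel argmax selection follows the "best exposure areas" description in the text.

```python
import numpy as np

SIGMA = 0.2  # assumed spread of the Gauss exposure function

def local_exposure_map(L):
    # Intensity near 0 or 1 means under-/over-exposure; 0.5 is ideal.
    return np.exp(-((L - 0.5) ** 2) / (2 * SIGMA ** 2))

def best_exposure_reference(sources):
    """Form the good-exposure reference I_r^b (a sketch): at each pixel,
    keep the source image whose local exposure map is the largest.

    sources: list of K HxW luminance images normalized to [0, 1]
    """
    stack = np.stack(sources)                              # K x H x W
    maps = np.stack([local_exposure_map(L) for L in stack])
    best = maps.argmax(axis=0)                             # best source index per pixel
    return np.take_along_axis(stack, best[None], axis=0)[0]

# Where one source is too dark and another is near mid-intensity,
# the reference keeps the pixel closer to 0.5.
dark = np.full((2, 2), 0.1)
mid = np.full((2, 2), 0.6)
print(best_exposure_reference([dark, mid]))
```

The resulting reference is then compared against the fused image with the SSIM-style ratio to locate locally mis-exposed regions.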


Global Exposure Metric
Similar to the local exposure metric, we combine the average luminance of the fused image with the Gauss function to design the global exposure metric E_m^g, which can be expressed as E_m^g = exp(−(Ī_f − 0.5)^2 / (2σ^2)), where Ī_f is the average luminance of the fused image. When this luminance value is close to 0 or 1, the fused image looks entirely dark or bright.

Quality Prediction
In addition, multi-scale characteristics in the spatial domain can capture image content from the fine level to the coarse level, which is consistent with the processing mechanism of the low-level retina and cortex in the primate visual system. As illustrated in Figure 7, the original scale of the multi-exposure images and fused images is marked as scale 1. By iteratively applying a low-pass filter and a down-sampling operation with a factor of 2 to the original images, the filtered images at scale l are obtained after l − 1 iterations. Then, the above-mentioned feature extraction is conducted in this multi-scale space, generating the final feature vector F_f = [F_1, F_2, F_3], where F_1, F_2 and F_3 are the color, structure and exposure features, respectively. After feature extraction, the quality regression from the feature space to image quality is conducted, denoted as Q = f_Q(F_f), where f_Q(·) is a quality regression function realized by the random forest (RF) algorithm, and Q is the predicted quality of the fused image.
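The regression stage Q = f_Q(F_f) can be sketched with scikit-learn's random forest. The feature dimension and the training data below are synthetic placeholders for illustration; the paper's actual features come from the color, structure and exposure metrics pooled over l = 3 scales.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the extracted features: each row is one fused
# image's feature vector F_f = [F1, F2, F3] over 3 scales (9 values,
# an assumed layout), and y is its subjective quality score.
rng = np.random.default_rng(0)
X_train = rng.random((120, 9))
y_train = X_train.sum(axis=1) + 0.1 * rng.standard_normal(120)

# f_Q: random forest mapping the feature space to image quality.
f_Q = RandomForestRegressor(n_estimators=100, random_state=0)
f_Q.fit(X_train, y_train)
quality = f_Q.predict(rng.random((1, 9)))  # predicted quality Q
print(float(quality[0]))
```

Random forests handle the small-sample, low-dimensional setting of the 136-image MEF database without heavy tuning, which is a common reason for choosing them over deep regressors here.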

Database
To compare the performance of the proposed MEF-IQA method with other state-of-the-art IQA models, experiments were performed on the public MEF subjective assessment database [21] provided by Waterloo IVC. Specifically, it consists of 17 multi-exposure source image sequences, and each image sequence contains the corresponding fused images generated by eight MEF algorithms. Therefore, a total of 136 fused images with associated mean opinion scores (MOS) are included in this database; more details can be found in Figure 8 and Table 1.

Figure 8. Source image sequences contained in the MEF database [21]. Each image sequence is represented by one image, which is the fused image with the best quality in the subjective test.

Table 1. Information about the source image sequences in the MEF database [21].

Evaluation Criteria
According to the related standard in the field of image quality assessment formulated by the Video Quality Experts Group (VQEG) [22], three evaluation criteria, i.e., the Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC) and root mean square error (RMSE), are selected to evaluate the performance of IQA models; an ideal model yields PLCC and SROCC values of 1 and an RMSE of 0. Moreover, a five-parameter logistic regression is employed to map the predicted quality closer to the subjective scores before calculating PLCC and RMSE.
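The evaluation protocol can be sketched as below. The five-parameter logistic form is the common VQEG-style mapping; the initial parameter guess p0 is a heuristic assumption, not from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic5(x, b1, b2, b3, b4, b5):
    # VQEG-style five-parameter logistic mapping of objective scores.
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def evaluate(obj, mos):
    """PLCC and RMSE after the logistic fit; SROCC on the raw scores
    (rank order is unchanged by a monotonic mapping). A sketch."""
    p0 = [np.max(mos), 1.0, np.mean(obj), 1.0, np.mean(mos)]  # heuristic init
    popt, _ = curve_fit(logistic5, obj, mos, p0=p0, maxfev=10000)
    pred = logistic5(obj, *popt)
    plcc = pearsonr(pred, mos)[0]
    srocc = spearmanr(obj, mos)[0]
    rmse = np.sqrt(np.mean((pred - mos) ** 2))
    return plcc, srocc, rmse
```

With perfectly monotonic predictions, PLCC and SROCC approach 1 and RMSE approaches 0, matching the criteria stated above.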

Experimental Parameters
In terms of the proposed MEF-IQA method, several experimental parameters must be fixed in the process of feature extraction. Specifically, the filtering radius and regularization parameter of the guided filter (i.e., r_s, ε_s, r_b and ε_b) are mainly applied for the structure transfer, so we set r_s = 11, ε_s = 10^−6, r_b = 21 and ε_b = 0.3, respectively, in accordance with the advice in [9]. Moreover, the feature dimension M of the DSIFT descriptor is the same as in [20], that is, M = 32. Generally, the above parameters do not have a significant impact on the final performance of MEF-IQA models, so we strictly follow the recommendations of previous studies rather than setting the parameters arbitrarily. Finally, since feature extraction is conducted in the multi-scale space, the scale number l evidently affects the performance. Therefore, we select the optimal scale value (i.e., l = 3) to achieve a trade-off between complexity and accuracy; the specific impact of the scale number on performance is discussed in the following sections.

Performance Comparison
To verify the performance of the proposed MEF-IQA method, we compare it on the MEF database [21] with nine existing state-of-the-art IQA metrics, including six GF-IQA metrics [11][12][13][14][15][16] and three MEF-IQA metrics [17][18][19]. Remarkably, the proposed MEF-IQA method adopts a supervised learning approach to obtain the image quality, so the MEF database is first divided into training and testing subsets. Then, 17-fold cross validation is used to evaluate the performance of the model; that is, for each train-test stage, 16 distorted image sequences are used for training and the remaining one for testing. The results of the performance comparison for each source image sequence are tabulated in Tables 2 and 3, and only the values of PLCC and SROCC are presented for brevity, where the best two performances are highlighted in boldface. Furthermore, we simultaneously record the corresponding hit count of performances highlighted in boldface to discriminate the performance differences among the ten IQA metrics more intuitively. From Tables 2 and 3, we can make the following findings. First, compared with the six GF-IQA metrics, the three MEF-IQA metrics specially designed for multi-exposure images achieve a more outstanding performance, which indicates that MEF images have special perceptual characteristics due to the distortion introduced in the imaging process, such as color imbalance, structure degradation and abnormal exposure. Second, a related color distortion metric is additionally considered in [19], which makes the performance of the MEF-IQA metric in [19] slightly better than the other two existing MEF-IQA metrics [17,18]. This is mainly because, although the HVS is more sensitive to luminance than chrominance in real situations, an image with serious color distortion also causes a bad visual experience. Therefore, color information cannot be ignored in MEF-IQA.
Finally, the proposed MEF-IQA method achieves 0.952 and 0.897 on PLCC and SROCC on average, respectively. Obviously, it outperforms all the competing MEF-IQA and GF-IQA metrics, because it considers three perceptual factors, namely color, structure and exposure.
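The 17-fold protocol above amounts to leave-one-sequence-out cross validation, which can be sketched as follows on synthetic data; the feature matrix, targets and forest size are placeholders, since feature extraction is assumed to be done already.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def sequence_cross_validation(X, y, seq_ids):
    """17-fold cross validation as described: in each fold, all fused
    images from one source sequence are held out for testing while the
    other 16 sequences are used for training (a sketch)."""
    preds = np.empty_like(y)
    for s in np.unique(seq_ids):
        test = seq_ids == s
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[~test], y[~test])
        preds[test] = model.predict(X[test])
    return preds

# 17 sequences x 8 fused images each = 136 synthetic samples.
rng = np.random.default_rng(1)
X = rng.random((136, 9))
y = X @ (np.arange(1, 10) / 10.0)
seq_ids = np.repeat(np.arange(17), 8)
preds = sequence_cross_validation(X, y, seq_ids)
print(preds.shape)  # -> (136,)
```

Holding out whole sequences (rather than random images) prevents the model from seeing near-duplicate content of a test image during training, which is why the split is done per sequence.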

Impacts of Multi-Scale Scheme and Different Feature
To analyze how much each kind of feature contributes to the proposed MEF-IQA model, the evaluation performance resulting from the three perceptual attributes (i.e., color, structure and exposure) is investigated on the MEF database [21]. The corresponding results, averaged over all source image sequences in the database, are reported in Table 4, where F_1, F_2 and F_3 denote the extracted feature vectors for color, structure and exposure, respectively. From Table 4, it can be found that the structure feature has a more significant influence on the final performance than the color and exposure features, which demonstrates that the HVS is highly sensitive to structure degradation in an image. Moreover, we also explore the impact of the scale number on the final performance in the multi-scale scheme. Specifically, we assign l = {1, 2, 3, 4} and calculate the PLCC, SROCC and RMSE for each setting; the results are shown in Table 5. Evidently, when the value of l is 3, the proposed MEF-IQA model achieves the optimal performance, so we set the scale number to 3 in this paper to guarantee the accuracy of the method.

Conclusions
In this paper, a human visual perception-based multi-exposure fusion image quality assessment (MEF-IQA) method is proposed by considering three perceptual features concerning color, structure and exposure, and the superiority of our approach is mainly reflected in the following three aspects. First, the chrominance information, which is usually ignored in existing IQA models for image fusion, is utilized to form the local color saturation similarity and global color distortion metric. Second, the dense scale invariant feature transform (DSIFT) descriptor is used to obtain the structure information of the image from multiple orientations, because it is more robust and accurate than gradient magnitude. Third, a Gauss exposure function is designed to evaluate the luminance inconsistency among adjacent pixels, and a multi-scale scheme is adopted in the feature extraction process to explore the perceptual differences from the fine level to the coarse level. Extensive experiments have indicated that the proposed method is more effective in predicting the quality of MEF images than other state-of-the-art metrics. However, natural scenes are almost always moving in practice, and moving objects in the imaging process introduce ghosting artifacts into the final fused images. In future work, we will focus on MEF-IQA models for dynamic scenes, which are more practical than those for static scenes.
Author Contributions: Y.C. and A.C. provided the initial idea for this work; Y.C. and A.C. designed the algorithm. Y.C., A.C., B.Y., S.Z. and Y.W. contributed to the analyses of results. Y.C. contributed to the collection and analysis of field test data. Y.C. and A.C. wrote the paper.