Multi-Exposure Fusion of Gray Images Under Low Illumination Based on Low-Rank Decomposition

Existing multi-exposure fusion (MEF) algorithms for gray images under low-illumination cannot preserve details in dark and highlighted regions very well, and the fusion image noise is large. To address these problems, an MEF method is proposed. First, the latent low-rank representation (LatLRR) is used on low-dynamic images to generate low-rank parts and saliency parts to reduce noise after fusion. Then, two components are fused separately in Laplace multi-scale space. Two different weight maps are constructed according to features of gray images under low illumination. At the same time, an energy equation is designed to obtain the optimal ratio of different weight factors. An improved guided filtering based on an adaptive regularization factor is proposed to refine the weight maps to maintain spatial consistency and avoid artifacts. Finally, a high dynamic image is obtained by the inverse transform of low-rank part and saliency part. The experimental results show that the proposed method has advantages both in subjective and objective evaluation over state-of-the-art multi-exposure fusion methods for gray images under low-illumination imaging.


Introduction
Low-illumination imaging can compensate for problems in information collection in low-illumination environments, thus improving national defense ability. The dynamic range of ordinary charge-coupled device (CCD) sensors is approximately 10 3 , which is much smaller than that of a real scene [1,2]. In addition, images taken in low-light conditions are often of low visibility. In fact, the image quality of gray images under low illumination can be improved by using image enhancement methods [3,4]. However, it is difficult to recover image details that are lost due to dynamic range limitation. Multi-exposure image fusion techniques have been introduced to solve the abovementioned issues.
Multi-exposure fusion (MEF) methods are mainly categorized as tone-mapping based on the inverse camera response function (CRF) in the radiance-domain and transforming fusion coefficients in the spatial-domain [5]. The first method involves CRF computation and tonal image mapping, which is time-consuming and difficult to implement [6][7][8]. Therefore, MEF methods based on pixel-level fusion in the spatial-domain are extensively used currently [9][10][11].
Three main types of solutions exist for MEF methods in the spatial-domain. First, most previous exposure fusion approaches merged multiple exposure source images at the pixel level based on some defined weight measures [12][13][14]. Mertens used image contrast, good exposure and saturation information to calculate the initial weight maps based on the Laplacian pyramid. Three adjustment factors were adopted to control the ratio of weights. This method achieved good fusion effects, however there was a problem of losing brightness and dark details [15]. Other measures were proposed to maintain details in the brightest and darkest regions, and most of them focused on how to build weights: an image reconstruction technique based on the Haar wavelet in the gradient domain [16], weighted guided image filter [17,18], the adaptive weight function [19], weight maps based on gradient information [20], adaptive weights based on the relative intensity between the images and global gradients [21] and structural patch decomposition [22]. Weighted sparse representation and a guided filter in gradient domain were proposed to retain image edges more adequately in gray images [23]. The input images are decomposed by non-subsampled contourlet transform and then different fusion rules are applied for low and high frequency NSCT coefficients. Finally the fused image is obtained by the inverse transform [24], and this method obtained good results. On the whole, the pixel-based image fusion method is simple and easy to implement. The fusion effect is limited in extracting details due to its fixed initial weights, and the weights obtained based on a single pixel are susceptible to noise and easily produce visual artifacts in the fusion image.
Patch-based MEF methods have attracted more attention [25,26]. These methods divide multi-exposure images into different patches. Kede Ma proposed an effective structural patch by decomposing an image patch into three conceptually independent components: signal strength, signal structure, and mean intensity. Then, three components are fused separately [27,28]. Wang utilized a super-pixel segmentation approach to divide images into non-overlapping image patches composed of pixels with similar visual properties [29]. An MEF framework based on low-level image features and image patch structure decomposition was proposed to improve the robustness of ghost removal in a dynamic scene, and preserved more detailed information [30]. Overall, this method mainly pays attention to the block segmentation. However, different patches contain pixels with different color and brightness characteristics. If the same fusion rule is used on these pixels with different characteristics, then the color or detail information tends to be lost.
Recently, the convolutional neural network (CNN) has also been applied to extract exposure-invariant features to generate artifact-free fusion images [31,32]. An MEF algorithm for gray images is proposed based on the decomposition CNN and weighted sparse representation [33]. In general, learning-based approaches have a good effect on ghost suppression in dynamic scenes. However, it is difficult to obtain real high dynamic range (HDR) datasets and better utilization of the CNN for MEF requires the solution of the problems associated with the training dataset, network classification accuracy, and loss function.
Generally, there are fewer multi-exposure fusion methods only for gray images under low illumination. Most MEF algorithms are designed for color images under normal illumination. Although these methods can be used in gray images by preprocessing, but the saturation factor in the method [15] is zero and is without practical significance for gray images. Multi-exposure fusion of gray images under low illumination faces two challenges: images under low illumination have low contrast and relatively large noise resulting in a lower signal-to-noise ratio of the fusion image; and compared with color images, gray images have less information available for building fusion weight maps resulting in detail loss in the dark and highlighted regions.
Given these problems, a novel approach based on latent low-rank representation (LatLRR) and adaptive weights is proposed. Moreover, an improved guided filter is used to refine the weight maps to reduce artifacts. The proposed algorithm makes the following contributions: (1) Before constructing multi-scale Laplace fusion space, LatLRR is utilized to decompose low dynamic range (LDR) images, which is more effective in reducing noise. The global and local structures are treated separately in the multi-scale space.
(2) According to characteristics of gray images under low-light imaging, adaptive weight factors are constructed for the decomposed global and local structures to avoid detail loss in low dark areas and highlighted areas. (3) An energy function is formulated to obtain the optimal ratio of contrast and exposure scale factors to obtain better robustness. (4) An improved guided filtering algorithm based on an adaptive regularization factor is proposed to refine the weight maps to maintain spatial consistency and avoid artifacts.
The remainder of this paper is organized as follows: relevant technical background is briefly reviewed in Section 2. In Section 3, the proposed multi-exposure image fusion method is introduced in detail. The experimental results and analysis are shown in Section 4. The conclusions are presented in Section 5.

Latent Low-Rank Representation
The latent low-rank representation decomposes the image into a global structure (low-rank representation), a local structure (salient features) and sparse noise [34], and it is robust against noise. The LatLRR decomposition can be defined as follows: where, λ is the equilibrium factor, . * denotes the nuclear norm and . 1 is the l 1 − norm. X is an image with the size of M × N. Z is the low-rank coefficient, L is the saliency coefficient and E is the sparse noise. Then the low-rank representation part XZ(X L ), saliency part LX (X S ) and sparse noise part E can be derived. The noise is removed and only the low-rank and saliency parts are performed in fusion processing.

Guided Filter
The guided filter is an edge-preserving smoothing filter proposed by He et al. [18] and is defined as a linear model using a guidance image X: where Z i and X i are the i − th pixel value of the output and guidance image, respectively. a k and b k are linear coefficients in a local window ω k with a size of (2r + 1) × (2r + 1) centered on pixel k. The coefficients a k and b k can directly be estimated using: where Y is the original image, µ k and δ k represent the mean and variance of guidance image X in local window ω, respectively. A fixed regularization factor ε was applied to different local regions in [18], and the difference in image texture between different regions is not taken into consideration. This leads to over-smooth or under-smooth effects.
In reality, a smaller ε is required for regions with rich edge information to sharpen and highlight the edges. By contrast, a larger ε is required for flat regions to produce a stronger smoothing effect.

Proposed Multi-Exposure Fusion Method
In this section, an adaptive MEF method based on digital time delay integration (TDI) images with multiple integral series is presented to promote the application of multi-exposure fusion technology in the field of remote sensing under low-light imaging. Figure 1 shows the framework of the proposed method. First, K images with different integral series are selected and used as multi-exposure images. Second, the image is divided into a low-rank representation part and saliency part using the LatLRR method. Third, for decomposed images, different fusion weight maps are designed according to the characteristics of low-illumination imaging. Additionally, an energy function is formulated to adjust the proportion of contrast and exposure factors. Then, the improved guided filtering based on an adaptive regularization factor is proposed to smooth weight maps to maintain spatial consistency and avoid artifacts. Finally, decomposition and reconstruction of the low rank part and the saliency part are implemented in Laplace multi-scale space to obtain the high dynamic image. In the following subsections, we provide more details of the multi-exposure image fusion process.

Generation of Multi-Exposure Images
For practical engineering applications, it is difficult to obtain multi-exposure images without adding additional sensors. Digital TDI brings a new opportunity for high dynamic fusion [35]. The fundamental principle is to improve the sensitivity of the system by multiple exposures to the same target by delay integration. For the global exposure mode, the target is exposed in each frame cycle to obtain one image at a time. N images corresponding to the same ground scene are digitally superimposed to realize N series TDI imaging in the digital domain. Different integral series images can be obtained in the process of digital TDI imaging, and images with a low integral series are equivalent to the low-exposure images, and images with a high integral series are equivalent to the highexposure images. At the same time, according to the orbit altitude, satellite flight speed and exposure time of the remote imaging sensor, it can be seen that the considerable motion between multi-integral images is far less than one pixel. Therefore, the multi-exposure images produced in this way can be regarded as images of static scenes, which is convenient to the MEF method. We analyze the performance of images with different TDI series. The relations between information entropy and mean gradient and the TDI series are examined as shown in Figure 2. It can be seen that both increase first and then decrease. Thus we can calculate the summation (marked as S) of the information entropy and mean gradient for each integral series. After that, the average value of S can be calculated. The relative minimum and maximum integral series can be found for which the value of S just exceed the average value. At the same time, the image with the largest value of S is involved in fusion. In this way, three images with different series can be selected to participate in the fusion to ensure the quality of the high-dynamic image. However, in practical application, if the memory on-board is sufficient, then more than three images with different TDI series can be chosen to improve the image quality. Without loss of generality, this paper describes images with K integral series selected as low-dynamic images.

Latent Low-Rank Image Decomposition
To improve the signal-to-noise ratio (SNR) of the fusion image under low illumination, LatLRR is utilized to decompose the source images. LatLRR is more efficient and robust to noise and outliers [36]. An example of LatLRR decomposition using Equation (1) is shown in Figure 3. Figure 3a is the original image. Figure 3b shows the low-rank part of the image, Figure 3c depicts the saliency features and Figure 3d depicts the noise in the 3D display. The decomposed image sequence is obtained as: X k L , k = 1, 2, . . . , K and X k S , k = 1, 2, . . . , K . X k L and X k s are the low-rank and saliency part of the k-th image, respectively. In the following process, they are treated separately and synthesized in the end to remove the effect of noise.

Weight Map Construction
For the multi-exposure fusion of gray images, weight factors directly affect the fusion effect, and different weights factors are designed for gray images considering the LatLRR decomposition.

Low Rank Part
The low-rank part contains global information, energy information, and brightness and contrast information of the image. First, a contrast factor is constructed according to the fact that the Laplace operator is isotropic. The image is filtered to obtain the high-frequency information D k by Equation (6). A Gaussian low-pass filter is applied to the absolute value of D k to obtain the contrast map w c,k .
where, X k L is the k-th low-rank image, H laplacian is the Laplacian operator and ⊗ represents the convolution operation. D k is the high-frequency information of the image, and ψ 2 is a symbol of two-dimensional Gaussian filter, the variance of Gaussian filter kernel is 0.5 and the filtering radius is 5. |.| represents the absolute value.
Second, the image exposure is considered. The idea of calculating the exposure factor in [15] is that after normalization of the image, a gray value close to 0.5 is considered to be moderately exposed and is given a larger weight. When deviating from 0.5, the exposure is insufficient or overexposed, and a smaller weight value is given. The method used by [15] misfits the case in which the overall value of an image is too bright or too dark. Under low-light imaging compared to normal light conditions, the overall luminance value is lower in both high and low exposure images. When the whole image intensity is too bright, a darker pixel value should be given a larger weight. By contrast, when it is too dark, a brighter pixel should be given a larger weight. This paper aims at low-illumination imaging, and gray values are lower than normal illumination on the whole. For this reason, the following formulas are designed: where, 0.8 and 0.2 are empirical parameters that come from a large number of experimental statistics. X k L (x, y) is the gray value of the k-th low-rank image at (x, y). The gray value is normalized to [0,1], z max and z min represent the maximum and minimum values of X k L , respectively. mean(X K L ) represents the mean value of the image. σ is the standard deviation and is an empirical parameter set to 0.5 in this paper [15,28]. From Equation (10) it can be seen that z mid is a relatively large value when the overall gray value of the image is low, so a large gray value will be given a larger weight by Equation (9) and vice versa.
At last, an initial weight map of the low-rank part is constructed by using the contrast weight and exposure weight together in Equation (11): According to different images, the fixed proportional factor λ c and λ s cannot guarantee the robustness of the algorithm. Addressing the above issue, an energy function for λ c and λ s is proposed: The constraints of the upper energy equation are as follows:λ s + λ c = 1 and, λ s ≥ 0, λ c ≥ 0. w s,k and D k are calculated by Equations (5) and (6). k is the serial number of the image. x denotes the pixel position. K is the total number of image sequences. The closer the image exposure is to 0.5, the smaller the E value; at the same time, the greater the contrast of adjacent pixels, the smaller the E value. The process of finding the optimal solution involves minimizing the linear energy equation. It can be seen that the weight factors obtained in this way will adapt to different input images to preserve more details.

Saliency Part
The saliency part contains prominent local features and special brightness distributions. The structural and texture factor are combined to construct the fusion factor of the salient part in this paper. The fusion of the saliency part aims at preserving the details of all input images as much as possible. This paper proposes a texture factor of the saliency part designed as Equation (13).
is the norm of the image, and µ k S is the average value of the k-th saliency image. When the norm X k S − µ k S of the image becomes larger, the image has more abundant texture and detail information, and should be given a larger weight value for the corresponding texture. The second term V gabor (x, y) is the Gabor value of the image calculated by Equation (14).
where x = x cos θ + y sin θ, and y = −x sin θ + y cos θ. λ is expressed in pixels when participating in the calculation. Moreover, * represents the multiplication of the corresponding position of the matrix. In general, it is less than one-fifth the size of the input image and greater than or equal to 2. θ represents the direction of the Gabor filter, and ranges from 0 • to 360 • . ϕ is the phase offset that ranges from −180 • to 180 • . γ is used to adjust the elliptic aspect ratio after the Gabor transform. When γ equals 1, the shape is approximately circular. σ is the standard deviation of the Gaussian function in the Gabor function. a is a gain parameter and is set to 3. We selected the following parameters through extensive experimentation about statistical texture features: λ = 2, θ = 45 • , γ = 0.5, ϕ = 0 • , and σ = 0.5. The design strategy of parameters selection ensures that the half-peak magnitude supports of the filter responses in the frequency spectrum touch each other in most of the experimental images [37].

Weight Maps Refinement
The construction of initial weight factors of the low-rank and saliency parts written as w k L and w k S and they have been completed through the above steps. Then, an improved guided filtering is used to suppress artifacts and avoid the block effect due to the lack of spatial consistency in the fusion process.
From Equation (4) we can see that a fixed regularization factor ε was applied to different local regions in original guided filtering, and the difference in image texture between different regions is not taken into consideration. The adaptive weight factor based on the image gradient is proposed to adjust ε and an improved guided filtering is proposed to refine weight maps. For w k L and w k S , the calculation process of their regularization factors is not discussed separately, and only the guide image is different. Weight coefficients w l,k and w s,k of the k-th image were filtered, and guide images of them are the low-rank part X k L and the saliency part X k S of the k-th image obtained by LatLRR decomposition in Section 3.2. In the process of calculating the regularization factor, the low-rank part X k L and the saliency part X k S are uniformly represented as X. In this paper,w = G(X, w, r, ε) is used to represent the guided filtering operation. w is the original weight coefficient, and the w is the filtered weight coefficient. For pixel i of X, the adaptive weight factor is T i , and the regularization factor ε is replaced by T i ε to improve the robustness of the improved adaptive guided filter. First, the gradient of a guided image X is calculated by Equation (15): where ∇g X x = X ⊗ h x , ∇g X y = X ⊗ h y , and F is the gradient image. ⊗ represents the convolution operation. Values of h x and h y are as follows: The weight factor of pixel i in the gradient image F is calculated using the following equation: where δ 2 θ1 (i) and δ 2 θ2 (i) are the variances of region A1 and A2, respectively; and µ 2 θ1 (i) and µ 2 θ2 (i) are the average values of regions A1 and A2, respectively. A1 and A2 are centered on i, the size of A1 is 3 × 3, and A2 is the shaded area as shown in Figure 4. α 1 and β 1 are small numbers to avoid the instability caused by approaching 0, and their values are set to 10 −9 . ε is set to 0.04, the regularization factor Tε is obtained in this way. Generally, T i of an edge position is larger than that of a flat area, and the smoothing force of the guided filter is smaller, and vice versa. Finally, the guided filter is improved by replacing the original regularization term ε with Tε to prevent insufficient or excessive smoothing in some regions. The refined weight maps of the low-rank and saliency parts by the improved filtering G are shown below: where, weight coefficients w l,k and w s,k of the k-th image were filtered, and guide images are the low-rank part X k L and the saliency part X k S of the k-th image obtained by LatLRR decomposition in the Section 3.3. w l,k and w s,k are weight factors after improved guided filtering. And r is the size of a local window. Finally, the weights of K images need to be normalized toŵ l,k andŵ s,k , and an example of weight maps refinement is given in Figure 5.ŵ

Multi-Scale Fusion
First, the image sequence X k L , k = 1, 2, . . . , K of the low-rank part and X k S , k = 1, 2, . . . , K of the saliency part are fused in Laplacian multi-scale space to obtain the low-rank and saliency parts of multi-exposure fusion, respectively. And the reconstructed low-rank and saliency parts are recorded as F L and F S . The construction and refinement of weight maps have been discussed in the above sections. The weight map of a Gaussian pyramid is generated based onŵ l,k andŵ s,k . This process is similar to that of the method in [15]. Second, the HDR image F is obtained by the following formula: The details of our MEF method is as follows:

5.
For X k L , k = 1, 2, . . . , K and X k S , k = 1, 2, . . . , K , the reconstruction and fusion in Laplacian multi-scale space are implemented to obtain the fusion images F L and F S .

6.
The HDR image is obtained by Equation (22).

Experimental Results and Analysis
We select 20 sets of static scene images with different TDI series under low illumination to verify the performance of the proposed method. The test sets include representative indoor and outdoor images. Four state-of-the-art algorithms in [15,16,28,29] are chosen to cover a diverse types of MEF methods in terms of methodology and behavior. The method in [16] developed a virtual image based on the gradients of input images. The method in [15] is the classic multi-exposure fusion in multi-scale space, and the construction in multi-scale space used by this paper is based on the method in [15]. The methods in [28,29] are based on patch decomposition, and an image is divided into three conceptually independent components: signal strength, signal structure, and mean intensity. Upon fusing these three components separately, a desired patch is reconstructed and the fusion image can be obtained. And this process is similar to that in our paper. The image is decomposed by LatLRR and then the image is fused by inverse transformation in this paper. Therefore, these four representative methods are chosen for the comparison. Due to space constraints, only the fusion images of reference [15,16,29] are provided here. However, all evaluation indicators of the four methods are calculated and given in the paper. The existing data sets are mostly color images and are taken under normal illumination, while this paper is aimed at gray image fusion under low illumination. The images under the low illumination used by this paper are obtained by a low-illumination camera Gense400BSI in digital TDI imaging mode. The size of the images used by this paper is mostly 1000 × 1000. Experimental data are available and can be download at "https://pan.baidu.com/s/1NTccS607zFEtWJ4VKdn4cQ" and the password is "ntyy".
The proposed method consists of two types of parameters including the LatLRR decomposition and a guided filter. λ in LatLRR is set to 0.8. The maximum number of iterations is 50. If images contain excessive noise, the parameter can be appropriately increased. Guided filtering based on the proposed adaptive factor involves only one parameter r with the size of 3 × 3. Other parameters involved are described in the paper. All experiments are implemented in MATLAB 2016a on a computer with an Intel Core i7, 3.40-GHz CPU, 16 GB of RAM, and the Microsoft Windows 7 operating system.

Subjective Analysis
The experimental results of different methods for the target plate image are shown in Figure 6. The overall appearance of all methods is good, and the dynamic range of the image is enhanced from (b)~(e). However we can see that details of the green border are different from Figure 6j-m. Other methods lost details in the bright area, and the target information was lost. The proposed method better preserves the details, and the overall appearance of the fusion image is more appealing. From images (f)~(i), it can be seen that the image noise is high and the uniform wall is distorted by the method in [15,16]. The saturation factor in [15] is no longer suitable for gray images. Compared to the two methods, the patch decomposition approach made some progress in keeping details of the highlighted areas. The uniform whiteboard area is selected to calculate the SNR of the fusion images. The SNRs of the methods in [15,16,29] and our method are 32.4023, 32.6441, 31.1991 and 34.2309, respectively. It can be seen that under low-illumination imaging, the obvious advantage of this paper is denoising by using low-rank decomposition.  [16]. (c) The fusion image of [29]. (d) The fusion image of [15]. (e) The fusion image of our method. (f) Local image of method [16]. (g) Local image of method [29]. (h) Local image of method [15]. (i) Local image of our method. (j) Local image of method [16]. (k) Local image of method [29]. (l) Local image of method [15]. (m) Local image of our method. Figure 7 shows the fusion images of "Outdoor Roads". The proposed method preserves the information of most of the highlighted areas, and the whole image has better clarity and contrast as shown in Figure 7e. Mertens and the gradient method have better effect than patch decomposition in maintaining load information as shown Figure 7f-i. However, overall, the indication label on the road is visible, and the results by Mertens and the proposed method have more natural appearance with respect to the human visual system. Comparatively, the details and texture of the door are clear by the proposed method in Figure 7i, and the tree texture and numbers in the road are clear as shown in Figure 7m,q. The proposed method effectively preserves details in the brightest and darkest regions. Guided filtering based on an adaptive factor is used to avoid over-smoothing or under-smoothing, effectively reducing the block effect.  [16]. (c) The fusion image of [29]. (d) The fusion image of [15]. (e) The fusion image by our method. (f) The local image of by [16]. (g) The local image of [29]. (h) The local image of [15]. (i) The local image by our method. (j) The local image by [16]. (k) The local image by [29]. (l) The local image by [15]. (m) The local image by our method. (n) The local image by [16]. (o) The local image by [29]. (p) The local image by [15]. (q) The local image by our method. Figure 8 shows the roads and buildings after multi-exposure fusion. The overall contrast of all fusion images is lower. The images are taken at night, and the texture of the ground is clear as shown in Figure 8i. The proposed method and the method in [16] achieved better results for the wall, and the gray distribution of the wall is uniform without super-saturation as shown in Figure 8j,m.  [16]. (c) The fusion image of [29]. (d) The fusion image of [15]. (e) The fusion image of our method. (f) The local image by [16]. (g) The local image by [29]. (h) The local image by [15]. (i) The local image by our method. (j) The local image by [16]. (k) The local image by [29]. (l) The local image by [15]. (m) The local image by our method. The original low dynamic images are shown in Figure 9. Figure 10 shows the MEF fusion of computer screen images taken in the door. Overall, all four methods have restored some desk information and obtained computer screen information, as shown in Figure 10e-h. Moreover, local images are shown in Figure 10a-d. As shown from the local images, we can see that for the icon at the top left corner and the picture at the lower right corner of the computer in the highlighted area, the proposed method maintains better details and contour information, and the grayscale distribution of our method is natural at the same time. However, other images bring gray distortion to different degree.   [16]. (b) The local image of [29]. (c) The local image of [15]. (d) The local image of our method. (e) The fusion image of [16]. (f) The fusion image of [29]. (g) The fusion image of [15]. (h) The fusion image of our method. Figure 11 shows outdoor images with noise. The image with a low TDI series (short exposure) has higher noise under low-light imaging as shown in Figure 9c. The fusion image by this paper has fewer halos around lights and has a relatively clear texture of trees near the lights as show in Figure 11d. The image noise is reduced by our method in uniform sky area as shown in Figure 11h.  [16]. (b) The fusion image of [29]. (c) The fusion image of [15]. (d) The fusion image of our method. (e) The local image of [16]. (f) The local image of [29]. (g) The local image of [15]. (h) The local image of our method.

Objective Analysis
To quantitatively evaluate the performance of the proposed method, three objective indicators including mutual information (MI), edge retentiveness (Q AB/F ) and average gradient (AG) are chosen [38]. The MI value can be used to measure how much information from the original image is retained in the fusion image [29,32]. The AG value can sensitively reflect the details of the image, and can be used to evaluate the image blur [31,33]. Q AB/F measures how well the amount of edge information is kept in the fusion images [31]. For all three indicators, the larger the value is, the better the image quality. MI is defined as the sum of mutual information between each source image and the fusion image. The MI value reflects the total quantity of information in the fused image which is obtained from the input source images. For more than two images involved in multi-exposure fusion, Q AB/F is defined as the average of the value calculated by any two images and the fusion image [31]. Four state-of-the-art MEF algorithms in [15,16,28,29] are adopted to fuse 20 groups of gray images with different TDI series under low illumination.
The performance of the four algorithms is shown in Tables 1-3, and the largest indicator value is shown in bold. It is clear that the proposed method performs better than the other methods. They are almost the best values in all results. For the indicator MI and Q AB/F , the proposed method shows the best performance in 16 sets of image sequences. Moreover, for the rest of the image sequences, our method ranks in the top two overall. For the indicator AG, the proposed method shows the best performance in 14 sets of image sequences. Even if not the maximum, the indicators are relatively large. The proposed method preserves more detailed information than the other fusion methods and produces fewer artifacts and less noise.

Self-Comparison
To further verify advantages of some new ideas proposed by our method, we compared the fusion effect of the following three conditions marked as Con1, Con2 and Con3: • Con1: using the fixed proportion factor (λ c = 1 and λ s = 1) and not using the adaptive weight factor by Equation (12); • Con2: using the original guided filtering and not using the improved guided filtering by Equations (15)-(19); • Con3: using the weight map construction only by contrast factor and not using Equations (6)- (14).
In each case, we guarantee one single variable. This means there is only one strategy participating in the comparison changes, and the rest remains the same. The results of the above comparison are shown in Table 4, and it represent the average value of 20 sets of data. In Table 4, a comparison between each row and the last row reflects the improvement on the overall performance under the corresponding conditions. It can be seen from the first row that the energy equation proposed in this paper obtains the adaptive proportional factor (λ c and λ s ), which mainly affects the information entropy, and can ensure that the HDR image retains as much detail as possible. The second row shows that AG and Q AB/F greatly decrease when the improved guide filter is not used. The improved guided filtering avoids the over-smooth and under-smooth and is helpful in keeping the edge information. We can see from the third row that the weight map designed by our method for gray images under low-illumination imaging can maintain the detail in the process of high-dynamic fusion and has a great influence on all indicators.
To show the visual effect of the proposed ideas on the performance of our method more clearly, one comparative experiment is added. The fusion effect under three different situations is compared as shown in Figure 12. Figure 12a shows that the fixed factor (λ c = 1 and λ s = 1) cannot adjust the proportion of weight maps in the fusion process adaptively, leading to losses in detail. It can be seen from Figure 12b that in the absence of adaptive guided filtering, the block effect is relatively large, and the ability of edge protection is not sufficient. It is not difficult to find this from Figure 12c without using weight maps proposed in the paper, the image representation ability is reduced, and details of the fusion image in the highlighted and darkened areas are obviously lost.

Time Complexity Comparison
First, the complexity and efficiency of different algorithms are evaluated by the running time as shown in Table 5. These methods are representative of their fields. The more images involved in the fusion, the longer the processing time. Here, we consider two images involved in the calculation. The size of the images used by this paper is mostly 1000 × 1000. The average time is taken as the contrast result. Different iteration numbers of LatLRR used by our method are adopted. For an image with the size of 1000 × 1000, the super-pixel segmentation approach method used by reference [29] specifies the number of super-pixels as 200, and the fixed-sized patches are rectangles of 50 × 50 used by reference [28]. The methods in [15,16] are relatively efficient in running time. The weight calculation of both methods is relatively simple. An intermediate image is constructed in the gradient domain by method [16], which takes more time than that in [15]. The execution efficiencies of method [28] and the proposed method applying 20 iterations are not much different. Our method is time-consuming in the stage of low rank decomposition. The patch and structure decomposition technique is used in method [28,29], and it is time-consuming to compare and calculate parameters between blocks. Compared with the methods in [28,29] and our method, the method in [23] is more efficient. In addition, the time-consuming nature of this method depends mainly on the number of pyramid layers. The method in [33] divided the multi-exposure image into image detail layer and a contour layer by image decomposition. Moreover, for the image detail layer, a CNN network is used to achieve detail-layer fusion. This is relatively time-consuming. The CNN and sparse representation algorithm require too much time due to their own operating schemes.
Second, the running time for applying 10, 20 and 30 iterations of our method is counted as shown in Table 6. With an increase in the number of iterations, the running time of our method increases significantly. The main time-consuming item in this paper is the low-rank decomposition. The effect of the iteration number on the fusion performance is also analyzed, as shown in Table 6. The result in Table 6 is the average value of 20 sets of data. It is clear that with the increase of iteration numbers, the performance improves gradually, but the difference between 20 iterations and 30 iterations is not obvious. The number of iterations can be selected according to the practice requirement. At the same time, without increasing the amount of computation, the method of selecting the optimal iteration number used by our paper is given here. LatLRR is utilized to decompose the source image into global and local structure. We calculate the information entropy of the global structure image produced by different iterations. When the number of iterations increases, the information entropy changes slightly, and the iteration can be stopped, or the maximum number of iterations is reached. When ensuring that the indicator can meet the actual requirement, a relatively small number of iterations can be chosen. In summary, our method is not the fastest, and the running time increases as the iteration time increases, but this method is still efficient and compromises between the running time and quality.

Conclusions and Future Work
In this paper, a multi-exposure fusion of gray images under low illumination based on low-rank decomposition was presented. The latent low-rank representation was used on low-dynamic images to generate low-rank parts and saliency parts to reduce noise after fusion. The idea of this paper was to decompose image sequences separately in multi-scale space. Thus, two different weight maps were constructed according to features of gray images. To remove the artifacts, an improved guided filtering method based on an adaptive regularization factor was proposed to refine the weight maps. At the same time, the adaptive ratio of weight factors was calculated based on an energy equation to enhance the robustness of the method. Then, the decomposed low-rank images and saliency images were fused in Laplacian space using Gaussian pyramid weight factors to obtain a fused low-rank image and a fused saliency image. Finally, the high-dynamic image was reconstructed by adding the low-rank image and the saliency image.
Although the proposed method can produce high-quality in high dynamic images under low-illumination imaging, it is not suitable for real-time application. Remote sensing images have a large amount of data; therefore, the algorithm should be optimized to improve the efficiency for practical engineering applications.

Data Availability Statement:
The data presented in this study are available in the article.