Improved Image Fusion Method Based on NSCT and Accelerated NMF

In order to improve algorithm efficiency and performance, a technique for image fusion based on the Non-subsampled Contourlet Transform (NSCT) domain and an Accelerated Non-negative Matrix Factorization (ANMF)-based algorithm is proposed in this paper. Firstly, the registered source images are decomposed in multi-scale and multi-direction using the NSCT method. Then, the ANMF algorithm is executed on low-frequency sub-images to get the low-pass coefficients. The low frequency fused image can be generated faster in that the update rules for W and H are optimized and less iterations are needed. In addition, the Neighborhood Homogeneous Measurement (NHM) rule is performed on the high-frequency part to achieve the band-pass coefficients. Finally, the ultimate fused image is obtained by integrating all sub-images with the inverse NSCT. The simulated experiments prove that our method indeed promotes performance when compared to PCA, NSCT-based, NMF-based and weighted NMF-based algorithms.


Introduction
Image fusion is an effective technology that synthesizes data from multiple sources and reduces uncertainty, which is beneficial to human and machine vision. In the past decades, it has been adopted OPEN ACCESS in a variety of fields, including automatic target recognition, computer vision, remote sensing, robotics, complex intelligent manufacturing, medical image processing, and military purposes. Reference [1] proposed a framework for the field of image fusion. The fusion process is performed at different levels of the information representation, which is sorted in ascending order of abstraction: pixel, feature, and decision levels. Of these, pixel-level fusion has been broadly studied and applied for it is the foundation of other two levels.
Until recently, the multi-resolution decomposition based algorithms have been widely used in the multi-source image fusion field, and effectively overcome spectrum distortion. Wavelet transformation provides great time-frequency analytical features and is the focus of multi-source image fusion. NSWT is made up of the tensor product of two one-dimension wavelets, solving the shift-invariant lacking problem that the traditional wavelets cannot do. Being lacking in anisotropy, NSWT fails to express direction-distinguished texture and edges sparsely. In 2002, Do and Vetteri proposed a flexible contourlet transform method that may efficiently detect the geometric structure of images attributed to their properties of multi-resolution, local and directionality [13], but the spectrum aliasing phenomenon occurs posed by unfavorable smoothness of the basis function. Cunha et al. put forward the NSCT method [15] in 2006; improvements have been made in solving contourlet limitations, and it was an ultra-perfect transformation with attributes of shift-invariance, multi-scale and multi-directionality [16].
Non-Negative Matrix Factorization (NMF) is a relatively new matrix analysis method [17] presented by Lee and Seung in 1999, and has been proven to converge to its local minimum in 2000 [18]. It has been successfully adopted in a variety of applications, including image analysis [19,20], text clustering [21], speech processing [22], pattern recognition [23][24][25], and so on. Unfortunately, some NMF-involved works are time consuming. In order to reduce time costs, an improved NMF algorithm has been introduced in this paper. Our improved NMF algorithm is applied to fuse the low-frequency information in he NSCT domain, while the fusion of high-frequency details can be realized by adopting the Neighborhood Homogeneous Measurement (NHM) technique used in reference [26]. The experimental results demonstrate that the proposed fusion method can effectively extract useful information from source images and inject it into the final fused one which has better visual effects, and the running of the algorithm takes less CPU time compared with the algorithms proposed in [27] and [18].
The remainder of this paper is organized as follows: we introduce NSCT in Section 2. This is followed by a brief discussion on how NMF is constructed, and how we improve it. Section 4 presents the whole framework of the fusion algorithm. Section 5 shows experimental results for image fusion using the proposed technique, as well as the discussion and comparisons with other typical methods. Finally, the last Section concludes with a discussion of our and future works.

Non-Subsampled Contourlet Transform (NSCT)
NSCT is proposed on the grounds of contourlet conception [13], which discards the sampling step during the image decomposition and reconstruction stages. Furthermore, NSCT presents the features of shift-invariance, multi-resolution and multi-dimensionality for image presentation by using a nonsampled filter bank iteratively.
The structure of NSCT consists of two parts, as shown in Figure 1(a): Non-Subsampled Pyramid (NSP) and Non-Subsampled Directional Filter Banks (NSDFB) [15]. NSP, a multi-scale decomposed structure, is a dual-channel non-sampled filter that is developed from the àtrous algorithm. It does not contain subsampled processes. Figure 1(b) shows the framework of NSP, for each decomposition of next level, the filter H (z) is firstly sampled an using upper-two sampling method, the sampling matrix is D = (2, 0; 0, 2). Then, low-frequency components derived from the last level are decomposed iteratively just as its predecessor did. As a result, a tree-like structure that enables multi-scale decomposition is achieved. NSDFB is constructed based on the fan-out DFB presented by Bamberger and Smith [28]. It does not include both the super-sampling and sub-sampling steps, but relies on sampling the relative filters in DFB by treating D = (1, 1; 1, −1), which is illustrated in Figure 1(c). If we conduct L levels of directional decomposition on a sub-image that decomposed by NSP in a certain scale, then 2 L number of band-pass sub-images, the same size to original one, are available. Thus, one low-pass sub-image and band-pass directional sub-images are generated by carrying out L levels of NSCT decomposition.

Nonnegative Matrix Factorization (NMF)
NMF is a recently developed matrix analysis algorithm [17,18], which can not only describe low-dimensional intrinsic structures in high-dimensional space, but achieves linear representation for original sample data by imposing non-negativity constraints on its bases and coefficients. It makes all the components non-negative (i.e., pure additive description) after being decomposed, as well as realizes the non-linear dimension reduction. NMF is defined as: Conduct N times of investigation on a M-dimensional stochastic vector v, then record these data as v j , j = 1,2,…, N, [17]. The equation can also be wrote in a more intuitive form of that . .
In the purpose of finding the appropriate factors W and H, the commonly used two objective functions are depicted as [18]: In respect to Equations (1) and (2), ∀i, a, j subject to W ia > 0 and H aj > 0, a is a integer. ||•|| F is the Frobenius norm, Equation (1) is called as the Euclid distance while Equation (2) is referred to as K-L divergence function. Note that, finding the approximate solution to V ≈ WH is considered equal to the optimization of the above mentioned two objective functions.

Accelerated Nonnegative Matrix Factorization (ANMF)
Roughly speaking, the NMF algorithm has high time complexity that results in limited advantages for the overall performance of algorithm, so that the introduction of improved iteration rules to optimize the NMF is extremely crucial to promote the efficiency. In the point of algorithm optimization, NMF is a majorization problem that contains a non-negative constraint. Until now, a wide range of decomposition algorithms have been investigated on the basis of non-negative constraints, such as the multiplicative iteration rules, interactive non-negative least squares, gradient method and projected gradient [29], among which the projected gradient approach is capable of reducing the time complexity of iteration to realize the NMF applications under mass data conditions. In addition, these works are distinguished by meaningful physical significance, effective sparse data, enhanced classification accuracy and striking time decreases. We propose a modified version of projected gradient NMF that will greatly reduce the complexity of iterations; the main idea of the algorithm is listed below.
As we know, the Lee-Seung algorithm continuously updates H and W, fixing the other, by taking a step in a certain weighted negative gradient direction, namely: where η ij and ζ ij are individual weights for the corresponding gradient elements, which are expressed like follows: and then the updating formulas are: We notice that the optimal H related to a fixed W can be obtained, column by column, by independently: where e j is the j th column of the n × n identity matrix. Similarly, we can also acquire the optimal W relative to a fixed H by solving, row by row: where e i is the i th column of the m × m identity matrix. Actually, both Equations (7) and (8) can be changed into an ordinary form: where A ≥ 0 and b ≥ 0. As the variables and given data are all nonnegative, the problem is therefore named the Totally Nonnegative Least Squares (TNNLS) issue. We propose to revise the algorithm claimed in article [17] by using the same update rule with step-length α in [27] to the successive updates in improving the objective functions about the two TNNLS problems mentioned in Equations (7) and (8). As a result, this brings about a modified form of the Lee-Seung algorithm that successively updates the matrix H column by column and W row by row, with individual step-length α and β for each column of H and each row of W respectively. So we try to write the update rule as: where η ij and ζ ij are set equal to some small positive number as described in [27], α j (j = 1,2,…,n) and β i (i = 1,2,…,m) are step-length parameters can be computed as follows. Let x > 0, where the symbol "./" means component-wise division and " " denotes multiplication. Then we introduce variable ô ∈(0, 1): We can easily obtain the step-length formula of α j or β i if (A, b, x) is replaced by (W, Ae j , He j ) or (H T , A T e i , W T e i ), respectively. It is necessary to point out that q is the negative gradient of the objective function, and the search direction p is a diagonally scaled negative gradient direction. The step-length α or β is either the minimum of the objective function in the search direction or a τ-fraction of the step to the boundary of the nonnegative quadrant.
Learning from article [27] that both quantities, p T q/p T A T Ap and max{â : x + âp ≥ 0} are greater than 1 in the definition of the step α, thereby, we make α j ≥ 1 and β i ≥ 1 by treating τ sufficiently close to 1. In our experiment, we choose τ = 0.99 which practically guarantees that α and β are always greater than 1.
Obviously, when α←1 or β←1, update Equations (10) and (11) reduce to updates Equations (3) and (4). In our algorithm, the step-length parameters are allowed to be greater than 1. It is this indicates that for any given (W, H), we can get at least the same or greater decrease in the objective function than the algorithm in [27]. Hence, we call the proposed algorithm the Accelerated NMF (ANMF). Besides, the experiments in Section 5.5 will demonstrate that ANMF algorithm is indeed superior to that algorithm by generating better test results, especially when the amount of iterations is not too big.

The Selection of Fusion Rules
As we know, approximation of an image belongs to the low-frequency part, while the high-frequency counterpart exhibits detailed features of edge and texture. In this paper, the NSCT method is utilized to separate the high and low components of the source image in the frequency domain, and then the two parts are dealt with different fusion rules according to their features. As a result, the fused image can be more complementary, reliable, clear and better understood.
By and large, the low-pass sub-band coefficients approximate the original image at low-resolution; it generally represents the image contour, but high-frequency details such as edges, region contours are not contained, so we take the ANMF algorithm to determine the low-pass sub-band coefficients which including holistic features of the two source images. The band-pass directional sub-band coefficients embody particular information, edges, lines, and boundaries of region, the main function of which is to obtain as many spatial details as possible. In our paper, a NHM based local self-adaptive fusion method is adopted in band-pass directional sub-band coefficients acquisition phase, by calculating the identical degree of the corresponding neighborhood to determine the selection for band-pass coefficients fusion rules (i.e., regional energy or global weighted).

The Course of Image Fusion
Given that the two source images are A and B, with the same size, both have been registered, F is fused image. The fusion process is shown in Figure 2 and the steps are given as follows ( Figure 2): [ , ] where v A and v B are column vectors consisting of pixels coming from A and B, respectively, according to principles of row by row. n is the number of pixels of source image. We perform the ANMF algorithm described in Section 3.2 on V, from which W that is actually the low-pass sub-band coefficients of fused image F is separated. We set maximum iteration number as 1,000 with τ = 0.99. The fusion rule NHM is applied to band-pass directional sub-band coefficients , ( , ) where E i,l (m, n) is regarded as the neighborhood energy under resolution of 2 l in direction i, N i,l (m, n) is the 3 × 3 neighborhood centers at point (m, n). In fact, NHM quantifies the identical degree of corresponding neighborhoods for two images, the higher the identical degree is, the greater the NHM value should be. Because 0 ≤ NHM i,l (m, n) ≤ 1, we define a threshold T; generally we have it that 0.5 < T < 1. As the quality of fusion image is partly influenced by T (see Table 1), we take two factors into consideration [i.e., when T =0.75 the SD (Standard Deviation) and AG (Average Gradient) are better], so the threshold is given as T = 0.75. The fusion rule of band-pass directional sub-band coefficients is expressed as: when NHM i,l (m, n) < T:

Experimental Conditions and Quantified Evaluation Indexes
To verify the effectiveness of the proposed algorithm, three groups of images are used under the MATLAB 7.1 platform in this Section. All source images must be registered and with 256 gray levels.
By comparison with the five typical algorithms below: NSCT-based method (M1), NMF-based method (classic NMF, M2), weighted NMF-based method (M3), PCA and wavelet, we can learn more about the one presented in our paper.
It may be possible to evaluate the image fusion subjectively, but subjective evaluation is likely affected by the biases of different observers, psychological status and even mental states. Consequently, it is absolutely necessary to establish a set of objective criteria for quantitative evaluation. In this paper, we select the Information Entropy (IE), Standard Deviation (SD), Average Gradient (AG), Peak Signal to Noise Ratio (PSNR), Q index [30], Mutual Information (MI), and Expanded Spectral Angle Mapper (ESAM) [31] as our evaluation metrics. IE is one of the most important evaluation indexes, whose value directly reflects the amount of information in the image. The larger the IE is the more information is contained in a fused image. SD indicates the deviation degree between the gray values of pixels and the average of the fused image. In a sense, the fusion effect is in direct proportion to the value of the SD. AG is capable of expressing the definition extent of the fused image, the definition extent will be better with an increasing AG value. PSNR is the ratio between the maximum possible power of a signal and the power of corrupting noise. The larger the PSNR is, the better is the image. MI is a quantity that measures the mutual dependence of the two random variables, so a better fusion effect makes for a bigger MI. Q index measures the amount of edge information "transferred" from source images to the fused one to give an estimation of the performance of the fusion algorithm. Here, larger Q value means better algorithm performance. ESAM is an especially informative metric in terms of measuring how close the pixel values of the two images are and we take the AE (average ESAM) as an overall quality index for measuring the difference between the two source images and the fused one. The higher the AE, the less the similarity of two images will be. The AE is computed using a sliding window approach, in this work, sliding windows with a size of 16 × 16, 32 × 32, and 64 × 64 are used.

Multi-Focus Image Fusion
A pair of "Balloon" images are chose to be source images, both are 200 by 160 in size. As can be seen from Figure 3(a), the left side of the image is in focus while the other side is out of focus. The opposite phenomenon emerges in Figure 3(b). Six variant approaches, M1-M3, PCA, wavelet (bi97), and our method, are applied to test the fusion performance. Figure 3(c-h) show the simulated results.
From an intuitive point of view, the M1method produces a poor intensity that makes Figure 3(c) somewhat dim. On the contrary, the other five algorithms generate better performance in this aspect, but artifacts located in the middle right of Figure 3(e) can be found. Compared with the M2 and M3 methods, although the definition of the bottom left region in our method is slightly lower than that of the two algorithms, the holistic presentation is superior to the two. As for PCA and wavelet, the similar visual effects as Figure 3 Figure 5(a) has a clear background, but infrared thermal sources cannot be detected. Conversely, Figure 5(b) highlights the person and house but its ability to render other surroundings is weak. Effective fusions are achieved by the six methods. After concrete analysis on the six fused images, we draw the following conclusions: we can find that the image based on method M1 is the worst in overall effect, especially a dark area around the person, which is partly caused by the significant differences between two source images. Method M2 produces more smooth details than M1, as a case in point, the road on the right side of the image and the grass on the other side can easily be recognized for the enhancement of intensity. Approximate effects displayed in Figure 5(e-h) are achieved by using M3, PCA, wavelet and our method, from which we can easily distinguish most parts of the scene except the lighting beside the house in Figure 5(e) that can hardly be observed. It is difficult to judge the performances of the latter four methods through visual observation in case of the concrete data are not provided by Table 6.
In so far as IE, AG, and PSNR are concerned, the proposed technique is evidently better than the former four ones. Specially, the value of our method exceeds them by 1.6%, 4.9%, and 0.7% while the SD is slightly smaller when compared with M3. In index Q, the optimal value is obtained on the basis of the wavelet approach, while that of M1 holds the final place. As for MI, our method still ranks the first place in Table 6. Analogous effects are achieved in Table 7, statistics show that the similarities between visible light, infrared and fused images generated by our method are the best in that both AE aF and AE bF are the smallest.

Numerical Experiment on ANMF
In this section, we compare the performance of ANMF with that of algorithms presented in article [27] and [18] in order to prove its advantages. The algorithms are implemented in Matlab and applied to the Equinox face database [32]. The contrast experiments are conducted four times, where p is as described in Section 3.2 and n denotes for the number of images chosen from the face database. The Y axis of Figure 6 represents the number of iterations repeated by the three algorithms and the X axis is the time consumption scale. We choose one group of these experiments and demonstrate the results in Figure 6 with p = 100 and n = 1,000, in which algorithm in [18] is first performed for a given number of iterations and record the time elapsed and then run algorithm in [27] and our algorithm until the time consumed is equivalent to that of the former, respectively. We note that our algorithm offers improvements in all given time points, however, the relative improvement percentage of our method over other two algorithms goes down when the number of iterations increases. Actually, the performance of our method increases about 36.8%, 26.4%, 15.7%, 12.6%, 7.5% and 37.9%, 29.6%, 19.4%, 17.8%, 12.6%, respectively, when comparing with the algorithms in [27] and [18] at each of five time points. In other words, our method converges faster, especially at the early stages, but the percentage tends to decline, which implies that this attribute is merely useful for real-time applications without very large-scale data sets.

Discussion
Image fusion with different models and numerical tests are conducted in our experiments, where the above four experiments indicate that the proposed method has a notable superiority in image fusion performance over the four other techniques examined (see , and has better iteration efficiency (see Section 5.5). We observed that images based on wavelet and our proposed methods enjoy the best visual effect, and then the PCA, M3, M2, and M1 are the worst. In addition to visual inspection, quantitative analysis is also conducted to verify the validity of our algorithm from the angles of information amount, statistical features, gradient, signal to noise ratio, edge preservation, information theory and similarity of structure. The values in these metrics prove that the experiments achieve the desired objective.

Conclusions
In this paper, we have presented a technique for image fusion based on the NSCT and ANMF model. The accelerated NMF method modifies the traditional update rules of W and H, which achieves better effect by adopting the theory of matrix decomposition. The current approaches on the basis of NMF usually need more iterations to converge when compared to the proposed method, but the same or better results can be attained by our technique via less iterations. The results of simulation experiments show that the proposed algorithm can not only reduce computational complexities, but achieve better or equal performances when compared with other mentioned techniques both from the visual and statistical standpoints. The optimization for our method will be the next step in order to apply it in large scale data sets.