A Novel Infrared and Visible Image Information Fusion Method Based on Phase Congruency and Image Entropy

In multi-modality image fusion, source image decomposition, such as multi-scale transform (MST), is a necessary and widely used step. However, when MST is directly used to decompose source images into high- and low-frequency components, the decomposed components are not precise enough for the subsequent infrared-visible fusion operations. This paper proposes a non-subsampled contourlet transform (NSCT) based decomposition method for image fusion, by which source images are decomposed into the corresponding high- and low-frequency sub-bands. Unlike MST, the obtained high-frequency sub-bands have different decomposition layers, and each layer contains different information. To obtain a more informative fused high-frequency component, the maximum absolute value and pulse coupled neural network (PCNN) fusion rules are applied to different sub-bands of the high-frequency components. Activity measures, namely phase congruency (PC), local measure of sharpness change (LSCM), and local signal strength (LSS), are designed to enhance the detailed features of the fused low-frequency components. The fused high- and low-frequency components are then integrated to form the fused image. Experimental results show that the fused images obtained by the proposed method perform well in clarity, contrast, and image information entropy.


Introduction
Both infrared and visible images are widely used in daily life. Because of their different wavelengths, infrared and visible light carry different image information. Infrared images reflect the objects that emit infrared radiation, while visible-light images provide scene details. Whether infrared or visible, it is difficult for an image captured in a single shot to provide an all-in-focus view of a whole scene. Infrared-visible fusion techniques can effectively combine complementary information, namely the indicative features extracted from infrared images and the detailed information extracted from visible images [1]. In the fused infrared-visible image, the target can be highlighted while the indicative features and detailed information are retained. At present, image fusion techniques as a type of image pre-processing methods, especially coefficients are combined with fusion influence factors, and the fused image is reconstructed by fusing sparse coefficients. This paper proposes a novel precise decomposition framework for infrared-visible image fusion that preserves image energy and details well. First, NSCT is used to decompose the source images into high- and low-frequency components. The high-frequency sub-bands of each decomposed layer contain different information. For the top decomposed layer, the activity level of the high-frequency coefficients is measured by a PCNN model [17]. For the other decomposed layers, the absolute value of each high-frequency coefficient is taken as its activity level, following the absolute (ABS) maximum rule [10]. For the low-frequency bands, PC is used as the image feature because its value is not affected by image brightness, contrast, or illumination intensity. The low-frequency fusion rule is formulated from PC, LSCM, and LSS, and it enhances the detailed features of each source image.
Finally, the fused image is reconstructed by applying the inverse NSCT to the fused high- and low-frequency images. The main contributions of this paper can be summarized as follows:
• The high- and low-frequency components of the source images are processed separately according to their own features.
• PCNN and ABS are applied to the high-frequency sub-bands of different layers, which achieves a more precise decomposition of the high-frequency components.
• The proposed image fusion algorithm captures the details of the source images well by integrating the advantages of NSCT, PCNN, and PC.
The rest of this paper is structured as follows: Section 2 proposes an infrared-visible image fusion framework in the NSCT domain and specifies the corresponding technical details; Section 3 analyzes the results of the comparative experiments; and Section 4 concludes the paper.

The Proposed Algorithm
The proposed infrared-visible image fusion framework is shown in Figure 1. It has four main steps: image decomposition, fusion of the high-frequency sub-bands, fusion of the low-frequency sub-bands, and image reconstruction. First, the source images are decomposed into 5-layer high- and low-frequency sub-bands. Then, different methods are applied to fuse the high- and low-frequency sub-bands. The decomposed high-frequency sub-bands are further divided into two groups, $\{H_A^{l,k}, H_B^{l,k}\}_{l<5}$ and $\{H_A^{l,k}, H_B^{l,k}\}_{l=5}$. The first group, $H_A^{l,k}$ and $H_B^{l,k}$ with $l<5$, is fused by the maximum absolute value rule. The fused high-frequency sub-bands contain the overall image structure information. PCNN is used to fuse the second group, $H_A^{l,k}$ and $H_B^{l,k}$ with $l=5$ (the details are explained in the following paragraphs). The fused low-frequency sub-bands retain the detailed information and the residual image information. Finally, the fused high- and low-frequency sub-bands are combined, which makes the fused image more informative.

NSCT
NSCT overcomes the frequency aliasing caused by the upsampling and downsampling in the contourlet transform (CT) [18,19]. NSCT is a discrete image decomposition framework that achieves shift invariance and multi-scale, multi-directional analysis by using non-subsampled pyramid filter banks (NSPFBs) and non-subsampled directional filter banks (NSDFBs). Therefore, the proposed solution uses NSCT to decompose the source images into high- and low-frequency components.
The two source images are decomposed into high-frequency bands $\{H_A^{l,k}, H_B^{l,k}\}$ and low-frequency bands $\{L_A, L_B\}$ by performing an L-level NSCT decomposition. $H_A^{l,k}$ and $H_B^{l,k}$ represent the high-frequency components at decomposition level $l$ and direction $k$ of source images A and B, respectively, while $L_A$ and $L_B$ are the corresponding low-frequency components of source images A and B.
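A full NSCT (NSPFB plus NSDFB) is too long for a short example. As a minimal sketch of the non-subsampled idea only, the Python/NumPy code below builds an undecimated (à trous) pyramid: each level smooths with a low-pass kernel whose taps are spread 2**level pixels apart instead of downsampling the image, so every band keeps the input size and the bands sum back exactly to the input. The B3-spline taps and the helper names `smooth_atrous`, `nsp_decompose`, and `nsp_reconstruct` are assumptions for illustration, not the NSPFB filters of [18,19], and the directional stage is omitted, so each level yields one high-frequency band rather than the directional sub-bands $H^{l,k}$.

```python
import numpy as np

# B3-spline low-pass taps (assumed; not the NSPFB filters used by NSCT)
B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def smooth_atrous(img, level):
    """Separable low-pass filter whose taps are spaced 2**level pixels apart (a trous)."""
    step = 2 ** level
    radius = 2 * step                      # outermost tap is 2*step pixels from the centre
    out = img
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="reflect")
        n = out.shape[axis]
        acc = np.zeros_like(out)
        for t, w in enumerate(B3):
            index = [slice(None), slice(None)]
            index[axis] = slice(t * step, t * step + n)   # tap offset (t - 2) * step
            acc += w * padded[tuple(index)]
        out = acc
    return out


def nsp_decompose(img, levels=5):
    """Return ([high_1, ..., high_levels], low); no band is subsampled."""
    current = np.asarray(img, dtype=float)
    highs = []
    for level in range(levels):
        smoother = smooth_atrous(current, level)
        highs.append(current - smoother)   # detail removed by this smoothing stage
        current = smoother
    return highs, current


def nsp_reconstruct(highs, low):
    """Exact inverse: the bands are plain differences, so they sum back to the input."""
    return low + sum(highs)
```

In this sketch `highs[0]` is the finest level and `highs[4]` the coarsest; how these indices map onto the paper's layers l = 1, ..., 5 is an assumption.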

Fusion of High-Frequency Sub-Bands
The high-frequency sub-bands of different decomposition layers contain different information, and together they carry the overall structure information of the image. The maximum absolute value and PCNN fusion rules are applied to different high-frequency sub-bands to ensure that the structure information of the source images is retained. In preliminary experiments, we tested the proposed solution many times with different numbers of decomposition layers and compared the results using the objective evaluation metrics. The results revealed a trade-off between performance and processing time: four decomposition layers give poor performance, whereas six decomposition layers require a long processing time. Five decomposition layers achieve good PCNN fusion performance in a relatively short time. The proposed solution therefore fuses the high-frequency sub-bands of the five decomposition layers with two different methods: the maximum absolute value rule is used for layers 1-4, and PCNN is applied to the 5th layer. The comparative experiments confirm that this effectively improves the fusion of the high-frequency sub-bands.

As shown in Equation (1), for the high-frequency sub-bands of decomposition layer l = 5, the activity level of the high-frequency coefficients is measured by the PCNN fusion rule.
In Equation (1), $H_F^{l,k}(i,j)$ represents the fused high-frequency coefficients, and $H_F^{l,k}(i,j)_{l=5}$ represents the fused 5th-level high-frequency coefficients, which are obtained by Equation (2). Equation (2) integrates the PCNN model, in which the entropy of the absolute value of the high-frequency band is used as the network input. The PCNN firing counts of the high-frequency components, $M_{A,ij}^{l,k}[N]$ and $M_{B,ij}^{l,k}[N]$, are then calculated by Equation (3), where $N$ denotes the number of iterations and $P_{ij}[n]$ denotes the output of the PCNN model [17]. Figure 2 shows the architecture of the PCNN model used in the proposed image fusion method. In PCNN, $F_{ij}[n]$ and $L_{ij}[n]$ are the feeding input and the linking input of the neuron at position $(i, j)$ in iteration $n$, respectively, which are obtained by Equations (4) and (5).
Here, $F_{ij}[n]$ is determined by the intensity of the input image $S_{ij}$ throughout the iteration process, and $L_{ij}[n]$ depends on the previous firing states of the eight neighboring neurons through the synaptic weights given in Equation (6). The parameter $V_L$ is the amplitude of the linking input. $U_{ij}[n]$ is the internal activity, which consists of two terms and is calculated by Equation (7). In the first term, $e^{-a_f}U_{ij}[n-1]$ is a decay of its previous value, where $a_f$ is an exponential decay coefficient. The second term, $F_{ij}[n](1+\beta L_{ij}[n])$, is the nonlinear modulation of $F_{ij}[n]$ by $L_{ij}[n]$, where $\beta$ is the linking strength. The output $P_{ij}[n]$ of the PCNN has two states, excited ($P_{ij}[n] = 1$) and unexcited ($P_{ij}[n] = 0$). The state depends on two inputs: the current internal activity $U_{ij}[n]$ and the previous dynamic threshold $E_{ij}[n-1]$. The dynamic threshold is updated at each iteration by Equation (9), where $a_e$ and $V_E$ are the exponential decay coefficient and the amplitude of $E_{ij}[n]$, respectively.
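The PCNN iteration described above can be sketched in Python/NumPy as follows. This is a hedged illustration assuming the standard discrete PCNN form implied by the text: feeding input equal to the stimulus, linking input from the eight neighbors' previous outputs, a decaying internal activity, firing when $U_{ij}[n] > E_{ij}[n-1]$, and a decaying threshold that jumps by $V_E$ after firing. The weight matrix `W`, every parameter value, the initial threshold, the normalized $|H|$ stimulus (the paper uses an entropy-based input defined in Equation (2)), and the choose-larger-count rule in the hypothetical helper `fuse_top_layer` are assumptions.

```python
import numpy as np

# Assumed 3x3 linking weights (a common choice in PCNN fusion; Eq. (6) may differ)
W = np.array([[0.5, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.5]])


def neighbour_sum(P):
    """Weighted sum of the previous outputs of the 8 neighbours (zero padding at borders)."""
    padded = np.pad(P, 1)
    rows, cols = P.shape
    out = np.zeros((rows, cols))
    for di in range(3):
        for dj in range(3):
            out += W[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


def pcnn_firing_counts(S, n_iter=200, a_f=0.1, a_e=0.2, beta=0.2, V_L=1.0, V_E=20.0):
    """Run the PCNN for n_iter iterations and return how often each neuron fired, M[N]."""
    S = np.asarray(S, dtype=float)
    U = np.zeros_like(S)          # internal activity
    E = np.ones_like(S)           # dynamic threshold (initial value assumed)
    P = np.zeros_like(S)          # outputs of the previous iteration
    M = np.zeros_like(S)          # accumulated firing counts
    for _ in range(n_iter):
        F = S                                          # feeding input
        L = V_L * neighbour_sum(P)                     # linking input from P[n-1]
        U = np.exp(-a_f) * U + F * (1.0 + beta * L)    # decay plus modulation
        P = (U > E).astype(float)                      # compare with E[n-1]
        E = np.exp(-a_e) * E + V_E * P                 # decay, then jump where fired
        M += P
    return M


def fuse_top_layer(H_A, H_B, n_iter=200):
    """Per pixel, keep the coefficient whose neuron fired more often (assumed rule)."""
    scale = max(np.abs(H_A).max(), np.abs(H_B).max(), 1e-12)
    M_A = pcnn_firing_counts(np.abs(H_A) / scale, n_iter)
    M_B = pcnn_firing_counts(np.abs(H_B) / scale, n_iter)
    return np.where(M_A >= M_B, H_A, H_B)
```

Both stimuli are divided by the same scale so that the firing counts of the two sources are comparable.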
Similarly, $H_F^{l,k}(i,j)_{l<5}$ represents the fused high-frequency coefficients of levels 1 to 4, which are obtained by Equation (10). In Equation (10), $\mathrm{Entropy}(H_A^{l,k})_{l<5}$ and $\mathrm{Entropy}(H_B^{l,k})_{l<5}$ represent the information entropy of the high-frequency components $H_A^{l,k}$ and $H_B^{l,k}$, respectively. The information entropy of a high-frequency component $H_x^{l,k}$ is calculated by Equation (11), where $m$ and $n$ are the numbers of columns and rows of $H_x^{l,k}$, and $|H_x^{l,k}(i,j)|$ is the absolute value of the coefficient at $(i, j)$. The maximum entropy of the ABS values is used as the fusion measure for these high-frequency sub-bands.
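Equations (10) and (11) are not reproduced in this text, and the description mixes a maximum absolute value rule with an entropy-based selection. The Python/NumPy sketch below shows both readings under stated assumptions: `abs_entropy` takes the Shannon entropy of a 256-bin histogram of $|H|$ as a stand-in for Equation (11), `fuse_by_entropy` keeps the whole sub-band whose $|H|$ has the larger entropy, and `fuse_max_abs` is the classic pixel-wise ABS-max rule. All three function names are hypothetical.

```python
import numpy as np


def abs_entropy(H, bins=256):
    """Shannon entropy (bits) of the histogram of |H|; an assumed stand-in for Eq. (11)."""
    a = np.abs(np.asarray(H, dtype=float)).ravel()
    hist, _ = np.histogram(a, bins=bins)
    p = hist[hist > 0] / a.size
    return float(-(p * np.log2(p)).sum())


def fuse_by_entropy(H_A, H_B):
    """Sub-band-level rule: keep the whole sub-band whose |H| has the larger entropy."""
    return H_A if abs_entropy(H_A) >= abs_entropy(H_B) else H_B


def fuse_max_abs(H_A, H_B):
    """Pixel-wise ABS-max rule: keep the coefficient with the larger magnitude."""
    return np.where(np.abs(H_A) >= np.abs(H_B), H_A, H_B)
```

For example, `fuse_max_abs(np.array([[1.0, -3.0]]), np.array([[-2.0, 2.0]]))` keeps -2.0 from B and -3.0 from A.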

Fusion Rule of Low-Frequency Sub-Bands
The low-frequency sub-bands of NSCT-filtered images mainly describe the detailed information that corresponds to the texture and edge information of the source images. In medical imaging, for example, organ or cell lesions are often identified from such detailed information. Thus, enhancing the detailed features of each source image is the key to low-frequency sub-band fusion.
This paper uses PC to enhance image features so that the low-frequency sub-bands become more informative. As a dimensionless measure, PC can evaluate the significance of each image feature. In low-frequency sub-bands, the PC value reflects the sharpness of image objects, so PC is used to identify the coefficients with maximal local sharpness. Since an image can be regarded as a 2D signal [9], the PC of an image pixel at location (x, y) can be calculated by Equation (12).
where $\theta_k$ is the k-th orientation angle [9], $A_{n,\theta_k}$ denotes the amplitude of the n-th Fourier component at angle $\theta_k$, and $\varepsilon$ is a small positive constant used to remove the DC components of image signals. $E_{\theta_k}(x,y)$ is calculated by Equation (13), where $F_{\theta_k}(x,y) = \sum_n b_{n,\theta_k}(x,y)$ and $H_{\theta_k}(x,y) = \sum_n c_{n,\theta_k}(x,y)$. Here, $b_{n,\theta_k}(x,y)$ and $c_{n,\theta_k}(x,y)$ are the convolution results of the input image at location (x, y), evaluated by Equation (14), where $I_L(x,y)$ is the low-frequency image pixel value at location (x, y), and $M_n^b$ and $M_n^c$ are the even- and odd-symmetric 2D log-Gabor filters at scale n. Because PC is contrast invariant, it does not reflect local contrast changes. To compensate for this, a sharpness change measure (SCM), shown in Equation (15), is developed, where $\Omega_0$ represents a local area around location (x, y). Meanwhile, LSCM, shown in Equation (16), is introduced to calculate the contrast of the neighborhood of (x, y), where $(2M+1)\times(2N+1)$ denotes the neighborhood size. Since neither LSCM nor PC fully reflects the local signal strength, LSS, shown in Equation (17), is introduced, where $x_{ij}$ is the pixel at location (i, j) of the image patch and $\mu_{MN}$ is the mean value of the patch.
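The activity measures above can be sketched in Python/NumPy. The sketch assumes the common frequency-domain log-Gabor construction of phase congruency, $PC = \sum_k E_{\theta_k} / (\varepsilon + \sum_n \sum_k A_{n,\theta_k})$ with $E_{\theta_k} = \sqrt{F_{\theta_k}^2 + H_{\theta_k}^2}$, without noise compensation. The forms of SCM (squared differences to the 3x3 neighbors), LSCM (SCM summed over a $(2M+1)\times(2N+1)$ window), and LSS (local standard deviation) are also assumptions, since Equations (15)-(17) are not reproduced here; all filter parameters and function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def phase_congruency(img, n_scales=4, n_orient=4, min_wavelength=3.0, mult=2.1,
                     sigma_on_f=0.55, eps=1e-4):
    """PC = sum_k E_k / (eps + sum_k sum_n A_{n,k}) with frequency-domain log-Gabor filters."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    spectrum = np.fft.fft2(img)
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                     # placeholder to avoid log(0); DC is zeroed below
    theta = np.arctan2(-fy, fx)
    sigma_theta = np.pi / n_orient / 1.5   # angular spread of each orientation
    energy = np.zeros((rows, cols))
    amplitude = np.zeros((rows, cols))
    for k in range(n_orient):
        angle = k * np.pi / n_orient
        d_theta = np.abs(np.arctan2(np.sin(theta - angle), np.cos(theta - angle)))
        spread = np.exp(-d_theta ** 2 / (2 * sigma_theta ** 2))
        even = np.zeros((rows, cols))
        odd = np.zeros((rows, cols))
        for n in range(n_scales):
            f0 = 1.0 / (min_wavelength * mult ** n)
            log_gabor = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_on_f) ** 2))
            log_gabor[0, 0] = 0.0          # log-Gabor filters have no DC response
            response = np.fft.ifft2(spectrum * log_gabor * spread)
            even += response.real          # b_{n,theta_k}: even-symmetric part
            odd += response.imag           # c_{n,theta_k}: odd-symmetric part
            amplitude += np.abs(response)  # A_{n,theta_k}
        energy += np.hypot(even, odd)      # E_{theta_k} = sqrt(F^2 + H^2)
    return energy / (eps + amplitude)


def _box(x, h, w, reduce):
    """Apply a reduction over the h x w window centred on each pixel (edge padding)."""
    padded = np.pad(x, ((h // 2, h // 2), (w // 2, w // 2)), mode="edge")
    return reduce(sliding_window_view(padded, (h, w)), axis=(-2, -1))


def scm(I):
    """Assumed Eq. (15): sum of squared differences to the 8 neighbours (Omega_0 = 3x3)."""
    I = np.asarray(I, dtype=float)
    padded = np.pad(I, 1, mode="edge")
    rows, cols = I.shape
    out = np.zeros_like(I)
    for di in range(3):
        for dj in range(3):
            out += (I - padded[di:di + rows, dj:dj + cols]) ** 2   # centre term adds 0
    return out


def lscm(I, M=1, N=1):
    """Assumed Eq. (16): SCM summed over a (2M+1) x (2N+1) neighbourhood."""
    return _box(scm(I), 2 * M + 1, 2 * N + 1, np.sum)


def lss(I, M=3, N=3):
    """Assumed Eq. (17): local standard deviation around the patch mean."""
    return _box(np.asarray(I, dtype=float), 2 * M + 1, 2 * N + 1, np.std)
```

By the triangle inequality, the summed energy never exceeds the summed amplitude, so this PC lies in [0, 1), and all three measures vanish on a constant image.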
As shown in Equation (18), a global measure (GM) is proposed that integrates the complementary measures PC, LSCM, and LSS to capture different aspects of image information, where α, β, and γ are the parameters that adjust the contributions of PC, LSCM, and LSS, respectively. Once GM is obtained, the fused low-frequency sub-band is calculated by the rule in Equation (19), where $L_F(x,y)$, $L_A(x,y)$, and $L_B(x,y)$ are the low-frequency sub-bands of the fused image and of source images $I_A$ and $I_B$, respectively. $Lmap_i(x,y)$ denotes the decision map for the fusion of the low-frequency sub-bands, which is calculated by Equation (20), where $|\cdot|$ is the cardinality of a set and $\Phi_i(x,y)$ is calculated by Equation (21). Using the cardinality of a set helps to obtain abundant image details and structure information. In Equation (21), $\Omega_1$ represents a sliding window of size $\tilde{M}\times\tilde{N}$ centered at location (x, y), and K is the number of source images. GM as defined in Equation (18) is expressed as a general term; in Equation (21), the subscript of GM identifies the source image, and the maximum value over the source images is selected.

5: if the decomposed layer l = 5 of the high-frequency sub-bands then
6:     Measure the activity level of the high-frequency coefficients by PCNN.
7:     Obtain the 5th-layer fusion coefficients of the high-frequency sub-bands by PCNN, as in Equation (2).
8: end if
9: if the decomposed layer l < 5 of the high-frequency sub-bands then
10:     Use the maximum entropy of the ABS of the coefficients as the measured activity level.
11:     Obtain the fusion coefficients of the first four layers of the high-frequency sub-bands, as in Equation (10).
12: end if
13: The fused high-frequency sub-bands are $H_F^{l,k}(i,j) = H_F^{l,k}(i,j)_{l=5} + H_F^{l,k}(i,j)_{l<5}$.
14: end for
15: for each source image A and B do
16:     Design GM from PC, LSCM, and LSS: $GM(x,y) = (PC(x,y))^{\alpha} \cdot (LSCM(x,y))^{\beta} \cdot (LSS(x,y))^{\gamma}$.
17:     Calculate the fused low-frequency sub-bands by the rule in Equation (19).
18: end for
19: Apply the inverse NSCT to the fused high- and low-frequency sub-bands $\{H_F, L_F\}$ to obtain the fused image F.
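The low-frequency rule (step 16 and Equations (19)-(21)) can be sketched for two source images in Python/NumPy. `global_measure` follows the product form of step 16; the decision map is an assumed majority-vote reading of Equations (20)-(21): a pixel counts as a "win" for A where $GM_A \ge GM_B$, and $Lmap_A(x,y) = 1$ when A wins in more than half of the $\tilde{M}\times\tilde{N}$ window around (x, y). The window size, unit exponents, edge padding, and helper names are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def global_measure(pc, lscm, lss, alpha=1.0, beta=1.0, gamma=1.0):
    """GM = PC^alpha * LSCM^beta * LSS^gamma (Algorithm step 16); exponents assumed."""
    return pc ** alpha * lscm ** beta * lss ** gamma


def window_votes(mask, h, w):
    """Count, for every pixel, how many pixels of its h x w window are True (edge padding)."""
    padded = np.pad(mask.astype(float), ((h // 2, h // 2), (w // 2, w // 2)), mode="edge")
    return sliding_window_view(padded, (h, w)).sum(axis=(-2, -1))


def fuse_low_frequency(L_A, L_B, gm_A, gm_B, window=(7, 7)):
    """Two-source majority-vote reading of Eqs. (19)-(21) (assumed form).

    Lmap_A(x, y) = 1 when A has the larger GM in more than half of the window
    centred on (x, y); elsewhere the coefficient is taken from B.
    """
    h, w = window
    votes_A = window_votes(gm_A >= gm_B, h, w)
    lmap_A = votes_A > (h * w) / 2.0
    return np.where(lmap_A, L_A, L_B)
```

Voting over a window, rather than comparing GM pixel by pixel, keeps the decision map spatially consistent and avoids isolated switches between sources.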

Experiment Preparation
In the comparative experiments, 30 pairs of infrared-visible images are used to test the fusion performance. The resolution of the test images is 256 × 256. The infrared wavelength range is 700-2526 nm, and the visible wavelength range is 390-700 nm. The infrared-visible image pairs were collected by Liu [10] and can be downloaded from quxiaobo.org. All experiments were programmed in Matlab 2014a (MathWorks, Natick, MA, USA) and run on a desktop with an Intel(R) Core(TM) i7-4790 CPU (Intel, Santa Clara, CA, USA) @ 3.60 GHz and 8.00 GB RAM.

Objective Evaluation Metrics
A single evaluation metric cannot fully reflect the performance of a fused image [20,21], so multiple metrics are needed for a comprehensive performance analysis. This paper uses five objective metrics to evaluate the different fusion methods: $Q_{TE}$ [22,23], $Q^{AB/F}$ [24,25], $Q_{MI}$ [23], $Q_{CB}$ [23,26], and $Q_{VIF}$ [25,27]. $Q_{TE}$ evaluates the Tsallis entropy of the fused image. $Q^{AB/F}$, a gradient-based quality index, measures the edge information transferred to the fused image. $Q_{MI}$ evaluates the similarity between the fused image and the source images. Both $Q_{CB}$ and $Q_{VIF}$ measure the quality of the fused image as perceived by the human visual system.
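Several variants of $Q_{MI}$ exist. As a hedged sketch, the Python/NumPy code below assumes the widely used normalized form of Hossny et al., $Q_{MI} = 2\left[\frac{MI(A,F)}{H(A)+H(F)} + \frac{MI(B,F)}{H(B)+H(F)}\right]$, with mutual information estimated from a 256-bin joint grey-level histogram. The bin count and function names are assumptions, not necessarily the exact settings of [23].

```python
import numpy as np


def _entropy(p):
    """Shannon entropy (bits) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y, bins=256):
    """Plug-in MI (bits) from the joint grey-level histogram; also returns H(x), H(y)."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=bins)
    p_xy = joint / joint.sum()
    h_x = _entropy(p_xy.sum(axis=1))
    h_y = _entropy(p_xy.sum(axis=0))
    h_xy = _entropy(p_xy.ravel())
    return h_x + h_y - h_xy, h_x, h_y


def q_mi(a, b, f, bins=256):
    """Normalized MI: 2 * [MI(A,F)/(H(A)+H(F)) + MI(B,F)/(H(B)+H(F))], range [0, 2]."""
    mi_af, h_a, h_f = mutual_information(a, f, bins)
    mi_bf, h_b, _ = mutual_information(b, f, bins)
    return 2.0 * (mi_af / (h_a + h_f) + mi_bf / (h_b + h_f))
```

Because $MI(X,Y) \le \min(H(X), H(Y))$, each normalized term is at most 0.5, and the maximum value of 2 is reached only when the fused image and both sources carry identical information.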

Experiment Results of Infrared-Visible Image Fusion
In this section, the proposed NSCT-based fusion framework is compared with seven popular fusion methods: the adaptive sparse representation (ASR) based image fusion method proposed by Liu [28], the convolutional neural network (CNN) based image fusion method proposed by Liu [29], the multi-channel medical image fusion method (CT) proposed by Zhu [25], the multi-modality image fusion method with joint patch clustering based dictionary learning (KIM) proposed by Kim [30], the image fusion method based on multi-scale transform and sparse representation (MST-SR) proposed by Liu [10], the infrared and visual image fusion algorithm based on NSST and improved PCNN (NSST-PCNN) proposed by Li [31], and the infrared and visible image fusion scheme based on NSCT and PC information (NSCT-PC) proposed by Li [9]. Of the thirty experiments, only the fused results of six comparative experiments are selected here to analyze the fusion performance. Figure 3 shows the fused results of infrared-visible image fusion experiment-1. As shown in Figure 3c,f, the fused images obtained by ASR and KIM have low brightness: the light in source image (a) is not well preserved in either (c) or (f), so both images have poor overall visual performance. The CNN method does not perform well in some local areas, as shown in Figure 3d. According to the partially enlarged areas in Figure 3e, some local areas of the fused image obtained by CT are overly bright, and the detailed information is not obvious. In Figure 3i, the saturation of the fused image is high, and the edge details are not obvious. In addition, the fused image obtained by NSST-PCNN has low contrast and poor global features. Comparing the fused images (h) and (j), together with the corresponding magnified regions in Figure 3, NSCT-PC and the proposed method provide similar visual quality to the human eye.
Figure 4 shows the fused results of infrared-visible image fusion experiment-2. Comparing the fused images obtained by the different methods leads to the following observations.

Comparative Experiments-2
In Figure 4c,f, the fused images obtained by ASR and KIM have low brightness and poor global features. As shown in the magnified areas of Figure 4d,e, CNN does not recover clear details in the fused image, and the partially enlarged image obtained by CT has high contrast with indistinct edge information. In the fused image (f) in Figure 4 obtained by KIM, the boundary between the sky and the forest has overly bright edges. As shown in Figure 4h, the fused image obtained by NSST-PCNN has high brightness and a poor visual effect. Compared with the results of the other six fusion methods, the fused images obtained by NSCT-PC and the proposed method have better fusion performance.

Figure 5 shows the fused results of infrared-visible image fusion experiment-3. In Figure 5c,f, both ASR and KIM produce fused images with high brightness and do not preserve the detailed information of source image (b). Compared with ASR, the fused image obtained by KIM is blurry and not conducive to human-eye observation. As shown in Figure 5d, the fused image obtained by CNN has high saturation. In the partially enlarged areas of Figure 5e,g, the detailed texture of the fused images obtained by CT and MST-SR is not clear. Compared with the proposed method, the fused image obtained by NSST-PCNN in Figure 5h has low saturation and poor global features. As shown in Figure 5i,j, NSCT-PC and the proposed method perform well in both global and local features.

Figure 6 shows the fused results of infrared-visible image fusion experiment-4. In Figure 6c, the fused image obtained by ASR has average visual quality. As shown in Figure 6d, the car light is overly bright in the fused image obtained by CNN. In Figure 6e,g, the fused images obtained by CT and MST-SR are dark and have poor overall visual quality. As shown in Figure 6f,h, the fused images obtained by KIM and NSCT-PC have high brightness.
On closer inspection of the detailed information, the textures in these fused images are not obvious, which hinders human-eye observation. Compared with NSCT-PC, the proposed method better preserves both the global and local features of the source images.

Analysis of Comparative Experiment Results
Table 1 and Figure 7 show the average objective evaluation results of infrared-visible image fusion over the 30 comparative experiments. In Table 1, the best results are marked in bold. According to Table 1 and Figure 7, the proposed method achieves the best performance in $Q^{AB/F}$, $Q_{MI}$, $Q_{CB}$, and $Q_{VIF}$, and the second-best performance in $Q_{TE}$. The $Q_{TE}$ of the proposed method is slightly lower than the best value, obtained by NSST-PCNN, which means that both the proposed method and NSST-PCNN retain much of the information of the source images. Meanwhile, the similarities between the source images and the fused images obtained by these two methods are also comparable. For the $Q^{AB/F}$ metric, the proposed method scores slightly higher than the other methods, so it preserves image edge details better. Additionally, the proposed method preserves the global and local features of the source images well and achieves good visual quality for human observers. As shown in Figure 7, the proposed method has the shortest processing time of all eight fusion methods, much less than the others and about 40% of the second-shortest processing time. Thus, the comparative experiments indicate that the proposed infrared-visible image fusion solution has low algorithmic complexity and can effectively reduce the related costs.

Conclusions
In this paper, an NSCT-based precise high-frequency decomposition method for infrared-visible image fusion is proposed. The method combines NSCT, the PCNN model, and PC information to improve the visual quality of fused images. Specifically, NSCT is used to decompose the source images into high- and low-frequency components. The high-frequency coefficients are fused by introducing PCNN and ABS as their activity measures. For the low-frequency components, a fusion rule integrating LSCM, LSS, and PC features achieves energy preservation and detail extraction. Finally, the fused image is obtained by applying the inverse NSCT to the fused high- and low-frequency components. Compared with other image fusion methods, the proposed method performs well in structural similarity and detail preservation. The experimental results confirm that the proposed method is effective and fast for infrared-visible image fusion.
In future work, the proposed method will be optimized to further increase its processing speed, and weighted fusion will be explored to improve the fusion performance. Statistical tests, such as Friedman's test, will be introduced to compare the performance of the proposed method. The proposed image fusion method will also be extended to other multi-modality image fusion tasks, such as medical image fusion and multi-focus image fusion, and to applications such as face recognition, especially in night scenes.