Article

Fusing Infrared and Visible Images of Different Resolutions via Total Variation Model

Qinglei Du, Han Xu, Yong Ma, Jun Huang and Fan Fan
1 Electronic Information School, Wuhan University, Wuhan 430072, China
2 Air Force Early Warning Academy, Wuhan 430019, China
* Author to whom correspondence should be addressed.
Sensors 2018, 18(11), 3827; https://doi.org/10.3390/s18113827
Submission received: 9 September 2018 / Revised: 3 November 2018 / Accepted: 5 November 2018 / Published: 8 November 2018
(This article belongs to the Special Issue Sensors Signal Processing and Visual Computing)

Abstract:
In infrared and visible image fusion, existing methods typically have a prerequisite that the source images share the same resolution. However, due to limitations of hardware devices and application environments, infrared images often have markedly lower resolution than the corresponding visible images. In this case, current fusion methods inevitably lose texture information from the visible images or blur the thermal radiation information from the infrared images. Moreover, existing fusion rules typically focus on preserving texture details in the source images, which may be inappropriate for fusing infrared thermal radiation information because such information is characterized by pixel intensities; as a result, the prominence of targets in the fused images may be neglected. Faced with these difficulties and challenges, we propose a novel method that fuses infrared and visible images of different resolutions and generates clear and accurate high-resolution fused images. Specifically, the fusion problem is formulated as a total variation (TV) minimization problem. The data fidelity term constrains the pixel intensity similarity of the downsampled fused image with respect to the infrared image, and the regularization term compels the gradient similarity of the fused image with respect to the visible image. The fast iterative shrinkage-thresholding algorithm (FISTA) framework is applied to improve the convergence rate. Our resulting fused images resemble super-resolved infrared images that are sharpened by the texture information from the visible images. The advantages and innovations of our method are demonstrated by qualitative and quantitative comparisons with six state-of-the-art methods on publicly available datasets.

1. Introduction

Compared with information provided by a single sensor image, multi-sensor images can supply additional information with feature complementarity for complex and complete scene representations and improved visual understanding. By integrating useful information from different sensor images, we can generate a single composite image, which is informative and suitable for follow-up processing and decision-making. In particular, infrared and visible image fusion is an indispensable branch that plays an important role in military and civilian applications [1] and has been used to enhance the performance in terms of human visual perception, object detection, and target recognition [2,3,4,5,6,7,8].
Visible sensors capture reflected light to provide background details, such as vegetation, texture, area, and soil, with high spatial resolution because they operate in the same waveband that can be recognized by human vision [9]. However, under unsatisfactory light conditions, scene information is lost. By contrast, infrared sensors map thermal radiation emitted from objects to gray images; therefore, these sensors exhibit unique advantages in overcoming adverse light conditions and highlighting thermal targets, such as pedestrians and vehicles. Nevertheless, due to limitations of hardware and environments, infrared images are often accompanied by blurred details, serious noise, and considerably low resolution. These limitations encourage the fusion of infrared and visible images to produce a single image that simultaneously highlights thermal radiation for target detection and preserves texture information for characterizing appearance.
According to different representations and applications, the fusion process can be conducted at the following three levels [10]: pixel, feature, and symbol levels. Pixel-level fusion is the lowest level and fuses different physical parameters. Feature-level fusion involves feature descriptors, probability variables, and object labels. Symbol-level fusion uses high abstractions of information, such as representative symbols or decisions, for fusion. The focus of this paper is the pixel level. From another perspective, existing infrared and visible image fusion methods are divided into eight categories according to their corresponding theories [1], including multi-scale transform-based methods [11,12], sparse representation-based methods [13,14], neural network-based methods [15,16], subspace-based methods [17,18], saliency-based methods [19,20], hybrid methods [21,22], deep learning-based methods [23,24], and other fusion methods [10,25].
Despite the notable progress of image fusion, fusing infrared and visible images of different resolutions remains a challenging task. Infrared images often have considerably lower resolution than visible images. Under this condition, if existing methods are applied to fuse the source images, then eliminating the resolution difference, e.g., by downsampling the visible image or upsampling the infrared image, is the first problem to solve. As shown in Figure 1, to intuitively illustrate the fusion process, we take a typical image pair and a typical fusion method, hybrid multi-scale decomposition (HMSD) [26], as an example. In procedure (a), the visible image is downsampled before being used as a source image for fusion, and the fused image is upsampled for consistency and comparability. Although the fused image can be forcibly converted into high resolution, the high-quality texture information discarded from the visible image by the downsampling operation cannot be retrieved by image processing, from the viewpoint of information theory; hence, the clarity of the fused image is not improved. In procedure (c), the original infrared image is upsampled to high resolution before fusion. As shown in the figure, upsampling introduces blur and inaccuracy into the infrared image, which are also fused into the resulting image; hence, the fused image in procedure (c) is accompanied by serious noise. In addition, existing fusion rules typically focus on preserving texture details in the source images. This may be problematic for the fusion of infrared and visible images because infrared thermal radiation information is characterized by pixel intensities and usually contains few texture details, and thus may fail to be fused into the resulting image. However, such information is often crucial for highlighting the targets. This can be seen in procedures (a) and (c), in which the fused image loses the high-contrast property of the infrared image, submerging the target in the cluttered background.
To address the above-mentioned challenges, we propose a new algorithm termed Different Resolution Total Variation (DRTV) for infrared and visible image fusion. In particular, we formulate the fusion problem as a convex optimization problem characterized by a total variation model and seek the optimal fused image that minimizes an energy function. This formulation accepts infrared and visible images of different resolutions. The energy consists of a data fidelity term and a regularization term. The data fidelity term constrains the downsampled fused image to present pixel intensities similar to those of the original infrared image. The regularization term ensures that the gradient distribution of the visible image is transferred into the fused image. The two terms together guarantee that our fusion result contains the thermal radiation information from the infrared image and the texture details from the visible image. To optimize the solution procedure, we apply the fast iterative shrinkage-thresholding algorithm (FISTA) [27] framework to accelerate convergence, i.e., the convergence rate is improved to $O(1/k^2)$, with $k$ being the iteration number. For these reasons, the cost of acquiring high-quality infrared and visible fused images can be considerably reduced, thereby benefiting military equipment and civilian industries that rely on infrared and visible image fusion. For intuitive comparison, Figure 1b shows the fused result of DRTV on the same source image pair, where we zoom in on a small region in the red box for clear comparison. Figure 1 shows that the fused image generated by our DRTV, i.e., Figure 1b, is clearer and less noisy than Figure 1a,c, which suffer from the forced resolution conversion. In addition, our fusion result resembles a super-resolved infrared image; it preserves the high-contrast property of the infrared image and clearly highlights the target.
The contributions of this paper include the following two aspects. On the one hand, the proposed DRTV makes a breakthrough in fusing visible and infrared images of different resolutions, which can largely prevent the loss of texture information caused by compression as well as the blur and inaccuracy caused by forced upsampling, without the expense of upgrading hardware devices. On the other hand, the fused images obtained by DRTV visually benefit from rich texture information while the thermal targets remain easy to detect. By changing the regularization parameter in our model, we can subjectively adjust the visual similarity between the fused image and the visible/infrared image.
The remainder of this paper is organized as follows. Section 2 describes the problem statement and formulation for infrared and visible image fusion. Section 3 provides the optimization strategy of our algorithm. In Section 4, we compare DRTV with several state-of-the-art methods on publicly available datasets and analyze the results. Conclusions and future works are presented in Section 5.

2. Problem Statement and Formulation

We denote by $Vis \in \mathbb{R}^{m \times n}$ the visible image of size $m \times n$ and by $IR \in \mathbb{R}^{\frac{m}{c} \times \frac{n}{c}}$ the infrared image of size $\frac{m}{c} \times \frac{n}{c}$, where $c$ is a constant denoting the ratio between the resolution of the visible image and that of the infrared image. $X \in \mathbb{R}^{m \times n}$ denotes the fused image. Note that the size of $X$ is $m \times n$ rather than $\frac{m}{c} \times \frac{n}{c}$.
Given $Vis$ and $IR$, our goal is to fuse them into a high-resolution fused image, which should retain as much of the important information in the source images as possible, e.g., the thermal radiation information in the infrared image and the texture features in the visible image.

2.1. Data Fidelity Term

Previous methods generally impose a restriction on the source images, namely that the visible and infrared images should share the same resolution. When this restriction is not satisfied, downsampling the visible image is one solution, but the information transferred into the fused image is then similarly limited. Employing the upsampled infrared image as the prior knowledge to maintain thermal radiation information is another solution. However, thermal information after upsampling is often blurred and inaccurate. If we constrain the fused information to follow this blurred source, then the fused result is even less accurate. Therefore, we assume in this study that the fused image after downsampling possesses pixel intensities similar to those of the original infrared image. Least-squares fitting is used to model this relationship:
$$\varepsilon_1(X) = \frac{1}{2}\,\|\psi(X) - IR\|_F^2, \qquad (1)$$
where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, defined as $\|X\|_F = \sqrt{\sum_{i,j} X_{i,j}^2}$, and $\psi$ denotes the downsampling operator based on nearest-neighbor interpolation. If the high-resolution image is $S_{M \times N}$ and the downsampled image is $O_{m \times n} = \psi(S_{M \times N})$, then the operator $\psi$ can be written as $O_{i,j} = S_{[i \times \frac{M}{m}],\,[j \times \frac{N}{n}]}$, where $[\cdot]$ represents rounding to the nearest integer. $\varepsilon_1(X)$ is the data fidelity term. Under this constraint, the thermal target remains prominent in the fused image.
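For concreteness, the following is a minimal NumPy sketch of the nearest-neighbor downsampling operator $\psi$ and the data fidelity energy $\varepsilon_1(X)$ (the experiments in Section 4 use Matlab; the function names `nearest_downsample` and `data_fidelity` are ours, introduced only for illustration):

```python
import numpy as np

def nearest_downsample(S, m, n):
    """psi: nearest-neighbor downsampling of an M-by-N image S to size m-by-n,
    O[i, j] = S[round(i * M / m), round(j * N / n)] (0-based sketch of Section 2.1)."""
    M, N = S.shape
    rows = np.clip(np.rint(np.arange(m) * M / m).astype(int), 0, M - 1)
    cols = np.clip(np.rint(np.arange(n) * N / n).astype(int), 0, N - 1)
    return S[np.ix_(rows, cols)]

def data_fidelity(X, IR):
    """epsilon_1(X) = 0.5 * ||psi(X) - IR||_F^2 of Equation (1)."""
    m, n = IR.shape
    return 0.5 * np.sum((nearest_downsample(X, m, n) - IR) ** 2)
```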

2.2. Regularization Term

The regularization term is essentially the prior knowledge used to avoid overfitting of the least-squares term. One requirement is to keep thermal targets in the fused image noticeable. In addition, the detailed appearance information in the visible image is expected to be preserved in the fused result. The direct approach is to constrain the fused and visible images to have similar pixel intensity distributions. However, the same type of constraint is already applied in $\varepsilon_1(X)$, and the pixel intensity distributions of infrared and visible images vary considerably due to their different imaging mechanisms. Therefore, this approach is contradictory, and we instead seek another feature to characterize the appearance information in the visible image.
According to the human visual perception system, image edges and textures are abundant where the spatial frequency is large. Moreover, the spatial frequency is a metric based on the gradient distribution. It is thus easy to understand that the detailed appearance information is basically represented by the gradient distribution of the image. On the basis of this observation, we require that the fused image present gradients similar to those of the visible image, which is measured by the $\ell_p$ norm ($p \geq 0$):
$$\varepsilon_2(X) = \|\nabla X - \nabla Vis\|_p, \qquad (2)$$
where $\nabla$ is the gradient operator and $\varepsilon_2(X)$ should be as small as possible.
By using $\nabla_{i,j} = (\nabla^h_{i,j}, \nabla^v_{i,j})$ to represent the image gradient at pixel $(i,j)$ and $x_{i,j}$ to denote the pixel in the $i$-th row and $j$-th column of the image, we obtain the following:
$$\nabla^h_{i,j} x = x_{i,j} - x_{i,j+1}, \qquad \nabla^v_{i,j} x = x_{i,j} - x_{i+1,j}, \qquad (3)$$
$$|\nabla_{i,j} x| = \sqrt{(\nabla^h_{i,j} x)^2 + (\nabla^v_{i,j} x)^2}. \qquad (4)$$
If $i$ is the last row of the image, then we set $\nabla^v_{i,:} x = 0$. Similarly, if $j$ is the last column, then we set $\nabla^h_{:,j} x = 0$.
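As an illustration only (not the authors' code), the forward differences of Equations (3) and (4) can be written in NumPy as follows, with the last row/column handled as stated above:

```python
import numpy as np

def forward_gradients(x):
    """Horizontal/vertical forward differences of Equations (3) and (4);
    the last column of grad_h and the last row of grad_v are zero."""
    grad_h = np.zeros_like(x, dtype=float)
    grad_v = np.zeros_like(x, dtype=float)
    grad_h[:, :-1] = x[:, :-1] - x[:, 1:]   # x[i, j] - x[i, j + 1]
    grad_v[:-1, :] = x[:-1, :] - x[1:, :]   # x[i, j] - x[i + 1, j]
    magnitude = np.sqrt(grad_h ** 2 + grad_v ** 2)
    return grad_h, grad_v, magnitude
```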
We now consider the $\ell_p$ norm in $\varepsilon_2(X)$. Given that image intensities are often piecewise smooth across the image domain, their gradients tend to be sparse, and the non-zero elements usually correspond to boundaries [28,29]. To constrain the fused image to exhibit a pixel gradient distribution similar to that of the visible image, the differences of gradients between the two images should be as sparse as possible; that is, the number of non-zero entries of the gradient difference should be as small as possible. This is theoretically equivalent to minimizing the $\ell_0$ norm of the gradient difference, i.e., $p = 0$. However, we must traverse all cases to obtain the $\ell_0$ norm of a matrix; hence, obtaining the $\ell_0$ norm is non-deterministic polynomial-time hard (NP-hard). Fortunately, for vectors, the $\ell_1$ norm is the optimal convex approximation of the $\ell_0$ norm and can be solved easily. The restricted isometry condition [30] also theoretically guarantees the exact recovery of the sparse solution by the $\ell_1$ norm. For a vector $v \in \mathbb{R}^{mn \times 1}$, the $\ell_1$ norm is defined as $\|v\|_1 = \sum_{i=1}^{mn} |v_i|$. In particular, if the energy function is the $\ell_1$ norm of the gradient, it is a total variation model [10,31]. If $\varepsilon_2(X)$ were calculated as a norm of a vector, we could directly set the $\ell_p$ norm to the $\ell_1$ norm. However, in Equation (2), $\varepsilon_2(X)$ is a norm of a matrix. For the matrix of gradient differences, applying the convex approximation above means that the norm of the matrix should equal the sum of the absolute values of all its elements. Note that $\varepsilon_2(X)$ is a norm on the gradient, and the discrete isotropic TV-norm [32], i.e., $\|\cdot\|_{TV_I}$, is defined by the sum of the gradients of all elements:
$$\|x\|_{TV_I} = \sum_{i=1}^{m-1}\sum_{j=1}^{n-1} \sqrt{(x_{i,j} - x_{i+1,j})^2 + (x_{i,j} - x_{i,j+1})^2} + \sum_{i=1}^{m-1} |x_{i,n} - x_{i+1,n}| + \sum_{j=1}^{n-1} |x_{m,j} - x_{m,j+1}|, \qquad (5)$$
where $x$ is a matrix of size $m \times n$. This motivates us to convert the minimization of the $\ell_1$ norm into the minimization of the TV-norm.
In addition, another important advantage of the regularization based on $\|\cdot\|_{TV_I}$ is its insensitivity to outliers, which always correspond to sharp edges in image processing. Therefore, we use $\|\cdot\|_{TV_I}$ to define $\varepsilon_2(X)$ as follows:
$$\varepsilon_2(X) = \|X - Vis\|_{TV_I}. \qquad (6)$$
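Under the same caveat as before, Equations (5) and (6) can be transcribed directly into NumPy; `tv_isotropic` and `regularization` are illustrative names of ours:

```python
import numpy as np

def tv_isotropic(x):
    """Discrete isotropic TV-norm of Equation (5)."""
    interior = np.sqrt((x[:-1, :-1] - x[1:, :-1]) ** 2 +
                       (x[:-1, :-1] - x[:-1, 1:]) ** 2).sum()
    last_col = np.abs(x[:-1, -1] - x[1:, -1]).sum()   # vertical differences, last column
    last_row = np.abs(x[-1, :-1] - x[-1, 1:]).sum()   # horizontal differences, last row
    return interior + last_col + last_row

def regularization(X, Vis):
    """epsilon_2(X) = ||X - Vis||_{TV_I} of Equation (6)."""
    return tv_isotropic(X - Vis)
```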

2.3. Formulation of the Fusion Problem

Taking $\varepsilon_1(X)$ and $\varepsilon_2(X)$ into comprehensive consideration, a positive regularization parameter $\lambda$ is introduced to control the trade-off between the data fidelity term and the regularization term. Then, the fusion problem can be expressed as follows:
$$X = \arg\min_X \varepsilon(X) = \varepsilon_1(X) + \lambda\,\varepsilon_2(X), \qquad (7)$$
which is equivalent to:
$$X = \arg\min_X \varepsilon(X) = \frac{1}{2}\,\|\psi(X) - IR\|_F^2 + \lambda\,\|X - Vis\|_{TV_I}. \qquad (8)$$
Thus far, the fusion problem has been converted into solving for the fused image $X$ in Equation (8), which minimizes the energy function $\varepsilon(X)$.

3. Optimization

The optimization problem (8) is clearly a convex optimization problem; therefore, a global optimal solution that minimizes $\varepsilon(X)$ exists. In our model, the data fidelity term is smooth, while the regularization term is non-smooth; thus, we can conclude that problem (8) is of the form:
$$X = \arg\min_X F(X) = f(X) + g(X), \qquad (9)$$
where $f$ is a smooth convex function whose gradient is Lipschitz continuous, and $g$ is a continuous, but possibly non-smooth, convex function. Specifically,
$$f(X) = \frac{1}{2}\,\|\psi(X) - IR\|_F^2, \qquad g(X) = \lambda\,\|X - Vis\|_{TV_I}. \qquad (10)$$
The iterative shrinkage-thresholding algorithm (ISTA) [33] can be used to solve the preceding problem. Although ISTA can solve general convex optimization problems, its convergence rate is proven to be only $O(1/k)$, with $k$ being the iteration number. To remedy the slow convergence of ISTA, Beck and Teboulle proposed FISTA [27], whose convergence rate can be improved to $O(1/k^2)$. The iterative step in FISTA can be written as follows:
$$X_k = \arg\min_X \frac{L}{2}\,\Big\| X - \Big( Y_k - \frac{1}{L}\nabla f(Y_k) \Big) \Big\|_F^2 + \lambda\,\|X - Vis\|_{TV_I}. \qquad (11)$$
We denote by $\psi^T$ the inverse operation of $\psi$ and set $Y = Y_k - \psi^T\big(\psi(X_{k-1}) - IR\big)/L$, where $L$ is a Lipschitz constant of the gradient $\nabla f(X) = \psi^T\big(\psi(X) - IR\big)$, which can be set to 1 here. By approximating $\nabla f(Y_k)$ with $\nabla f(X_{k-1})$ in the FISTA framework, formula (11) can then be rewritten as:
$$X_k = \arg\min_X \frac{L}{2}\,\|X - Y\|_F^2 + \lambda\,\|X - Vis\|_{TV_I}. \qquad (12)$$
According to the standard FISTA framework, we can optimize Equation (8) by using Algorithm 1. From the iteration process, each iterate of FISTA depends on the previous two iterates and not only on the last iterate as in ISTA. Therefore, FISTA can achieve a fast convergence rate.
Algorithm 1: FISTA framework for DRTV.
Input: $L = 1$, $\lambda$, $t_1 = 1$, $X_0 = \psi^T(IR)$, $Y_1 = \psi^T(IR)$
Output: $X$
1: for $k = 1$ to Maxiteration do
2:   $Y = Y_k - \psi^T\big(\psi(X_{k-1}) - IR\big)/L$
3:   $X_k = \arg\min_X \frac{L}{2}\|X - Y\|_F^2 + \lambda\,\|X - Vis\|_{TV_I}$    \\ solved by Algorithm 2
4:   $t_{k+1} = \big(1 + \sqrt{1 + 4t_k^2}\big)/2$
5:   $Y_{k+1} = X_k + \frac{t_k - 1}{t_{k+1}}\,(X_k - X_{k-1})$
6: end for
7: $X$ is obtained as $X_k$ in the last iteration.
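A compact NumPy sketch of this outer loop is given below (the paper's experiments use Matlab). It reuses the `nearest_downsample` helper sketched in Section 2.1, introduces `nearest_upsample` as an assumed nearest-neighbor implementation of $\psi^T$, and delegates Line 3 to a `tv_denoise` routine sketched after Algorithm 2 below; all names are ours.

```python
import numpy as np

def nearest_upsample(IR, m, n):
    """Assumed psi^T: nearest-neighbor replication of the low-resolution image to m-by-n."""
    mc, nc = IR.shape
    rows = np.minimum(np.arange(m) * mc // m, mc - 1)
    cols = np.minimum(np.arange(n) * nc // n, nc - 1)
    return IR[np.ix_(rows, cols)]

def drtv_fuse(IR, Vis, lam=50.0, max_iter=50, L=1.0):
    """Sketch of Algorithm 1 (FISTA framework for DRTV)."""
    m, n = Vis.shape
    X_prev = nearest_upsample(IR, m, n)          # X_0 = psi^T(IR)
    Y = X_prev.copy()                            # Y_1 = psi^T(IR)
    t = 1.0
    for _ in range(max_iter):
        # Line 2: gradient step on the data fidelity term
        residual = nearest_downsample(X_prev, *IR.shape) - IR
        Y_grad = Y - nearest_upsample(residual, m, n) / L
        # Line 3: TV-regularized proximal step, solved by Algorithm 2
        X = tv_denoise(Y_grad, Vis, lam)
        # Lines 4-5: momentum update
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = X + (t - 1.0) / t_next * (X - X_prev)
        X_prev, t = X, t_next
    return X_prev
```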
To solve the optimization problem in Line 3 of Algorithm 1, we set $Z = X - Vis$. Then, we obtain the following minimization problem:
$$Z_k = \arg\min_Z \frac{L}{2}\,\|Z - (Y - Vis)\|_F^2 + \lambda\,\|Z\|_{TV_I}. \qquad (13)$$
The minimization problem in Equation (13) is a total variation (TV) minimization problem [31], and $X_k$ can be updated as $Z_k + Vis$. The first term reflects that $Z$ should be similar to $Y - Vis$ in terms of content. The second term, i.e., the TV term, is a constraint on the piecewise smoothness of $Z$. The smoothness constraint has been proven to allow the existence of step edges; hence, boundaries are not blurred when minimizing the energy function [34].
We follow the standard procedure [32,35] for solving the TV minimization problem in Equation (13) and summarize the entire procedure in Algorithm 2. The convergence rate is also considerably superior to other methods based on gradient projection. Refer to [32] for the detailed derivation. For ease of understanding, we use the same symbol notations as those in [36]. The linear operator is defined as $\mathcal{L}(R, S)_{i,j} = R_{i,j} - R_{i-1,j} + S_{i,j} - S_{i,j-1}$. The corresponding inverse operator is $\mathcal{L}^T(X) = (R, S)$, where $R_{i,j} = X_{i,j} - X_{i+1,j}$ and $S_{i,j} = X_{i,j} - X_{i,j+1}$. $P$ is a projection operator ensuring $R_{i,j}^2 + S_{i,j}^2 \leq 1$, $|R_{i,n}| \leq 1$, and $|S_{m,j}| \leq 1$.
Algorithm 2: TV minimization.
Input: $\lambda$, $Y$, $Vis$, $(U_1, V_1) = (R_0, S_0) = (0, 0)$, $t_1 = 1$, $B = Y - Vis$
Output: $X$
1: for $k = 1$ to Maxiteration do
2:   $(R_k, S_k) = P\Big[(U_k, V_k) + \frac{1}{8\lambda}\,\mathcal{L}^T\big(B - \lambda\,\mathcal{L}(U_k, V_k)\big)\Big]$
3:   $t_{k+1} = \big(1 + \sqrt{1 + 4t_k^2}\big)/2$
4:   $(U_{k+1}, V_{k+1}) = (R_k, S_k) + \frac{t_k - 1}{t_{k+1}}\,(R_k - R_{k-1}, S_k - S_{k-1})$
5: end for
6: $Z = B - \lambda\,\mathcal{L}(R_k, S_k)$
7: $X = Z + Vis$.
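To complete the sketch started after Algorithm 1, the following is an illustrative NumPy rendering of Algorithm 2 with $L = 1$; the operators $\mathcal{L}$, $\mathcal{L}^T$, and the projection $P$ follow the definitions above, and the function names are again ours rather than the authors' released code.

```python
import numpy as np

def L_op(R, S):
    """Linear operator L(R, S)[i, j] = R[i, j] - R[i-1, j] + S[i, j] - S[i, j-1]."""
    out = R + S
    out[1:, :] -= R[:-1, :]
    out[:, 1:] -= S[:, :-1]
    return out

def LT_op(X):
    """Adjoint operator L^T(X) = (R, S) with R[i, j] = X[i, j] - X[i+1, j] and
    S[i, j] = X[i, j] - X[i, j+1]; the last row of R and last column of S are zero."""
    R = np.zeros_like(X)
    S = np.zeros_like(X)
    R[:-1, :] = X[:-1, :] - X[1:, :]
    S[:, :-1] = X[:, :-1] - X[:, 1:]
    return R, S

def project(R, S):
    """Projection P enforcing R^2 + S^2 <= 1 pointwise (the boundary constraints
    |R[i, n]| <= 1 and |S[m, j]| <= 1 are covered because the paired entry is zero)."""
    norm = np.maximum(1.0, np.sqrt(R ** 2 + S ** 2))
    return R / norm, S / norm

def tv_denoise(Y, Vis, lam, max_iter=50):
    """Sketch of Algorithm 2: min_Z 0.5 * ||Z - (Y - Vis)||_F^2 + lam * ||Z||_{TV_I},
    returning X = Z + Vis."""
    B = Y - Vis
    R_prev = np.zeros_like(B)
    S_prev = np.zeros_like(B)
    U, V = R_prev.copy(), S_prev.copy()
    t = 1.0
    for _ in range(max_iter):
        # Line 2: gradient-projection step on the dual pair
        gR, gS = LT_op(B - lam * L_op(U, V))
        R, S = project(U + gR / (8.0 * lam), V + gS / (8.0 * lam))
        # Lines 3-4: momentum update
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        U = R + (t - 1.0) / t_next * (R - R_prev)
        V = S + (t - 1.0) / t_next * (S - S_prev)
        R_prev, S_prev, t = R, S, t_next
    Z = B - lam * L_op(R_prev, S_prev)   # Line 6
    return Z + Vis                        # Line 7
```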

4. Experimental Results

In this section, we conduct experiments on publicly available datasets in comparison with several state-of-the-art methods to verify the effectiveness of our DRTV. We first introduce the testing data, the compared methods, and the evaluation metrics. Then, we report the experimental results, followed by parameter analysis and a computational cost comparison. All experiments are performed on a laptop with a 1.9 GHz Intel Core i3 4030 CPU and 8 GB RAM, using Matlab code.

4.1. Dataset

The publicly available image dataset TNO Human Factors, which contains multispectral night images of different military scenarios, is used for performance evaluation. We select 20 infrared/visible images with different types of scenes for testing: airplane_in_trees, helicopter, nato_camp_sequence, Farm, house_with_3_men, Kaptein_01, Kaptein_1123, Marne_02, Marne_03, Marne_11, Movie_01, Duine_7408, heather, house, Kaptein_1654, man_in_doorway, Movie_12, Movie_18, Movie_24, and Reek.
Notably, the visible and infrared images in the dataset have the same resolution, whereas in real-world applications infrared images often have considerably lower resolution. Here, we downsample the infrared images in our testing data so that the experimental conditions approximate the real-world situation. Therefore, all infrared source images have low resolution (the ratio $c$ described in Section 2 is fixed to 2). We do not need image registration [37,38,39,40,41,42] before fusion because the source image pairs are all aligned.
We compare our DRTV with six state-of-the-art fusion methods, including pyramid transform-based methods such as the Laplacian pyramid (LP) [43] and the ratio of low-pass pyramid (RP) [44], a multi-scale transform-based method, HMSD [26], an edge-preserving decomposition-based method, the cross-bilateral filter (CBF) [45], a subspace-based method, directional discrete cosine transform and principal component analysis (DDCTPCA) [46], and a total variation-based method, gradient transfer fusion (GTF) [10]. The parameters of these methods are set to the defaults suggested in the original papers.

4.2. Evaluation Metrics

In numerous cases, the visual differences between fused results are insufficiently significant; therefore, fusion methods should be evaluated comprehensively from different perspectives. At present, evaluations are mainly divided into two directions: subjective evaluation and objective evaluation.
Subjective evaluation relies on human eyes and concentrates on the perception of details, contrast, sharpness, and target highlighting. However, image quality is also affected by factors to which our eyes are insensitive. Numerous types of objective evaluation are available, mainly based on information theory, structural similarity, image gradients, and statistics; they are designed to be consistent with the human visual perception system while remaining reproducible under experimental conditions.
In this paper, we adopt five metrics. The first two, i.e., standard deviation ($SD$) [47] and entropy ($EN$) [48], depend only on the fused image. The three remaining metrics, i.e., mutual information ($MI$) [49], structural similarity index measure ($SSIM$) [50], and peak signal-to-noise ratio ($PSNR$) [51], are determined by both the fused and source images. If the resolution of the fused image $X$ is higher than that of a source image $A$ (either the infrared image $IR$ or the visible image $Vis$), then we first downsample the fused image and then calculate the metrics.
(1) $SD$
$SD$ is a statistical concept reflecting the intensity distribution and contrast. It is mathematically defined as follows:
$$SD = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(X_{i,j} - \mu\big)^2}, \qquad (14)$$
where $\mu$ is the mean value of $X$. Areas with high contrast are more likely to attract attention. Therefore, a large $SD$ indicates an improved fused image.
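As a quick illustration (a hypothetical helper of ours, not code from the paper), $SD$ can be computed as:

```python
import numpy as np

def metric_sd(X):
    """Standard deviation of the fused image, Equation (14)."""
    return float(np.sqrt(np.mean((X - X.mean()) ** 2)))
```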
(2) $EN$
The $EN$ of the fused image measures the amount of information contained in it and is mathematically defined as:
$$EN = -\sum_{l=0}^{L-1} p_l \log_2 p_l, \qquad (15)$$
where $L$ is the number of gray levels and $p_l$ is the normalized histogram count of the corresponding gray level. When $EN$ is large, more information is contained in the fused image, and the performance of the fusion method is better. However, $EN$ is conventionally used as an auxiliary metric because it may be influenced by noise.
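A corresponding sketch (again an illustrative helper, assuming 8-bit images):

```python
import numpy as np

def metric_en(X, levels=256):
    """Entropy of Equation (15) from the normalized gray-level histogram."""
    hist, _ = np.histogram(X, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                      # convention: 0 * log2(0) = 0
    return float(-np.sum(p * np.log2(p)))
```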
(3) $MI$
$MI$ is used to measure the amount of information transmitted from the source images to the fused image. $MI$ is defined as follows:
$$MI = MI(IR, X) + MI(Vis, X), \qquad (16)$$
where $MI(IR, X)$ and $MI(Vis, X)$ respectively represent the amount of information transmitted from the infrared image and the visible image to the fused image. The $MI$ between two random variables can be calculated by the Kullback–Leibler measure:
$$MI(A, X) = \sum_{a,x} p_{A,X}(a,x)\,\log_2\frac{p_{A,X}(a,x)}{p_A(a)\,p_X(x)}, \qquad (17)$$
where $p_A(a)$ and $p_X(x)$ respectively represent the marginal histograms of $A$ and $X$, and $p_{A,X}(a,x)$ is the joint histogram of $A$ and $X$. A large $MI$ value means that considerable information is transferred from the source images to the fused image, thereby indicating good fusion performance.
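A histogram-based sketch of Equations (16) and (17), with illustrative helper names:

```python
import numpy as np

def metric_mi_pair(A, X, levels=256):
    """MI(A, X) of Equation (17) from joint and marginal gray-level histograms."""
    joint, _, _ = np.histogram2d(A.ravel(), X.ravel(), bins=levels,
                                 range=[[0, levels], [0, levels]])
    p_ax = joint / joint.sum()
    p_a = p_ax.sum(axis=1, keepdims=True)      # marginal histogram of A
    p_x = p_ax.sum(axis=0, keepdims=True)      # marginal histogram of X
    mask = p_ax > 0
    return float(np.sum(p_ax[mask] * np.log2(p_ax[mask] / (p_a @ p_x)[mask])))

def metric_mi(IR, Vis, X, levels=256):
    """MI of Equation (16): information transferred from both source images."""
    return metric_mi_pair(IR, X, levels) + metric_mi_pair(Vis, X, levels)
```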
(4) $SSIM$
To model image loss and distortion, $SSIM$ was proposed by Wang et al. [52] as a universal image quality index based on structural similarity. The formula of the index is as follows:
$$SSIM(A, X) = \sum_{a,x} b(a,x)\cdot c(a,x)\cdot s(a,x). \qquad (18)$$
In the formula above, $SSIM(A, X)$ measures the structural similarity between a source image $A$ and the fused image $X$ and mainly consists of three components: the similarities of luminance, contrast, and structure information, denoted by $b(a,x)$, $c(a,x)$, and $s(a,x)$, respectively. The image patches of the source image $A$ and the fused image $X$ in a sliding window are denoted by $a$ and $x$, respectively. The specific definitions of the three components are as follows:
$$b(a,x) = \frac{2\mu_a\mu_x + C_1}{\mu_a^2 + \mu_x^2 + C_1}, \qquad (19)$$
$$c(a,x) = \frac{2\sigma_a\sigma_x + C_2}{\sigma_a^2 + \sigma_x^2 + C_2}, \qquad (20)$$
$$s(a,x) = \frac{\sigma_{ax} + C_3}{\sigma_a\sigma_x + C_3}, \qquad (21)$$
where $\mu_a$ and $\mu_x$ indicate the mean values of the image patches $a$ and $x$, respectively, reflecting brightness information; $\sigma_a$ and $\sigma_x$ are the corresponding standard deviations, reflecting contrast information; $\sigma_{ax}$ denotes the covariance of the source and fused image patches, reflecting the similarity of structure information; and $C_1$, $C_2$, and $C_3$ are positive parameters used to avoid instability. Specifically, when $C_1 = C_2 = C_3 = 0$, the index is equivalent to the universal image quality index [53]. Thus, over both source images, the overall measure $SSIM$ is defined as follows:
$$SSIM = \omega_{Vis}\,SSIM(Vis, X) + \omega_{IR}\,SSIM(IR, X), \qquad (22)$$
where $SSIM(Vis, X)$ and $SSIM(IR, X)$ indicate the structural similarities between the visible/infrared images and the fused image, respectively, and $\omega_{Vis}$ and $\omega_{IR}$ are the corresponding weights.
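In practice, the per-source $SSIM$ terms can be computed with an off-the-shelf implementation; the sketch below relies on scikit-image's structural_similarity and assumes equal weights for the two sources, which is our assumption rather than a setting stated in the paper (the fused image is assumed to have been downsampled to the source resolution beforehand, as described above):

```python
from skimage.metrics import structural_similarity

def metric_ssim(IR, Vis, X, w_ir=0.5, w_vis=0.5):
    """Weighted SSIM of Equation (22); equal source weights are assumed here."""
    ssim_ir = structural_similarity(IR, X, data_range=255)
    ssim_vis = structural_similarity(Vis, X, data_range=255)
    return w_vis * ssim_vis + w_ir * ssim_ir
```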
(5) $PSNR$
$PSNR$ reflects the distortion introduced by the fusion operation and is defined as the ratio of the peak value power to the noise power in the generated fused image:
$$PSNR = 10\,\log_{10}\frac{r^2}{MSE}, \qquad (23)$$
where $r$ denotes the peak value of the fused image and $MSE$ denotes the mean square error between the fused image and the source images. The larger the $PSNR$, the less distortion the fusion method produces.
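A sketch for a single source image (how the paper combines the two source images into one $MSE$, e.g., by averaging, is not detailed, so the helper below is only illustrative):

```python
import numpy as np

def metric_psnr(A, X, peak=255.0):
    """PSNR of Equation (23) between one source image A and the fused image X."""
    mse = np.mean((A.astype(float) - X.astype(float)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```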

4.3. Results and Analysis

To maintain a considerable amount of information in the source images, we select the strategy of upsampling the infrared images rather than downsampling the visible images for the comparison methods before fusion, i.e., procedure (c) in Figure 1. Our DRTV involves only one parameter, $\lambda$, which we set to 50 as its default value.
We first conduct several qualitative comparisons of different methods on four typical image pairs, including airplane_in_trees, Kaptein_01, nato_camp_sequence, and sandpath, as shown in Figure 2. Compared with the other six fusion methods, DRTV considerably improves the resolution of the fused images and does not suffer from the strong noise caused by upsampling infrared images before fusion. Although the resolution of the infrared images is only half that of the visible images, DRTV preserves more of the detailed texture information in the visible images, which benefits from the downsampling operator in $\varepsilon_1(X)$ and the $\|\cdot\|_{TV_I}$ norm of gradient differences in $\varepsilon_2(X)$. Furthermore, the fused results generated by our DRTV resemble high-resolution infrared images with detailed scene representations, which are beneficial for detecting and recognizing potential targets.
Next, we provide quantitative comparisons on the whole dataset with the five metrics described in the last section: $SSIM$, $MI$, $SD$, $EN$, and $PSNR$. The results are shown in Figure 3. For the metrics $SSIM$, $MI$, and $PSNR$, our DRTV generates the largest average values and exhibits the best performance on 17, 15, and 9 image pairs, respectively. For the metric $SD$, DRTV produces comparable results. Meanwhile, our method does not achieve the best results for the metric $EN$, which can be explained as follows. Entropy is a measure of uncertainty: when the probability of each gray value is evenly distributed, the uncertainty and the entropy are both large. In our DRTV, we constrain the fused image and the infrared image to present similar intensity distributions to facilitate discrimination of the thermal target by human eyes. Therefore, our fusion result maintains the intensity distribution of the original infrared image, in which the gray values are less evenly distributed than in the visible image, leading to a small $EN$ value.

4.4. Parameter Analysis

Our proposed method involves only one parameter, i.e., the regularization parameter $\lambda$, which controls the trade-off between $\varepsilon_1(X)$ and $\varepsilon_2(X)$. When $\lambda$ is small, the fused image preserves more thermal radiation information from the infrared image, making its visual effect highly similar to the infrared image. Conversely, when $\lambda$ is sufficiently large, the texture information in the visible image is largely retained in the fused image; hence, the fused image is similar to the visible image. When $\lambda \to +\infty$, the fused image is the original visible image. To analyze the influence of parameter $\lambda$ on fusion performance in detail, we test different values of $\lambda$ on two image pairs, nato_camp_sequence and lake. We gradually increase the value of $\lambda$ and observe the changes in the fused images. Figure 4 and Figure 5 show the fused images of the two image pairs when $\lambda$ is fixed to 1, 10, 50, 100, 200, and 500.
As λ changes from 1 to 500, the fence and hill around the house in Figure 4 gradually appear. In addition, the outline of the chimney in the bottom left corner becomes gradually clear. For the results in Figure 5, the most evident change is that the texture of vegetation on the river is clearly delineated, and the chair is gradually separated from the road. Thus, we can conclude that texture features in the fused image gradually become clear as the value of λ increases.
By contrast, we can observe in Figure 4 that, when $\lambda$ is smaller than 200, the pedestrian in nato_camp_sequence is legible, and the gray distribution of the house in the bottom left corner is similar to that in the infrared image. Meanwhile, when $\lambda$ is set to 500, the pedestrian's thermal radiation information is completely lost in the fused image and the house looks similar to that in the visible image. In Figure 5, the word "AUTO" in the upper right corner gradually disappears from the fused image as $\lambda$ changes from 1 to 500. Therefore, we conclude that, as $\lambda$ increases, thermal radiation information is gradually lost in the fused image. This is also the flexibility and advantage offered by our method: by adjusting the value of parameter $\lambda$, users can conveniently control the similarity of the fused image to the infrared image or the visible image.

4.5. Computational Cost Comparison

Given that the FISTA framework is introduced in our model to accelerate convergence, we evaluate it by comparing our model solved with the FISTA framework against the same model solved with the previous variational scheme, ISTA. The convergence rate is observed through the value of $f(X_k)$, where the objective function $f(X)$ denotes $\varepsilon(X)$ in Equation (8) and $X_k$ is the fused image obtained at the $k$-th iteration. Figure 6 shows the convergence rate comparison on two image pairs from Figure 2, i.e., airplane_in_trees and Kaptein_01, as examples. Inheriting the advantage of the FISTA framework, our method typically converges after only 20 to 50 iterations, while the ISTA-based method often needs 70 to 90 iterations. Additional iterations increase the computational cost without substantial performance gain. To stop the iterations automatically based on the value of $f(X_k)$, we apply the residual ratio of $f(X_k)$ as a stopping criterion. The residual ratio ($RR$) is defined as:
$$RR = \frac{\big|f(X_k) - f(X_{k-1})\big|}{f(X_{k-1})}. \qquad (24)$$
During the iteration process, the iteration is stopped when the value of $RR$ falls below a predefined threshold; in our method, the threshold is set to 0.3% empirically.
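Wired into the sketch of Algorithm 1 above, the stopping rule might look like this (an illustrative helper, with the 0.3% threshold from the text):

```python
def should_stop(f_curr, f_prev, threshold=0.003):
    """Residual-ratio stopping rule of Equation (24): stop when RR < 0.3%."""
    rr = abs(f_curr - f_prev) / abs(f_prev)
    return rr < threshold
```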
To evaluate the efficiency of our proposed method, we compare it with the six other methods in terms of runtime. The computational costs of the seven methods on 20 image pairs are shown in Figure 7, and the average runtimes of the seven algorithms are reported in the legend. In DDCTPCA, the source images are divided into many non-overlapping square blocks, eight directional modes are applied to each block, and all the coefficients of the eight modes are used for fusion. Thus, obtaining and fusing the coefficients for all the blocks takes up most of the runtime, which therefore depends strongly on the number of blocks, i.e., on the size of the source images. This is why the runtime of DDCTPCA oscillates. In our method, by contrast, the problem is formulated as a convex optimization problem characterized by a total variation model, and the sizes of the source images have less effect on the computational cost. This finding demonstrates that our method achieves comparable efficiency.

5. Conclusions and Future Work

In this paper, we proposed a method called DRTV for fusing infrared and visible images of different resolutions. The fusion process is similar to performing infrared image super-resolution while simultaneously integrating the texture detail information of the visible image. The quantitative comparisons on several metrics with six state-of-the-art fusion methods demonstrate that our method can not only keep more detailed information from the source images, but also considerably reduce the noise caused by upsampling infrared images before fusion, which is typical of existing methods. The fused image also retains the thermal radiation information to a large extent, thereby benefiting fusion-based target detection and recognition systems.
In our objective function, we have used the first-order TV to preserve the texture detail information of the visible image. The first-order TV performs well in preserving object edges in piecewise-constant images compared with other total variational models. However, it can produce staircase effects. To eliminate the staircase effect of the first-order TV model, high-order models have been proposed by replacing the TV regularizer in the first-order variational model with other regularizers and have shown promising performance, e.g., the TGV regularizer in [54], the bounded Hessian regularizer together with an edge diffusivity function in [55], and the Laplacian regularizer as well as regularizers combining the aforementioned regularizers with the TV regularizer [56]. Thus, in follow-up work, it is worth extending our model with high-order variational models to remove staircase effects.

Author Contributions

Q.D., H.X. and F.F. have made substantial contributions to the algorithm design. H.X. has made substantial contributions to the algorithm implementation and manuscript preparation. Q.D. has made substantial contributions to the experimental data collection and the result analysis. Q.D., Y.M. and J.H. revised the final manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 61605146.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178. [Google Scholar] [CrossRef]
  2. Huang, Y.; Bi, D.; Wu, D. Infrared and Visible Image Fusion Based on Different Constraints in the Non-Subsampled Shearlet Transform Domain. Sensors 2018, 18, 1169. [Google Scholar] [CrossRef] [PubMed]
  3. Kong, S.G.; Heo, J.; Boughorbel, F.; Zheng, Y.; Abidi, B.R.; Koschan, A.; Yi, M.; Abidi, M.A. Multiscale fusion of visible and thermal IR images for illumination-invariant face recognition. Int. J. Comput. Vis. 2007, 71, 215–233. [Google Scholar] [CrossRef]
  4. Gao, Y.; Ma, J.; Yuille, A.L. Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples. IEEE Trans. Image Process. 2017, 26, 2545–2560. [Google Scholar] [CrossRef] [PubMed]
  5. Yang, M.; Su, T.; Lin, H. Fusion of Infrared Thermal Image and Visible Image for 3D Thermal Model Reconstruction Using Smartphone Sensors. Sensors 2018, 18, 2003. [Google Scholar] [CrossRef] [PubMed]
  6. Singh, R.; Vatsa, M.; Noore, A. Integrated multilevel image fusion and match score fusion of visible and infrared face images for robust face recognition. Pattern Recognit. 2008, 41, 880–893. [Google Scholar] [CrossRef]
  7. Ma, J.; Jiang, J.; Liu, C.; Li, Y. Feature guided Gaussian mixture model with semi-supervised EM and local geometric constraint for retinal image registration. Inf. Sci. 2017, 417, 128–142. [Google Scholar] [CrossRef]
  8. Liu, T.; Liu, H.; Chen, Z.; Lesgold, A.M. Fast blind instrument function estimation method for industrial infrared spectrometers. IEEE Trans. Ind. Inform. 2018. [Google Scholar] [CrossRef]
  9. Bavirisetti, D.P.; Dhuli, R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Phys. Technol. 2016, 76, 52–64. [Google Scholar] [CrossRef]
  10. Ma, J.; Chen, C.; Li, C.; Huang, J. Infrared and visible image fusion via gradient transfer and total variation minimization. Inf. Fusion 2016, 31, 100–109. [Google Scholar] [CrossRef]
  11. Pajares, G.; De La Cruz, J.M. A wavelet-based image fusion tutorial. Pattern Recognit. 2004, 37, 1855–1872. [Google Scholar] [CrossRef]
  12. Dogra, A.; Goyal, B.; Agrawal, S. From Multi-Scale Decomposition to Non-Multi-Scale Decomposition Methods: A Comprehensive Survey of Image Fusion Techniques and Its Applications. IEEE Access 2017, 5, 16040–16067. [Google Scholar] [CrossRef]
  13. Li, S.; Yin, H.; Fang, L. Group-sparse representation with dictionary learning for medical image denoising and fusion. IEEE Trans. Biomed. Eng. 2012, 59, 3450–3459. [Google Scholar] [CrossRef] [PubMed]
  14. Zhang, Q.; Liu, Y.; Blum, R.S.; Han, J.; Tao, D. Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review. Inf. Fusion 2018, 40, 57–75. [Google Scholar] [CrossRef]
  15. Kong, W.; Zhang, L.; Lei, Y. Novel fusion method for visible light and infrared images based on NSST–SF–PCNN. Infrared Phys. Technol. 2014, 65, 103–112. [Google Scholar] [CrossRef]
  16. Liu, Y.; Chen, X.; Cheng, J.; Peng, H.; Wang, Z. Infrared and visible image fusion with convolutional neural networks. Int. J. Wavelets Multiresolut. Inf. Process. 2018, 16, 1850018. [Google Scholar] [CrossRef]
  17. Bavirisetti, D.P.; Xiao, G.; Liu, G. Multi-sensor image fusion based on fourth order partial differential equations. In Proceedings of the International Conference on Information Fusion, Xi’an, China, 10–13 July 2017; pp. 1–9. [Google Scholar]
  18. Patil, U.; Mudengudi, U. Image fusion using hierarchical PCA. In Proceedings of the International Conference on Image Information Processing, Shimla, India, 3–5 November 2011; pp. 1–6. [Google Scholar]
  19. Zhang, X.; Ma, Y.; Fan, F.; Zhang, Y.; Huang, J. Infrared and visible image fusion via saliency analysis and local edge-preserving multi-scale decomposition. J. Opt. Soc. Am. A 2017, 34, 1400–1410. [Google Scholar] [CrossRef] [PubMed]
  20. Shibata, T.; Tanaka, M.; Okutomi, M. Visible and near-infrared image fusion based on visually salient area selection. In Proceedings of the Digital Photography XI. International Society for Optics and Photonics, San Francisco, CA, USA, 27 February 2015; Volume 9404, p. 94040G. [Google Scholar]
  21. Liu, Y.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar] [CrossRef]
  22. Ma, J.; Zhou, Z.; Wang, B.; Zong, H. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Phys. Technol. 2017, 82, 8–17. [Google Scholar] [CrossRef]
  23. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26. [Google Scholar] [CrossRef]
  24. Liu, Y.; Chen, X.; Wang, Z.; Wang, Z.J.; Ward, R.K.; Wang, X. Deep learning for pixel-level image fusion: Recent advances and future prospects. Inf. Fusion 2018, 42, 158–173. [Google Scholar] [CrossRef]
  25. Ma, Y.; Chen, J.; Chen, C.; Fan, F.; Ma, J. Infrared and visible image fusion using total variation model. Neurocomputing 2016, 202, 12–19. [Google Scholar] [CrossRef]
  26. Zhou, Z.; Wang, B.; Li, S.; Dong, M. Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters. Inf. Fusion 2016, 30, 15–26. [Google Scholar] [CrossRef]
  27. Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef] [Green Version]
  28. Chen, C.; Li, Y.; Liu, W.; Huang, J. Image fusion with local spectral consistency and dynamic gradient sparsity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2760–2765. [Google Scholar]
  29. Mei, X.; Ma, Y.; Li, C.; Fan, F.; Huang, J.; Ma, J. Robust GBM hyperspectral image unmixing with superpixel segmentation based low rank and sparse representation. Neurocomputing 2018, 275, 2783–2797. [Google Scholar] [CrossRef]
  30. Candes, E.J.; Romberg, J.K.; Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 2006, 59, 1207–1223. [Google Scholar] [CrossRef] [Green Version]
  31. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
  32. Beck, A.; Teboulle, M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 2009, 18, 2419–2434. [Google Scholar] [CrossRef] [PubMed]
  33. Daubechies, I.; Defrise, M.; Mol, C.D. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 2003, 57, 1413–1457. [Google Scholar] [CrossRef]
  34. Wang, C. Research on Image Processing Technology Based on Variational Problems and Partial Differential Equations. Ph.D. Thesis, Science and Technology University of China, Hefei, China, 2007. [Google Scholar]
  35. Chen, C.; Li, Y.; Liu, W.; Huang, J. SIRF: Simultaneous Satellite Image Registration and Fusion in a Unified Framework. IEEE Trans. Image Process. 2015, 24, 4213. [Google Scholar] [CrossRef] [PubMed]
  36. Chen, C.; Li, Y.; Huang, J. Calibrationless parallel MRI with joint total variation regularization. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Nagoya, Japan, 22–26 September 2013; pp. 106–114. [Google Scholar]
  37. Ma, J.; Zhao, J.; Tian, J.; Bai, X.; Tu, Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 2013, 46, 3519–3532. [Google Scholar] [CrossRef]
  38. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality preserving matching. Int. J. Comput. Vis. 2018. [Google Scholar] [CrossRef]
  39. Yan, J.; Cho, M.; Zha, H.; Yang, X.; Chu, S.M. Multi-graph matching via affinity optimization with graduated consistency regularization. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1228–1242. [Google Scholar] [CrossRef] [PubMed]
  40. Yan, J.; Li, C.; Li, Y.; Cao, G. Adaptive discrete hypergraph matching. IEEE Trans. Cybern. 2018, 48, 765–779. [Google Scholar] [CrossRef] [PubMed]
  41. Ma, J.; Wu, J.; Zhao, J.; Jiang, J.; Zhou, H.; Sheng, Q.Z. Nonrigid Point Set Registration with Robust Transformation Learning under Manifold Regularization. IEEE Trans. Neural Netw. Learn. Syst. to be published. [CrossRef] [PubMed]
  42. Ma, J.; Jiang, J.; Zhou, H.; Zhao, J.; Guo, X. Guided locality preserving feature matching for remote sensing image registration. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4435–4447. [Google Scholar] [CrossRef]
  43. Burt, P.J.; Adelson, E.H. The Laplacian Pyramid as a Compact Image Code. Read. Comput. Vis. 1987, 31, 671–679. [Google Scholar]
  44. Toet, A. Image fusion by a ratio of low-pass pyramid. Pattern Recognit. Lett. 1989, 9, 245–253. [Google Scholar] [CrossRef]
  45. Kumar, B.K.S. Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process. 2015, 9, 1193–1204. [Google Scholar] [CrossRef]
  46. Naidu, V.P.S. Hybrid DDCT-PCA based multi sensor image fusion. J. Opt. 2014, 43, 48–61. [Google Scholar] [CrossRef]
  47. Hu, H.M.; Wu, J.; Li, B.; Guo, Q.; Zheng, J. An Adaptive Fusion Algorithm for Visible and Infrared Videos Based on Entropy and the Cumulative Distribution of Gray Levels. IEEE Trans. Multimed. 2017, 19, 2706–2719. [Google Scholar] [CrossRef]
  48. Zhou, X.; Wang, W. Infrared and Visible Image Fusion Based on Tetrolet Transform; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  49. Zhou, Y.; Mayyas, A.; Omar, M.A. Principal Component Analysis-Based Image Fusion Routine with Application to Automotive Stamping Split Detection. Res. Nondestruct. Eval. 2011, 22, 76–91. [Google Scholar] [CrossRef]
  50. Zhao, C.; Shao, G.; Ma, L.; Zhang, X. Image fusion algorithm based on redundant-lifting NSWMDA and adaptive PCNN. Optik-Int. J. Light Electron Opt. 2014, 125, 6247–6255. [Google Scholar] [CrossRef]
  51. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  52. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  53. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  54. Duan, J.; Lu, W.; Tench, C.; Gottlob, I.; Proudlock, F.; Samani, N.N.; Bai, L. Denoising optical coherence tomography using second order total generalized variation decomposition. Biomed. Signal Process. Control 2016, 24, 120–127. [Google Scholar] [CrossRef]
  55. Duan, J.; Qiu, Z.; Lu, W.; Wang, G.; Pan, Z.; Bai, L. An edge-weighted second order variational model for image decomposition. Digit. Signal Process. 2016, 49, 162–181. [Google Scholar] [CrossRef]
  56. Lu, W.; Duan, J.; Qiu, Z.; Pan, Z.; Liu, R.W.; Bai, L. Implementation of high-order variational models made easy for image processing. Math. Methods Appl. Sci. 2016, 39, 4208–4233. [Google Scholar] [CrossRef]
Figure 1. A typical example for fusing infrared and visible images of different resolutions through (a) downsampling the visible image; (b) our method; and (c) upsampling the infrared image.
Figure 2. Fusion results of different fusion methods on four image pairs. The four groups of results from left to right: airplane_in_trees, Kaptein_01, nato_camp_sequence and sandpath, where the corresponding subfigures are: (a) infrared image; (b) visible image; (c) our DRTV; (d) LP [43]; (e) RP [44]; (f) HMSD [26]; (g) CBF [45]; (h) DDCTPCA [46]; (i) GTF [10].
Figure 3. Quantitative comparisons of six fusion methods on twenty image pairs.
Figure 4. Fusion results of our DRTV on nato_camp_sequence when parameter λ increases.
Figure 5. Fusion results of our DRTV on lake when parameter λ increases.
Figure 6. Convergence rate comparison between FISTA framework and previous variational model. ((a) result on airplane_in_trees; (b) result on Kaptein_01).
Figure 7. Runtime comparison of different methods on the whole test dataset.
