Improving Empirical Mode Decomposition Using Support Vector Machines for Multifocus Image Fusion

Empirical mode decomposition (EMD) is good at analyzing nonstationary and nonlinear signals while support vector machines (SVMs) are widely used for classification. In this paper, a combination of EMD and SVM is proposed as an improved method for fusing multifocus images. Experimental results show that the proposed method is superior to the fusion methods based on à-trous wavelet transform (AWT) and EMD in terms of quantitative analyses by Root Mean Squared Error (RMSE) and Mutual Information (MI).


Introduction
Due to the limited depth-of-focus of optical lenses, cameras cannot be focused simultaneously on all objects at different distances from them to gain a clear image [1]. On the other hand, many pattern-related processing tasks, such as machine vision and target tracking, are better implemented using focused images rather than defocused ones [2]. Therefore, it is often advantageous to construct an image with every object in focus by fusing the multifocus images taken from the same viewpoint under different focal settings [3].
Up to now, various methods at the pixel, feature, or decision level have been presented for image fusion [3][4][5]. Arithmetic algorithms at the pixel level often cause undesirable side effects such as reduced contrast [6]. An alternative approach using image blocks and spatial frequency suffers from a tradeoff between block size and the quality of the fused image: a large image block leads to a less clear image, while a small image block may lead to a saw-tooth effect [7].
Another family of methods has been explored based on undecimated 'à-trous' wavelet transform (AWT) [8], [9]. The basic idea is to implement an AWT on each multifocus image, and then fuse all wavelet coefficients by their magnitudes to produce one composite wavelet representation, from which the focused image can be recovered by performing the inverse AWT (IAWT) [2].
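As a minimal sketch of the AWT fusion idea described above, the following Python code decomposes two images with an additive à trous scheme, keeps the larger-magnitude coefficient at each position, and reconstructs by summation. The function names, the reflective border handling, the unscaled B3-spline taps, and the averaging of the two approximation planes are assumptions made for illustration; they are not taken from [2], [8], or [9].

```python
import numpy as np

# B3-spline low-pass taps (assumed here without any extra scale factor).
# They sum to 1, so smoothing preserves the mean gray level.
B3 = np.array([1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16])


def smooth(img, level):
    """Separable B3 smoothing with taps spaced 2**level pixels apart (the "holes")."""
    step = 2 ** level
    out = img.astype(float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (2 * step, 2 * step)
        padded = np.pad(out, pad, mode="reflect")  # border handling is an assumption
        acc = np.zeros_like(out)
        n = out.shape[axis]
        for k, w in enumerate(B3):
            idx = [slice(None), slice(None)]
            idx[axis] = slice(k * step, k * step + n)
            acc += w * padded[tuple(idx)]
        out = acc
    return out


def atrous_decompose(img, levels=3):
    """Return the wavelet planes (finest first) and the final approximation."""
    planes, approx = [], img.astype(float)
    for j in range(levels):
        nxt = smooth(approx, j)
        planes.append(approx - nxt)  # wavelet plane = difference of approximations
        approx = nxt
    return planes, approx


def atrous_reconstruct(planes, approx):
    """Inverse transform of the additive scheme: approximation plus all planes."""
    return approx + sum(planes)


def fuse_awt(a, b, levels=3):
    """Keep the larger-magnitude wavelet coefficient at each position and level."""
    pa, ra = atrous_decompose(a, levels)
    pb, rb = atrous_decompose(b, levels)
    fused = [np.where(np.abs(x) >= np.abs(y), x, y) for x, y in zip(pa, pb)]
    # Averaging the two approximation planes is an illustrative choice.
    return atrous_reconstruct(fused, (ra + rb) / 2)
```

Because each wavelet plane is the difference of two successive approximations, the reconstruction is a telescoping sum and recovers the input exactly when nothing is modified.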
Empirical mode decomposition (EMD) is a more recent signal processing method for analyzing nonlinear and nonstationary data, which was developed by Huang et al. [10,11]. The final representation of the signal is an energy-frequency distribution that gives not only sharp identification of salient information but also the "smoother" part of the signal. The EMD is a highly efficient and adaptive method and offers higher frequency resolution and more accurate timing of nonlinear and nonstationary signal events than traditional integral transform techniques [12][13][14][15][16]. In this paper, a combination of EMD and support vector machines (SVMs) is proposed to produce a better EMD representation of the fused image when fusing multifocus images.
The SVM is a supervised classification method that outperforms many conventional approaches in many applications [2]. The improvement of the EMD based multifocus image fusion using the SVM is presented in Section 2. An experiment in Section 3 illustrates that the presented method produces quantitatively better focused images than the traditional fusion methods based on EMD and AWT.

Fusion Principle
Here, the processing of two images A and B is considered, though the algorithm can be extended to handle more than two. Each multifocus image is first decomposed by EMD into one residue and a series of intrinsic mode functions (IMFs). Then an SVM is trained to determine which IMF plane is clearer at each location at each level. Finally, the focused image is recovered by carrying out the inverse EMD (IEMD).

EMD-based multifocus image fusion using the SVM
The EMD can represent both the details and the smooth part of an image, and this framework is well suited to fusing images by managing different IMFs [12][13][14][15][16][17]. For a two-dimensional image, the EMD process that generates the IMFs is summarized as follows [12]:
1) Treat the original image $I$ as the initial residue $I_0$.
2) Connect all the local maxima and minima along rows using smooth cubic splines to get the upper envelope $ue_r$ and the lower envelope $le_r$. Similarly, obtain the upper envelope $ue_c$ and the lower envelope $le_c$ along columns. The mean plane $ul$ is defined as
$$ul = \frac{ue_r + le_r + ue_c + le_c}{4}$$
Then, the difference between $I_0$ and $ul$ is
$$\omega_1 = I_0 - ul$$
This is one iteration of the sifting process. Because the value of $ul$ decreases rapidly for the first several iterations and then decreases slowly, an appropriate fixed number of iterations can serve as the stopping criterion, and this criterion is used to build the IMFs in this paper. The sifting process ends when $\omega_1$ becomes an IMF. The residue is then obtained by
$$I_1 = I_0 - \omega_1$$
The multifocus image fusion method based on the EMD fuses the residues and the IMFs according to their activity levels to produce a composite decomposition of the fused image. However, this simple fusion rule may not produce the optimal EMD representation of the fused image when adjacent EMD coefficients should be considered jointly in the fusion judgment, in which case a decision fusion rule is needed. With the SVM, one expects much room for improvement over the activity level based fusion schemes.
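The sifting procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the mean plane is taken as the average of the four row and column envelopes, the splines are anchored at both end points of each line, and the names `envelope_1d`, `mean_plane`, `extract_imf`, `emd2d` and the default `n_sift=5` are hypothetical. SciPy's `CubicSpline` and `argrelextrema` are used for the envelopes.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_1d(signal, comparator):
    """Cubic-spline envelope through the local extrema of one row or column."""
    n = signal.size
    idx = argrelextrema(signal, comparator)[0]
    # Assumption: include both end points so the spline spans the whole line.
    knots = np.unique(np.concatenate(([0], idx, [n - 1])))
    return CubicSpline(knots, signal[knots])(np.arange(n))


def mean_plane(img):
    """Average of the upper and lower envelopes taken along rows and columns."""
    upper_r = np.apply_along_axis(envelope_1d, 1, img, np.greater)
    lower_r = np.apply_along_axis(envelope_1d, 1, img, np.less)
    upper_c = np.apply_along_axis(envelope_1d, 0, img, np.greater)
    lower_c = np.apply_along_axis(envelope_1d, 0, img, np.less)
    return (upper_r + lower_r + upper_c + lower_c) / 4.0


def extract_imf(residue, n_sift=5):
    """Sift a fixed number of times; return the IMF and the new residue."""
    omega = residue.astype(float)
    for _ in range(n_sift):
        omega = omega - mean_plane(omega)  # one sifting iteration
    return omega, residue - omega


def emd2d(img, levels=2, n_sift=5):
    """Decompose an image into `levels` IMF planes plus a residue."""
    imfs, residue = [], img.astype(float)
    for _ in range(levels):
        omega, residue = extract_imf(residue, n_sift)
        imfs.append(omega)
    return imfs, residue
```

With this construction the image equals the sum of the IMF planes and the final residue, so summation acts as the inverse transform in the sketch.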
The SVMs are a set of related supervised learning methods used for classification and regression. Interested readers may consult [18] for details. Consider a group of labeled patterns $\{(x_j, y_j)\}$, where $x_j$ and $y_j$ are the pattern and the corresponding class label, respectively. Training an SVM is equivalent to solving a quadratic programming problem (QPP) in a number of variables equal to the number of patterns. The solution to the QPP has the following form:
$$f(x) = \sum_{i=1}^{L} \alpha_i y_i K(x_i, x) + b, \qquad 0 \le \alpha_i \le C, \qquad b = -\frac{1}{2} \sum_{i=1}^{L} \alpha_i y_i \left[ K(x_i, x_r) + K(x_i, x_s) \right] \qquad (5)$$
Here $K(x_i, x)$ is the kernel function used to calculate the inner product of $x_i$ and $x$, which are a support vector and the input vector being evaluated, respectively. $L$ is the number of support vectors, $\alpha_i$ is the coefficient corresponding to $x_i$, and $C$ is a user-defined regularization parameter. $x_r$ and $x_s$ are support vectors whose class labels $y_r$ and $y_s$ differ.
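To make the form of the SVM solution concrete, the sketch below evaluates a kernel decision function of the kind described above, that is, a weighted sum of kernel values between the support vectors and an input pattern plus a bias. It does not train anything: the support vectors, labels, coefficients, bias, and the RBF width `gamma` are hand-picked toy values, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2); gamma is an illustrative choice."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_decision(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Evaluate sum_i alpha_i * y_i * K(x_i, x) + bias for one input pattern x."""
    return bias + sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    )


# Toy model (not a trained SVM): one support vector per class.
toy_svs = [[-1.0], [1.0]]
toy_labels = [-1, 1]
toy_alphas = [1.0, 1.0]

# The sign of the output is the predicted class.
score = svm_decision([0.8], toy_svs, toy_labels, toy_alphas, bias=0.0)
predicted = 1 if score > 0 else -1
```

In the fusion setting, the sign of such an output would indicate which of the two inputs is judged to be in better focus at a given position.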
Based on the outputs of the SVM corresponding to the inputs, the activity level based fusion rule can be upgraded to the decision fusion rule in such a way that the trained SVM can be used to pick out the focused EMD coefficients for preserving the salient information at each pixel location at each level.

The procedure of the proposed method
The proposed method (Figure 2) takes the following steps:
1) Extract the generalized spatial frequency (S) of each pixel of A and B using a small window (W) centered at the current pixel position according to formula (6). In this paper, a W of 3×3 is used. Let $I$ denote A or B and $I(m, n)$ its gray value at $(m, n)$; then $S_I(m, n)$ is given by formula (6). S measures the overall activity level at a pixel because it reflects how the gray value changes relative to its neighbors.
2) Collect training patterns according to equations (7) and (8); $x_i$ and $x_j$ denote these training patterns.
4) Decompose A and B with EMD along rows and columns to J levels, resulting in a residue and a total of J IMF planes for each image.
5) Derive the S value of the EMD coefficients of A and B at each position at each level according to formula (6).
7) Finally, recover the fused image by implementing IEMD according to formula (4).
In Figure 2, the position (m, n) has been omitted for conciseness.
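The decision-based selection of EMD coefficients can be sketched as below. Several parts are stand-ins and should be read as assumptions: `local_activity` is a simple 3×3 neighbor-difference measure and is not the paper's formula (6); `decide` is any callable that returns a positive value where the first input is judged clearer, standing in for the output of the trained SVM; and the inverse EMD is assumed to be the sum of the fused IMF planes and the fused residue.

```python
import numpy as np


def local_activity(plane):
    """Stand-in activity measure: mean absolute difference to the 8 neighbours
    in a 3x3 window. This is NOT the paper's formula (6), only a placeholder."""
    h, w = plane.shape
    padded = np.pad(plane.astype(float), 1, mode="edge")
    total = np.zeros((h, w))
    for dm in (-1, 0, 1):
        for dn in (-1, 0, 1):
            if dm or dn:
                shifted = padded[1 + dm:1 + dm + h, 1 + dn:1 + dn + w]
                total += np.abs(plane - shifted)
    return total / 8.0


def fuse_planes(plane_a, plane_b, decide):
    """Per pixel, keep the coefficient from the source that `decide` calls clearer."""
    s_a, s_b = local_activity(plane_a), local_activity(plane_b)
    take_a = decide(s_a, s_b) > 0  # positive output means "A is clearer"
    return np.where(take_a, plane_a, plane_b)


def fuse_decompositions(imfs_a, res_a, imfs_b, res_b, decide):
    """Fuse every IMF level and the residue, then invert by summation (assumed)."""
    fused_imfs = [fuse_planes(x, y, decide) for x, y in zip(imfs_a, imfs_b)]
    fused_res = fuse_planes(res_a, res_b, decide)
    return sum(fused_imfs) + fused_res


def activity_difference(s_a, s_b):
    """Hypothetical stand-in for a trained classifier: compare activities directly."""
    return s_a - s_b
```

Passing `activity_difference` reproduces a plain activity level rule; replacing it with the decision function of a trained classifier gives a decision fusion rule in the sense used in this section.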

Experiments
In this section, multifocus image fusion based on the AWT, the EMD, and the proposed method is tested on two sets of images: green pepper (512×512) and leopard (480×360). Each reference image …
When performing the AWT based fusion algorithm, because multiresolution analysis based on the à trous filter preserves translation invariance, short decomposition/reconstruction filters are needed to avoid ringing artifacts [19]. The à trous filter $2^{-1/2}(1/16, 1/4, 3/8, 1/4, 1/16)$ is used with three decomposition levels, coefficient based activity, and the choose-max scheme to select the significant coefficients. For the EMD, cubic spline interpolation is used together with two decomposition levels and the coefficient based choose-max scheme. For the proposed method, termed EVM (Empirical support Vector Machine), the SVM20 software with the radial basis function kernel is used; it was downloaded from http://liama.ia.ac.cn/PersonalPage/lbchen/svm20.zip (accessed in 2004). Based on formulae (6), (7), and (8), the training patterns are extracted from the input images. In this experiment, each pixel in the multifocus images generates one training pattern. The fused images produced by the three methods are shown in Figure 3.
Figure 3 (surviving panel captions): (i) reference leopard image; (j) fused image using AWT; (k) fused image using EMD; (l) fused image using EVM (C=6500).
In the definitions of RMSE and MI between images F and I, $h_{F,I}$ is the normalized joint gray level histogram of F and I, $h_F$ and $h_I$ are the normalized histograms of F and I, and L is the number of gray levels. RMSE measures the difference between F and I. MI measures the reduction in uncertainty about I given F, so a smaller RMSE and a larger MI are preferred. The quantitative comparison of the three methods is shown in Tables 1 and 2, from which it can be seen that the EVM exhibits significant improvements over the AWT and EMD. The fused images produced by the EVM are nearly a combination of the well-focused parts of the input images. In comparison, the fused images produced by the AWT and EMD are inferior.
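For readers who want to reproduce the two measures, the sketch below computes RMSE and MI from a pair of gray level images using their standard definitions. It assumes 8-bit images (256 gray levels) and a base-2 logarithm for MI; the text does not state the logarithm base, and the function names are illustrative.

```python
import numpy as np


def rmse(fused, reference):
    """Root mean squared error between two images of equal size."""
    diff = fused.astype(float) - reference.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mutual_information(fused, reference, levels=256):
    """MI in bits, computed from the normalized joint gray level histogram."""
    joint, _, _ = np.histogram2d(
        fused.ravel(), reference.ravel(),
        bins=levels, range=[[0, levels], [0, levels]],
    )
    h_fi = joint / joint.sum()             # normalized joint histogram
    h_f = h_fi.sum(axis=1, keepdims=True)  # marginal histogram of the first image
    h_i = h_fi.sum(axis=0, keepdims=True)  # marginal histogram of the second image
    nz = h_fi > 0                          # skip empty bins to avoid log(0)
    return float(np.sum(h_fi[nz] * np.log2(h_fi[nz] / (h_f @ h_i)[nz])))
```

An image compared with itself gives an RMSE of zero and an MI equal to its own entropy, which is a quick sanity check for an implementation.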
The key reason for the superiority of the EVM over the AWT and EMD is the use of generalized spatial frequency to represent image clarity, which produces good input features for the SVM in deciding which input image has the better focus at a specific pixel position.
The SVM requires the presetting of a regularization parameter [C in formula (5)] that trades off the margin against training errors. In general, a C value that is too large or too small is undesirable, and this is corroborated by Figures 4(a) and (b), which show the effects of C on the RMSE and MI of the EVM with the radial basis function (Kernel) and the linear basis function (Linear) on processing Figures 3(a) and (b), respectively. Initially, the parameter C is set to 5000. The C value is then increased and decreased in steps of 2000. Figures 4(a) and (b) are plotted using the RMSE and MI values of the fused images corresponding to these values of C. Apart from the extremes, the performance is relatively stable over a large range of C.

Conclusions
In this paper, we study the combination of EMD and SVM for fusing images of the same scene taken with different focuses in order to get an image with every object in focus. The EMD is used for the multiresolution decomposition, while the SVM is employed to find the multifocus image with the better focus at a given pixel position. Based on the outputs of the SVM, the fusion scheme based on the activity level of the EMD coefficients can be upgraded to a decision fusion rule. This fusion scheme is used to select the source multifocus image that has the best focus at each pixel location. Experiments corroborate that the proposed method performs better than the traditional AWT and EMD based fusion schemes in fusing multifocus images in terms of RMSE and MI. By working on the EVM fused image rather than on the original defocused images, vision-related processing tasks can be expected to yield more accurate results. Compared with the separate AWT and EMD based methods, the EVM based method is more computationally intensive when implemented to perform real-time image fusion. However, the overall evaluation shows that it is a promising method.
In the remote sensing community, one of the most challenging tasks is the fusion of images with different imaging geometry and spatial resolution, for example, synthetic aperture radar images and Landsat Thematic Mapper (TM) images. Another is the fusion of images with obviously different pixel sizes and spectral properties, such as Moderate Resolution Imaging Spectroradiometer (MODIS) images and TM images [20]. In the future, we intend to extend the proposed fuser to merge such multisensor images. The fusion problem for the SVM then becomes how to choose the input image with the best spectral and spatial response at each location.