1. Introduction
Multi-sensor image fusion is a synthesis technique that fuses source images from multiple sensors into a single high-quality image with comprehensive information [1,2,3]. The technique is widely used in visual sensor networks for applications such as military defense, security monitoring, and image inpainting. In digital photography, it is difficult for a single-lens reflex camera to capture an image in which all objects are in focus [4,5]. To obtain all-in-focus images, multi-source images of the same scene with different focus settings are fused into one single image, a process known as multi-focus image fusion [6]. Most existing multi-focus image fusion methods fall into two strategies: signal processing-based fusion methods (such as transform domain methods, spatial domain methods, and hybrid methods) and machine learning-based fusion methods (such as artificial neural networks, fuzzy systems, and support vector machines).
Generally, transform domain-based fusion methods consist of three stages: first, the source images are transformed to obtain the decomposed sub-band coefficients of each image; then, a fusion rule is applied to integrate the corresponding sub-band coefficients into fused coefficients; finally, the fused coefficients are inverse-transformed to obtain the fused image [7,8,9]. Classical signal processing-based fusion methods include principal component analysis (PCA) [10], discrete wavelet transform (DWT) [11], nonsubsampled transforms (such as the nonsubsampled shearlet transform, the nonsubsampled contourlet transform, and the stationary wavelet transform) [12], multi-resolution singular value decomposition (MSVD) [13], discrete cosine harmonic wavelet transform (DCHWT) [14], and so on. However, these conventional image fusion methods may introduce unpredictable errors between the forward and inverse transforms, and these errors can lead to image distortion and artifacts.
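To make the three-stage pipeline concrete, the following is a minimal sketch of transform domain fusion using a single-level DWT (via the PyWavelets package), with an averaging rule for approximation coefficients and a max-absolute rule for detail coefficients. The wavelet and fusion rules here are illustrative choices, not those of any particular cited method.

```python
import numpy as np
import pywt

def dwt_fuse(img_a: np.ndarray, img_b: np.ndarray, wavelet: str = "db1") -> np.ndarray:
    """Fuse two grayscale images with a single-level DWT.

    Stage 1: decompose both sources into sub-band coefficients.
    Stage 2: merge coefficients (average the approximations,
             keep the larger-magnitude detail coefficients).
    Stage 3: inverse-transform the fused coefficients.
    """
    cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(img_a, wavelet)
    cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(img_b, wavelet)

    cA_f = (cA_a + cA_b) / 2.0  # low-frequency band: average rule

    def fuse_detail(x, y):
        # high-frequency bands: pick the stronger response
        return np.where(np.abs(x) >= np.abs(y), x, y)

    details_f = (fuse_detail(cH_a, cH_b),
                 fuse_detail(cV_a, cV_b),
                 fuse_detail(cD_a, cD_b))
    return pywt.idwt2((cA_f, details_f), wavelet)
```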
With the development of neural networks, researchers have been devoted to introducing deep learning into image fusion, especially multi-focus image fusion, which can be modeled as a pixel classification task [15,16,17,18,19]. In recent years, image fusion methods based on deep learning models have emerged and shown great development potential in some situations [20,21]. In 2017, Liu et al. [15] applied a deep convolutional neural network (DCNN) to multi-focus image fusion. This method regarded image fusion as a binary classification problem, but it was still a spatial domain fusion method and may therefore suffer from blocking effects. To address this problem, Mustafa et al. [22] proposed a multi-focus image fusion method that combined feature extraction, fusion, and reconstruction in a complete unsupervised end-to-end model. With the development of generative adversarial networks (GANs), these models have shown great capacity in the field of image fusion. Guo et al. [23] proposed a multi-focus image fusion method based on conditional generative adversarial networks (cGANs), which achieved good fusion performance. However, deep learning-based image fusion methods also have limitations: a large number of samples and substantial computational resources are needed to train a good model, training takes considerable time, and many hyper-parameters must be adjusted manually [24]. Considering the tradeoff between computational cost and fusion performance, shallow machine learning methods also have advantages in image fusion because they require limited computing resources and fewer training samples. The support vector machine (SVM), which can be regarded as a classical shallow learning model with one hidden layer, is normally trained on extracted features to distinguish the focused and unfocused regions that are employed for generating fusion decisions [18,19]. Because shallow machine learning models lack feature extraction capability, it is necessary to employ a dedicated feature extraction method to represent the image features (such as texture, structure, and edges), which is of great significance for improving image fusion performance.
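As an illustration of this shallow pipeline, here is a minimal sketch assuming hand-crafted per-window features (local variance and gradient energy, chosen purely for illustration) and an RBF-kernel SVM from scikit-learn; the actual features and training windows would come from the extraction scheme described in Section 3, and the toy data below is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def window_features(patch: np.ndarray) -> np.ndarray:
    """Illustrative focus features for one image window: local variance
    and energy of gradient (focused regions tend to score higher on both)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return np.array([patch.var(), np.mean(gx**2 + gy**2)])

# Hypothetical toy windows standing in for real labeled training data:
# high-detail patches mimic focused regions, low-detail patches unfocused.
rng = np.random.default_rng(0)
sharp = [rng.normal(0.0, 1.0, (9, 9)) for _ in range(50)]
blurry = [rng.normal(0.0, 0.1, (9, 9)) for _ in range(50)]
X = np.array([window_features(p) for p in sharp + blurry])
y = np.array([1] * 50 + [0] * 50)  # 1 = focused, 0 = unfocused

clf = SVC(kernel="rbf")  # in the paper, gamma and C are tuned by PSO (g = 400, c = 0.005)
clf.fit(X, y)
label = clf.predict(window_features(sharp[0]).reshape(1, -1))  # classify one window
```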
In this work, a novel multi-focus image fusion method based on the SVM, multi-scale PCA, and a regional feature extraction method is introduced. The method first uses a sliding window technique to extract the detailed features of the different source images. Then, the focused and unfocused areas of the source images are identified by a pre-trained SVM. In the fusion stage, the fusion decisions of the different source images are combined with a set of logic operations, and consistency verification (CV) is then carried out to optimize the decisions. Finally, a new pixel-weighted image fusion scheme based on multi-scale PCA is designed to process the disputed decisions at the same positions of different source images. The contributions of this work are summarized as follows.
This work designs a regional feature extraction method based on five image fusion evaluation metrics; the extracted regional features are then employed as the input of an SVM model to produce pixel fusion decisions. This design avoids inputting the complete image into the SVM.
An SVM-based spatial image focus detection method is introduced to distinguish the focused and unfocused regions when integrating different source images; the new method requires only a few training samples to identify the focused and unfocused areas.
A multi-scale weighted image fusion method based on PCA is proposed to handle the disputed regions that occur at the same positions in the decision masks of different source images. The proposed multi-scale PCA-based fusion method performs better than conventional PCA methods (a sketch of the underlying PCA weighting follows this list).
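For intuition, the following is a minimal sketch of PCA-derived fusion weights extended across several scales. Conventional PCA fusion takes the principal eigenvector of the 2 × 2 covariance of the two sources as blending weights; the multi-scale variant sketched here simply averages such weights computed over windows of several radii around a disputed pixel. This is an illustrative reconstruction under stated assumptions, not the exact MWPCA formulation of Section 3.

```python
import numpy as np

def pca_weights(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Conventional PCA weighting: principal eigenvector of the 2x2
    covariance of the two flattened patches, normalized to sum to 1."""
    cov = np.cov(np.vstack([a.ravel(), b.ravel()]))
    _, vecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    v = np.abs(vecs[:, -1])         # principal component
    return v / v.sum()

def fuse_disputed_pixel(img_a, img_b, y, x, scales=(2, 4, 8, 16)):
    """Assumed multi-scale variant: average the PCA weights computed over
    windows of several radii around a disputed pixel, then blend it."""
    weights = []
    for r in scales:
        pa = img_a[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1]
        pb = img_b[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1]
        weights.append(pca_weights(pa, pb))
    w = np.mean(weights, axis=0)
    return w[0] * img_a[y, x] + w[1] * img_b[y, x]
```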
The remaining sections of the paper are organized as follows. In Section 2, the basic theories of the SVM and the PCA-based image fusion method are briefly reviewed. In Section 3, the proposed image fusion method is presented. The experimental results and analysis are described in Section 4. Section 5 concludes this work.
4. Experimental Results and Analysis
This section first presents two experiments to verify the validity of the proposed MWPCA. Conventional PCA [10,34] and a single-scale PCA-based weighting (SWPCA) are compared with the proposed MWPCA. To further verify the effectiveness of the proposed image fusion method, several popular image fusion algorithms are also compared with the proposed model using six widely used image metrics. In the feature extraction stage, the sliding window size is set to 9 × 9. For SVM training, the LIBSVM package provided by Professor Chih-Jen Lin of National Taiwan University is used to train and test the model. The SVM parameters are optimized by particle swarm optimization (PSO), yielding g = 400 and c = 0.005. Repeated experiments showed that an MWPCA with four scales is suitable for the proposed method. The experimental images are six pairs of popular multi-focus images, shown in Figure 4. The evaluation metrics are: the edge-based similarity measure (QAB/F), mutual information (MI), standard deviation (STD), spatial frequency (SF), feature mutual information (FMI), and average gradient (AG).
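Two of these no-reference metrics are straightforward to state in code. The following is a minimal sketch of spatial frequency (SF) and average gradient (AG) under their common textbook definitions; implementations in the literature can differ slightly in normalization, so treat this as illustrative rather than the exact evaluation code used here.

```python
import numpy as np

def spatial_frequency(f: np.ndarray) -> float:
    """SF = sqrt(RF^2 + CF^2), where RF/CF are the RMS row/column
    differences of the fused image f."""
    f = f.astype(np.float64)
    rf = np.sqrt(np.mean((f[:, 1:] - f[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((f[1:, :] - f[:-1, :]) ** 2))  # column frequency
    return float(np.hypot(rf, cf))

def average_gradient(f: np.ndarray) -> float:
    """AG: mean magnitude of the local gradient, a proxy for clarity."""
    f = f.astype(np.float64)
    gx = f[:-1, 1:] - f[:-1, :-1]   # horizontal difference
    gy = f[1:, :-1] - f[:-1, :-1]   # vertical difference
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))
```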
The comparison methods are: DWT [35], gradient pyramid (GRP) [36], MSVD [11], convolutional sparse representation (CSR) [37], filter-subtract-decimate pyramid (FSD) [34], discrete cosine harmonic wavelet transform (DCHWT) [14], multi-scale guided image and video fusion (MGFF) [38], multi-exposure and multi-focus image fusion in the gradient domain (MMGD) [39], stationary wavelet transform (SWT) [40], image fusion with the Laplacian pyramid transform and pulse coupled neural networks (LPPCNN) [15], image fusion with fourth-order partial differential equations (FPED) [17], and image fusion with a boosted random walks-based algorithm (BRWIF) [16]. The proposed image fusion method is denoted SVM-MWPCA.
Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 display the source images and the fused images of the different image fusion methods. The experiments show that some previous methods cannot fuse the source images effectively. In Figure 5, DWT, GRP, and MSVD fail to fuse the detailed features of the source images, so their fused images are distorted to some extent. The FPED method clearly does not produce a good fused image, especially at the junction between the focused and unfocused regions. In Figure 6, the fused images of DWT, MSVD, FSD, DCHWT, MMGD, and FPED show obvious distortion; in particular, the image fused by FPED suffers a serious loss of detail. The images fused by the proposed method are superior to those of the other methods in terms of edges, details, and textures, and they are the most similar to the source images; the enlarged images confirm these observations. In Figure 7, the fused images of GRP and MSVD show obvious distortion, and their results are worse than those of the other methods. In Figure 8, apart from the FPED result, it is difficult to distinguish the fused images of the different methods by eye. In Figure 9, the fused images obtained by GRP, MSVD, CSR, FSD, DCHWT, and MMGD cannot effectively represent the details of the source images, especially the sharp and blurred edges. In Figure 10, the differences among the fused images are hard to recognize by eye, so evaluation metrics are employed to verify the performance of the different methods. In general, the proposed image fusion method produces better visual effects than the comparison methods.
From the experimental data in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6, we find that the proposed MWPCA attains the largest QAB/F and MI values on the source image pairs “head”, “office”, “boat”, “wine bottle”, and “bread” when compared with the conventional PCA and SWPCA methods. On the source image pair “flora”, MWPCA attains the best QAB/F value. QAB/F and MI are the two most crucial evaluation metrics in image fusion, and MWPCA also attains the largest values on almost all of the remaining metrics. The fused images obtained by MWPCA are much clearer than those of the conventional PCA methods. Thus, the fused images obtained by the proposed MWPCA offer better visual effects and superior objective indicators.
The evaluation indexes of the different image fusion methods are compared in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6. Values are reported to four decimal places, as is common in the image fusion field, because some indicators are very close. Among the evaluation metrics, QAB/F and MI are the most important for assessing fused image quality: QAB/F indicates how much edge information from the source images is retained, and MI indicates how much source image information is transferred to the fused image. The remaining indicators serve as auxiliary metrics. For all of these metrics, higher values indicate higher fused image quality.
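For reference, the MI metric for fusion is commonly computed as MI = I(A; F) + I(B; F), the mutual information between each source and the fused image estimated from joint gray-level histograms. The following is a minimal sketch of that common definition, assuming 8-bit grayscale inputs; it may differ in binning or normalization from the exact evaluation code used in these experiments.

```python
import numpy as np

def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 256) -> float:
    """I(X; Y) estimated from a joint histogram of two same-sized images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()                   # joint distribution
    px = pxy.sum(axis=1, keepdims=True)         # marginal of x
    py = pxy.sum(axis=0, keepdims=True)         # marginal of y
    nz = pxy > 0                                # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def fusion_mi(src_a: np.ndarray, src_b: np.ndarray, fused: np.ndarray) -> float:
    """MI metric for fusion: information carried from both sources."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)
```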
Table 1 shows that the QAB/F and MI values of the proposed method are the largest for “head”. Table 2 shows that the QAB/F and MI values of the proposed method are the largest for “office”. Table 3 shows that the QAB/F value is the second largest for “boat”, only 0.0062 below the maximum. Table 4 shows that the proposed method obtains the best QAB/F value for the source image pair “flora”. Table 5 shows that the proposed method obtains the best QAB/F, MI, and SF values for the source image pair “wine bottle”. Table 6 shows that the proposed method obtains the best QAB/F, MI, STD, and AG values for the source image pair “bread”. These experiments show that the proposed image fusion method almost always attains the best QAB/F and MI values. The remaining indicators fluctuate because of how they are calculated: STD, SF, and AG are independent of the source images and depend only on the fused image, so they are not always reliable for analyzing fused images. Even so, the STD, SF, and AG values of the proposed method are better than those of most of the other methods. In summary, the proposed image fusion method outperforms the comparison methods.