DDGANSE: Dual-Discriminator GAN with a Squeeze-and-Excitation Module for Infrared and Visible Image Fusion

Infrared images can provide clear contrast information to distinguish between the target and the background under any lighting conditions. In contrast, visible images can provide rich texture details and are compatible with the human visual system. The fusion of a visible image and an infrared image will thus contain both comprehensive contrast information and texture details. In this study, a novel approach for the fusion of infrared and visible images is proposed based on a dual-discriminator generative adversarial network with a squeeze-and-excitation module (DDGANSE). Our approach establishes adversarial training between one generator and two discriminators. The goal of the generator is to generate images that are similar to the source images and contain the information from both the infrared and visible source images. The purpose of the two discriminators is to increase the similarity between the image generated by the generator and the infrared and visible images. We experimentally demonstrate that, through continuous adversarial training, the images output by DDGANSE retain the advantages of both infrared and visible images, with significant contrast information and rich texture details. Finally, we compare the performance of our proposed method with previously reported techniques for fusing infrared and visible images using both quantitative and qualitative assessments. Our experiments on the TNO dataset demonstrate that the proposed method outperforms other similar reported methods on various performance metrics.


Introduction
Image fusion is a technique that gathers the important information from multiple images to produce fewer images, or a single comprehensive and informative image, for subsequent processing [1]. In particular, image fusion is widely applied to infrared and visible images for object detection and target recognition [2][3][4]. Infrared (IR) images can provide enhanced contrast between the target object and the background, which is an important feature in imaging systems. However, IR images usually do not provide detailed texture, which leads to low-quality or low-resolution images. On the other hand, visible images have the advantage of higher resolution and detailed texture information, at the expense of poor contrast between the target object and the background. Figure 1 illustrates examples of visible and infrared images of the same scene. As seen in Figure 1, the visible images contain rich texture details but poor contrast.
The key aspect of image fusion is to extract the most important information from the input images and fuse it into a single image. For IR and visible images, the most important information is contrast and texture detail, and as such, a fused image should contain both. Many fusion methods for IR and visible images have been introduced in the past decades. These methods can be classified into traditional methods and deep learning methods. Typical traditional methods include multi-scale transformation-based methods, sparse representation-based methods [5,6], subspace-based methods [7], saliency-based methods [8,9], hybrid methods [10][11][12], and other fusion methods [13][14][15]. In these methods, information extraction and fusion rules are designed manually to achieve a better fusion performance. However, due to the diversity of information extraction and fusion rule designs, fusion methods have become increasingly complex.
Recently, deep learning methods have been applied extensively to image fusion tasks with IR and visible images, such as FusionGAN [16], DenseFuse [17], convolutional sparse representation [18], proportional maintenance of gradient and intensity [19], and other methods [20][21][22]. Although existing methods have yielded good results under most conditions, there are certain limitations that should not be overlooked. First, most of the proposed generative adversarial networks (GANs) [23] utilize only one discriminator for image fusion. These methods tend to retain the information of one source image and can lose a significant amount of information from the other source image, resulting in a final fused image that does not contain all the details from the source images.
Second, with most of the existing methods, it is difficult to achieve a good balance in maintaining all of the source image information.
Finally, as long as the underlying limitation of using a single discriminator is not addressed, relying on the loss function alone is not sufficient to solve the problem.
To overcome these limitations, we propose a dual-discriminator GAN with squeeze-and-excitation (SE) networks [24]. Our main contributions are summarized as follows. First, we use a GAN with dual discriminators. The two discriminators are trained with the generator to ensure that the fused image retains the high contrast of the IR image and the richly detailed texture information of the visible image. We use the source images as the real data for comparison with the fused image to ensure that the fused image is analogous to the source images.
Second, we add SE networks to the generator to help it learn the correlation between channels and apply channel-wise attention, which further improves the performance of the network.
Third, a dense connection is added to the generator to realize feature reuse and alleviate the vanishing-gradient problem.
Next, we design a specific loss function comprising two parts: $L_G$ and $L_D$. The generator loss integrates the traditional loss, which further enhances the constraint on the generator. At the same time, the discriminator loss focuses more on the relationship between the source images and the fused image. $L_D$ has two parts, $L_{D_i}$ and $L_{D_v}$: the $L_{D_i}$ loss improves the contrast information of the fused image through adversarial training, and the $L_{D_v}$ loss further enriches the details of the fused image. The loss function designed in this study is crucial for obtaining a larger amount of information from the source images, including contrast information and detailed texture information.
Finally, the dual-discriminator GAN with squeeze-and-excitation module (DDGANSE) is tested on the TNO dataset. Our experimental results demonstrate that the proposed fusion technique has clear advantages over existing methods.

Methods
This section describes the DDGANSE method proposed in this study. First, the overall framework of DDGANSE is explained. Second, we describe the network architectures of the generator and the discriminators. Finally, we provide details regarding the loss function of the network.

Overall Framework
As illustrated in Figure 2, the DDGANSE framework consists of two main parts. The first part is the generator, in which we add the SE module to a traditional convolutional neural network and take advantage of dense connections. The second part is the discriminator, which is improved relative to a traditional GAN: in this study, two discriminators are used for adversarial learning. In addition, adversarial learning between the generator and discriminator networks can effectively correct the prediction error.

DDGANSE of the Generator
The generator part of the network architecture is illustrated in Figure 3. We divide the generator into two paths, a gradient path and a contrast path, to extract information in a complete and clear manner. The gradient path mainly extracts texture information, which is mainly present in the visible images, with little in the IR images. Therefore, one IR and two visible images together comprise the input of the gradient path, as indicated in Figure 3 (top path). Similarly, the contrast path extracts the contrast information between the target object and the background, for which IR images contain more information than visible images. Therefore, two IR images and one visible image together comprise the input of the contrast path, as indicated in Figure 3 (bottom path).
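The input composition of the two paths can be illustrated with a minimal NumPy sketch. The function name `build_path_inputs` and the channel order inside each stack are assumptions made for illustration; the text only states how many IR and visible images each path receives.

```python
import numpy as np

def build_path_inputs(ir, vis):
    """Stack source images along a channel axis for the two generator paths.

    ir, vis: 2-D arrays of identical shape (H, W).
    The channel order within each stack is an assumption for illustration.
    """
    if ir.shape != vis.shape:
        raise ValueError("IR and visible images must have the same shape")
    gradient_input = np.stack([vis, vis, ir], axis=0)  # two visible + one IR
    contrast_input = np.stack([ir, ir, vis], axis=0)   # two IR + one visible
    return gradient_input, contrast_input

ir = np.random.default_rng(0).random((64, 64))
vis = np.random.default_rng(1).random((64, 64))
g_in, c_in = build_path_inputs(ir, vis)
print(g_in.shape, c_in.shape)  # (3, 64, 64) (3, 64, 64)
```

Weighting each path toward the modality that carries most of its target information is the design choice here: texture comes mostly from the visible image, contrast mostly from the IR image.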
The experimental results on the TNO dataset using the proposed network architecture are presented in Table 1. We use a four-layer convolutional neural network in each information extraction path. The first and second layers use 5 × 5 filters, and the third and fourth layers use 3 × 3 filters. The stride in each layer is set to 1, and each layer uses batch normalization and the Leaky ReLU activation function. In contrast to the ReLU activation function, which sets all negative values to zero, the Leaky ReLU activation function assigns a non-zero slope to negative values. As a result, Leaky ReLU mitigates the problem of neuron death encountered with ReLU. In addition, we add an SE module after each convolutional layer. The SE module helps the network learn the correlation between channels, so that the image generated by the generator contains more information from the source images. On this basis, we also add dense connections to the generator. Dense connections mitigate gradient vanishing, enhance feature delivery, and enable feature reuse, which allows the generator to use the feature information of the source images more effectively and improves its overall performance. The use of two information extraction paths and the introduction of the attention module in the generator are among the main differences between this study and DDcGAN [25].
Table 1. Quantitative results for the selection of 3 × 3 and 5 × 5 filters in DDGANSE on the TNO dataset (the best value is marked in yellow, and the second-best value is marked in red).
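The Leaky ReLU activation and the channel reweighting performed by an SE module can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the negative slope of 0.2, the reduction ratio of 4, and the random weights are all assumptions, and the SE structure follows the usual squeeze (global average pooling) and excitation (two fully connected layers with a sigmoid) pattern.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    """Keep positive values; scale negative values by a small slope (assumed 0.2)."""
    return np.where(x >= 0, x, negative_slope * x)

def se_block(features, w1, w2):
    """Squeeze-and-excitation over a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers.
    Returns the reweighted features and the per-channel scale in (0, 1).
    """
    squeezed = features.mean(axis=(1, 2))         # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # excitation: FC + ReLU -> (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid -> (C,)
    return features * scale[:, None, None], scale

rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4                         # r is a hypothetical reduction ratio
x = leaky_relu(rng.standard_normal((C, H, W)))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, scale = se_block(x, w1, w2)
print(y.shape, scale.round(3))
```

Each channel is multiplied by a single learned scalar, so the spatial layout of the feature map is unchanged and only the relative importance of the channels is adjusted.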

We then fuse the features extracted in the two paths by concatenation. To merge the information from the source images completely, the concatenated features are processed by a 1 × 1 convolution with a stride of 1 and a Tanh activation function. Throughout the process, we use convolutions with a stride of 1; therefore, the size of the feature map is the same as the size of the input image.
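The concatenation followed by a 1 × 1 convolution and Tanh can be sketched as below. The channel counts (16 per path), the random weights, and the helper name `conv1x1_tanh` are hypothetical; a 1 × 1 convolution with stride 1 is simply a per-pixel linear map across channels, which is why the spatial size is preserved.

```python
import numpy as np

def conv1x1_tanh(features, weight, bias):
    """1x1 convolution (stride 1) followed by tanh.

    features: (C_in, H, W); weight: (C_out, C_in); bias: (C_out,).
    """
    out = np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]
    return np.tanh(out)

rng = np.random.default_rng(0)
H, W = 32, 40
gradient_feats = rng.standard_normal((16, H, W))   # hypothetical gradient-path output
contrast_feats = rng.standard_normal((16, H, W))   # hypothetical contrast-path output

merged = np.concatenate([gradient_feats, contrast_feats], axis=0)  # channel concat
weight = rng.standard_normal((1, merged.shape[0])) * 0.1
bias = np.zeros(1)
fused = conv1x1_tanh(merged, weight, bias)
print(merged.shape, fused.shape)  # (32, 32, 40) (1, 32, 40)
```

The Tanh bounds the fused output to the range (-1, 1), so source images would need to be normalized to a matching range for the content loss to be meaningful; that normalization is an assumption of this sketch.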

DDGANSE of the Discriminator
The discriminator architecture of DDGANSE is illustrated in Figure 4. In our DDGANSE architecture, we use two discriminators: $D_i$ discriminates between the fused image and the IR image, and $D_v$ discriminates between the fused image and the visible image. Both discriminators are composed of four convolutional layers and a linear layer. Based on the experimental results presented in Table 1, the four convolutional layers use 3 × 3 convolution kernels and the Leaky ReLU activation function. In addition, batch normalization is added to the last three convolutional layers, and the stride of each convolutional layer is set to 2. The architecture of the discriminator differs from that of the generator because the discriminator is a classifier, which first obtains feature maps from the input image and then categorizes them. Thus, a convolution with a stride of 2 functions similarly to a pooling layer. The linear layer classifies the input based on the features extracted by the four convolutional layers and outputs the classification probability. The contrast of the fused image can be improved through adversarial training with the $D_i$ discriminator, and the texture details can be enriched through training with the $D_v$ discriminator. The adversarial network containing two discriminators enables the generator to generate fused images that not only contain rich detailed information but also have significant contrast. For the dual-discriminator part, our network structure is different from that of DDcGAN: we use four convolutional layers and add a pooling layer and a linear layer. In summary, the quality of the fused image can be improved by utilizing two discriminators.
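The downsampling effect of four 3 × 3 convolutions with a stride of 2 can be checked with a small calculation. The input patch size of 84 and the absence of padding are assumptions for illustration only; the text does not state either.

```python
def conv_output_size(size, kernel=3, stride=2, padding=0):
    """Spatial output size of one convolution (standard formula, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def discriminator_feature_sizes(input_size, num_layers=4):
    """Trace the feature-map side length through the stacked stride-2 convolutions."""
    sizes = []
    size = input_size
    for _ in range(num_layers):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes

sizes = discriminator_feature_sizes(84)  # 84 is a hypothetical patch size
print(sizes)  # [41, 20, 9, 4]
```

Each layer roughly halves the side length, which is the sense in which the stride-2 convolutions play the role of pooling before the linear layer produces a single probability.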

Loss Function
The loss function of our DDGANSE consists of two components: the generator loss $L_G$ and the discriminator losses $L_{D_i}$ and $L_{D_v}$, which are described one by one. The generator loss $L_G$ consists of two terms and is given by

$$L_G = L_{\mathrm{DDGANSE}(G)} + \lambda L_{\mathrm{content}}, \tag{1}$$

in which $L_G$ is the overall loss of the generator, and $\lambda$ is applied to strike a balance between $L_{\mathrm{DDGANSE}(G)}$ and $L_{\mathrm{content}}$. The first term on the right side of the equation, $L_{\mathrm{DDGANSE}(G)}$, is the adversarial loss between G and the discriminators $D_i$ and $D_v$. In its definition, $I^n_{fused}$ represents the $n$-th fused image with $n \in \mathbb{N}_N$, $N$ denotes the number of fused images, and $a_1$ and $a_2$ are the probability labels that the generator wants the discriminators to output for the fused image: $a_1$ corresponds to discriminator $D_i$, which discriminates the fused image from the IR image, and $a_2$ corresponds to discriminator $D_v$, which discriminates the fused image from the visible image. Since the generator aims to prevent the discriminators from distinguishing between the fused image and the real image, both $a_1$ and $a_2$ are set to 1. The second term, $L_{\mathrm{content}}$, represents the content loss. Our content loss function consists of two parts: contrast loss and gradient loss. The contrast loss constrains the contrast between the target object and the background, and the gradient loss causes the fused image to have more abundant texture information. In the definition of $L_{\mathrm{content}}$ (Equation (3)), $H$ and $W$ represent the height and width of the input images, respectively, and $\varepsilon_1$ and $\varepsilon_2$ are constants that are adjusted to set the relative weight of the primary and secondary terms in Equation (3).
In the definitions of the contrast loss and the gradient loss, $I_{fused}$ is the fused image generated by DDGANSE, $I_{ir}$ is the IR source image, $I_{vis}$ is the visible source image, $\|\cdot\|_F$ indicates the matrix Frobenius norm, and $\nabla$ is the gradient operator. The discriminators in DDGANSE are $D_v$ and $D_i$: $D_v$ is used to distinguish the fused image from the visible image, and $D_i$ is used to distinguish the fused image from the IR image. Through the adversarial loss of the discriminators, the generator can generate a fused image closer to the source images. In the definition of the discriminator losses, $b$, $b_i$, and $b_v$ stand for the labels of the fused image $I_{fused}$, the IR image $I_i$, and the visible image $I_v$, respectively. $D_v(I_v)$ and $D_v(I_{fused})$ stand for the classification results of the visible and fused images, respectively, and $D_i(I_i)$ and $D_i(I_{fused})$ indicate the classification results of the IR and fused images, respectively.
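To make the roles of the labels and weights concrete, the sketch below implements one plausible form of these losses in NumPy. It is an assumption-laden illustration, not the exact formulation of this paper: it assumes least-squares adversarial terms (as used in FusionGAN-style methods), a contrast term measured against the IR image only, a gradient term measured against the visible image only, a finite-difference gradient operator, and label values of 1 for real images and 0 for fused images in the discriminator losses.

```python
import numpy as np

def grad_mag(img):
    """Finite-difference stand-in for the gradient operator (an assumption)."""
    gy, gx = np.gradient(img)
    return np.abs(gx) + np.abs(gy)

def content_loss(fused, ir, vis, eps1=1.0, eps2=1.0):
    """Weighted contrast + gradient loss, normalized by H * W."""
    h, w = fused.shape
    l_con = np.sum((fused - ir) ** 2)                         # squared Frobenius norm
    l_grad = np.sum((grad_mag(fused) - grad_mag(vis)) ** 2)   # squared Frobenius norm
    return (eps1 * l_con + eps2 * l_grad) / (h * w)

def generator_adv_loss(d_i_fused, d_v_fused, a1=1.0, a2=1.0):
    """Least-squares adversarial loss (assumed form); a1 = a2 = 1 as in the text."""
    return np.mean((d_i_fused - a1) ** 2) + np.mean((d_v_fused - a2) ** 2)

def discriminator_loss(d_real, d_fused, real_label=1.0, fused_label=0.0):
    """Least-squares loss for one discriminator; label values are assumptions."""
    return np.mean((d_real - real_label) ** 2) + np.mean((d_fused - fused_label) ** 2)

def generator_loss(adv, content, lam):
    """Total generator loss: adversarial term plus lambda-weighted content term."""
    return adv + lam * content

rng = np.random.default_rng(0)
ir, vis = rng.random((32, 32)), rng.random((32, 32))
fused = 0.5 * (ir + vis)                       # toy "fused" image
d_i_out, d_v_out = rng.random(4), rng.random(4)  # toy discriminator outputs for a batch
total = generator_loss(generator_adv_loss(d_i_out, d_v_out),
                       content_loss(fused, ir, vis), lam=10.0)  # lam is hypothetical
print(float(total))
```

The same `discriminator_loss` function would be called once for $D_i$ (with IR images as real data) and once for $D_v$ (with visible images as real data), which mirrors the two-discriminator setup described above.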

Experiments
In this section, we evaluate the performance of the proposed DDGANSE using the TNO dataset. To evaluate the merits of our proposed method, we compare the image fusion performance of our approach with twelve published methods in the literature: LPP [26], LP [27], CVT [28], DTCWT [29], GTF [30], CNN [31], GANMcC [32], PMGI [33], FusionGAN, DDcGAN, RFN-Nest [34], and RCGAN [35]. We also provide an additional ablation experiment to compare the proposed DDGANSE with FusionGAN, GANMcC, PMGI, and DDcGAN. First, we introduce the TNO dataset. Second, we describe the evaluation metrics used in the experiments. Finally, we analyze the performance of DDGANSE both qualitatively and quantitatively.

Data
We evaluate DDGANSE and the comparison methods using the TNO dataset. The image pairs in the TNO dataset consist mainly of various military scenes. The dataset contains 60 infrared and visible image pairs, as well as three additional sequences involving 19, 32, and 23 image pairs. For testing, we use 16 image pairs from the TNO dataset. For training, we adopt an overlapping cropping strategy to expand the dataset.
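An overlapping cropping strategy of this kind can be sketched as a sliding window whose stride is smaller than the patch size. The patch size of 84, the stride of 14, and the image size in the example are hypothetical values chosen for illustration; the text does not specify them.

```python
import numpy as np

def crop_patches(image, patch=84, stride=14):
    """Slide a patch x patch window over a 2-D image with the given stride.

    A stride smaller than the patch size makes neighbouring patches overlap,
    which multiplies the number of training samples drawn from one image.
    """
    h, w = image.shape
    return np.stack([
        image[top:top + patch, left:left + patch]
        for top in range(0, h - patch + 1, stride)
        for left in range(0, w - patch + 1, stride)
    ])

image = np.arange(120 * 148, dtype=np.float64).reshape(120, 148)
patches = crop_patches(image)
print(patches.shape)  # (15, 84, 84)
```

For paired data, the IR and visible images of a pair would be cropped at the same window positions so that each training sample stays spatially aligned.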

Training Details
The generator and discriminators are trained iteratively, with r denoting the ratio of their training times. The batch size is b, it takes m steps to iterate through the entire training data, and the total number of training epochs is M. In practice, we set b = 32, r = 1/2, M = 10, and m as the ratio between the total number of patches and b. The initial learning rate is set to 0.0001, and Adam is used as the optimizer. The overall training process is summarized in Algorithm 1. The experiments were carried out on a workstation in our laboratory configured with an Intel® Xeon® CPU E5-2620 v4 @ 2.10 GHz × 32 and two 12 GB TITAN Xp graphics cards.
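The relationship between the number of patches, the batch size b, the steps per epoch m, and the number of epochs M amounts to simple arithmetic, sketched below. The patch count of 26,112 is a hypothetical value, since the text does not report it, and the training ratio r is deliberately not modeled because its exact use is defined by Algorithm 1.

```python
def steps_per_epoch(num_patches, batch_size):
    """m: number of steps needed to pass once over all training patches."""
    return num_patches // batch_size

def total_steps(num_patches, batch_size, epochs):
    """Total number of training steps over all epochs."""
    return steps_per_epoch(num_patches, batch_size) * epochs

b, M = 32, 10            # values stated in the text
num_patches = 26_112     # hypothetical patch count, for illustration only
m = steps_per_epoch(num_patches, b)
print(m, total_steps(num_patches, b, M))  # 816 8160
```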

Performance Metrics
The quality assessment of image fusion is a complex problem; therefore, we conducted both a qualitative and a quantitative assessment.
(1) Structural Similarity Index (SSIM) SSIM is applied to model the image loss and distortion, and it measures the structural similarity between the source and fused images. The larger the value of SSIM, the better the structure is maintained.
(2) Peak Signal-to-Noise Ratio (PSNR) PSNR is a full-reference image quality evaluation index based on error sensitivity, and it is the most common and widely used objective image evaluation index. PSNR measures the ratio between the effective information and the noise of an image, which reflects whether the image is distorted. A larger PSNR value indicates less distortion of the image and a higher image quality.
(3) Entropy (EN) EN measures the amount of information contained in a fused image. EN is the most commonly used metric to represent the amount of information in an image, and a large EN value indicates that the fused result contains a large volume of information.
(4) Sum of the Correlations of Differences (SCD) SCD is a similarity measurement. SCD will achieve the maximum value if images t and t' are identical. In contrast, SCD will achieve the minimum value −1 when the absolute difference is always equal to π. The SCD metric is primarily concerned with the difference between the fused image and the source image, and then measures the correlation between the difference and the other source image. To a certain extent, SCD can also evaluate pseudo-information. It means that the larger the SCD value, the less pseudo-information the image contains.
(5) Correlation Coefficient (CC) CC measures the degree of linear correlation between the fused image and source images. The larger the CC, the more similar the fused image is to the source images, and as a result, the better the fusion performance.
(6) Standard Deviation (SD) SD is defined based on the statistical concept and reflects the extent to which the values of individual pixels in the image vary from the average value. The regions with high contrast always attract the attention of the human eye, and the fused image with higher contrast often leads to a larger SD, which represents a better visual effect of the fused image.
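Several of the metrics above can be computed in a few lines of NumPy. The sketch below follows conventions that are common in the image fusion literature and are assumptions here, since the text does not give the formulas: EN uses a 256-bin histogram of 8-bit intensities, CC and PSNR average over the two source images, and SCD is the sum of the correlations between each difference image and the other source image. SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (bits) of the intensity histogram; assumes values in [0, 256)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    """Spread of pixel values around the image mean."""
    img = np.asarray(img, dtype=np.float64)
    return float(np.sqrt(np.mean((img - img.mean()) ** 2)))

def corr(a, b):
    """Pearson correlation between two images of equal shape."""
    a = np.asarray(a, dtype=np.float64) - np.mean(a)
    b = np.asarray(b, dtype=np.float64) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

def correlation_coefficient(fused, ir, vis):
    """CC: mean correlation of the fused image with the two sources."""
    return 0.5 * (corr(fused, ir) + corr(fused, vis))

def scd(fused, ir, vis):
    """SCD: correlate (fused - one source) with the other source, then sum."""
    fused = np.asarray(fused, dtype=np.float64)
    return corr(fused - vis, ir) + corr(fused - ir, vis)

def psnr(fused, ir, vis, peak=255.0):
    """PSNR with the MSE averaged over both source images (assumed convention)."""
    fused = np.asarray(fused, dtype=np.float64)
    mse = 0.5 * (np.mean((fused - ir) ** 2) + np.mean((fused - vis) ** 2))
    return float(10.0 * np.log10(peak ** 2 / mse))

rng = np.random.default_rng(0)
ir = rng.integers(0, 256, (64, 64)).astype(np.float64)
vis = rng.integers(0, 256, (64, 64)).astype(np.float64)
fused = 0.5 * (ir + vis)  # toy fused image, for illustration only
print(entropy(fused), standard_deviation(fused),
      correlation_coefficient(fused, ir, vis), scd(fused, ir, vis),
      psnr(fused, ir, vis))
```

Under this SCD definition, a fused image equal to the exact sum of the two sources would score 2, because each difference image would then coincide with the other source.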

Results for the TNO Dataset
(1) Qualitative Comparison Figure 5 presents the fusion results of our proposed network architecture. The fused images show that DDGANSE is clearly better than the other popular fusion methods, including FusionGAN, GANMcC, PMGI, and DDcGAN. On the one hand, DDGANSE retains the characteristics of IR images, with rich thermal radiation information that clearly separates the target from the background. This is crucial, as most existing algorithms preserve good texture details but lose most of the thermal radiation information, which leads to weak target detection. On the other hand, while preserving plenty of thermal radiation information, the results of DDGANSE still retain a high amount of texture detail from the visible images.
The four fusion algorithms mentioned above can be roughly divided into two types. In the first type, the fusion results retain the advantages of the IR images. Although there is a strong contrast between the background and the target, the texture details are not sufficiently rich, as shown in Figure 6. In the FusionGAN method in the first row and third column and the PMGI method in the first row and fourth column, the object (person) and the background (tree) have obvious contrast, but the texture details of the tree are unclear. In comparison (the fifth column in Figure 6), our proposed method contains more texture details. In the second type of fusion algorithm, the fused image retains more information from the visible image. Although the result has rich texture information, the contrast between the target and the background is not obvious, e.g., the DDcGAN method in the first row and third column of Figure 7. In the resulting image, it is almost impossible to find the objects (boats and people) in the background (forest), although the details of the forest are relatively clear. In contrast, the method proposed in this study can clearly distinguish the people and boats in the fusion results. In summary, compared with the other four methods, DDGANSE contains rich texture information on top of the obvious contrast, as shown in the first row and last column of Figure 5. In our results, the contrast between the person (the target) and the tree (the background) is very strong, and the tree texture details are equally clear.
In our results, the contrast between the person (the target) and the tree (the background) is very strong, and the texture details of the tree are equally clear.

(2) Quantitative Assessment of Fusion Networks
The quantitative results for DDGANSE on the different performance metrics are given in Table 2. According to these metrics, DDGANSE achieves the best fusion performance in comparison with other advanced methods, including LPP, LP, CVT, DTCWT, GTF, CNN, FusionGAN, GANMcC, PMGI, DDCGAN, RFN-Nest, and RCGAN.
DDGANSE attains an SSIM of 0.86, a PSNR of 19.60, an EN of 7.09, an SCD of 1.79, a CC of 0.52, and an SD of 0.13. The detailed quantitative results can be seen in Figures 8 and 9. Based on these results, DDGANSE has clear advantages over earlier methods in terms of strong contrast and rich texture details.
Table 2. Quantification results for our proposed method and state-of-the-art fusion methods (the best value is marked in yellow, and the second-best value is marked in red).

(3) Advantages of the DDGANSE Architecture
As seen in Table 3, each of the technical innovations in the DDGANSE method improves the fusion performance of the network. To further evaluate our method, three reduced variants were implemented individually: DDGANSE without both discriminators (only the generator), without one discriminator (a generator with a single discriminator), and without SE (two discriminators but a generator without the SE module). As shown in Figure 10, our network is superior to the single-discriminator GAN. A network with only one discriminator can retain the advantages of only one source image in the fused image. For example, the richly detailed information from the visible image can be retained, but the contrast is poor, as shown in the red boxes in the third and sixth columns in Figure 10. Alternatively, only the contrast advantage of the IR image is retained while the texture details are neglected, as shown in the yellow boxes in the third and sixth columns in Figure 10. In this study, a dual-discriminator network is therefore used to preserve both the detail and the contrast of the source images in the fused image. In addition, we include an SE module in the generator part of the network. In Figure 10, the sixth column is better than the fifth column, indicating that the SE module helps the network learn important feature information and improves the performance of the generator.
This is because the SE module facilitates learning the correlation between channels, which allows the image produced by the generator to contain more information from the source images. The quantitative results are shown in Figure 11 and indicate that the full DDGANSE performs better than the variants without these components.
Table 3. Quantification results for our proposed method with different innovative components (the best value is marked in yellow, and the second-best value is marked in red).
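The channel-reweighting idea behind the SE module can be illustrated with a minimal NumPy sketch of a generic squeeze-and-excitation forward pass. The layer sizes, the reduction ratio `r`, and the random weights below are hypothetical and chosen only for illustration; where the block sits inside the DDGANSE generator and how it is configured are not specified in this section.

```python
import numpy as np

def squeeze_excite(features, w1, b1, w2, b2):
    """Apply a generic squeeze-and-excitation step to a (C, H, W) feature map."""
    # Squeeze: global average pooling yields one descriptor per channel.
    z = features.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: a small bottleneck (ReLU, then sigmoid) models
    # dependencies between channels.
    hidden = np.maximum(w1 @ z + b1, 0.0)               # shape (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # shape (C,), in (0, 1)
    # Recalibration: each channel is multiplied by its weight.
    return features * scale[:, None, None], scale

# Hypothetical sizes and untrained random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, scale = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape, scale.round(3))
```

In a trained network the weights are learned, so informative channels receive scales closer to 1 and less useful channels are suppressed.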

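Several of the metrics reported in Tables 2 and 3 can be sketched in a few lines of NumPy. The definitions below follow conventions that are common in the image fusion literature (entropy of the gray-level histogram, CC and PSNR averaged over the two source images); they are assumptions, since this section does not state the exact formulas, intensity range, or implementation used by the authors. SSIM and SCD are omitted for brevity, and the random images and averaging "fusion" are placeholders only.

```python
import numpy as np

def entropy(img_u8):
    """EN: Shannon entropy (bits) of an 8-bit image's gray-level histogram."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

def std_dev(img):
    """SD: standard deviation of pixel intensities."""
    return float(img.astype(np.float64).std())

def pearson(a, b):
    """Pearson correlation coefficient between two images."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def fusion_cc(ir, vis, fused):
    """CC: mean correlation of the fused image with both sources (assumed form)."""
    return 0.5 * (pearson(ir, fused) + pearson(vis, fused))

def fusion_psnr(ir, vis, fused, peak=255.0):
    """PSNR from the mean of the two source-to-fused MSEs (assumed form)."""
    f = fused.astype(np.float64)
    mse = 0.5 * (np.mean((ir.astype(np.float64) - f) ** 2)
                 + np.mean((vis.astype(np.float64) - f) ** 2))
    return float(10.0 * np.log10(peak ** 2 / mse))

# Placeholder data: random "source" images and a pixel-wise average as "fusion".
rng = np.random.default_rng(0)
ir = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
vis = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
fused = ((ir.astype(np.float64) + vis.astype(np.float64)) / 2).astype(np.uint8)

print(f"EN={entropy(fused):.2f}  SD={std_dev(fused):.2f}  "
      f"CC={fusion_cc(ir, vis, fused):.2f}  PSNR={fusion_psnr(ir, vis, fused):.2f}")
```

Values computed this way depend on the intensity scaling of the inputs (for example, 0 to 255 versus 0 to 1), so they are comparable across methods only when all methods are evaluated with the same convention.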

(4) Complexity Evaluation
We measured the running time required to fuse images with the above methods, which gives an indication of the complexity of DDGANSE.
In the training process, the generator and discriminators are trained over many iterations. At this stage, the overall number of parameters is the sum of the parameters in the generator and the discriminators, which is approximately 3.168 M. Optimizing these parameters takes approximately 180 min. In the testing process, only the generator is used to produce fused images, and the number of parameters used for testing is 1.867 M. Since the generator alone performs the fusion task, comparing the average running times of the various methods during the testing phase gives an indication of their time complexity. The results are presented in Table 4. Although FusionGAN and LP run slightly faster than DDGANSE, the fusion quality of our method is superior to that of both methods, as can be seen from Table 2.
Figure 11. Quantification results for DDGANSE for different performance metrics.
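The parameter counts quoted above imply the size of the discriminator part of the network, and the test-time comparison amounts to averaging wall-clock time per image pair. The following sketch shows both steps. The function `average_runtime`, the stand-in `dummy_fuse`, and the toy inputs are hypothetical and are not the authors' benchmarking code; only the two parameter counts come from the text.

```python
import time

TOTAL_PARAMS_M = 3.168      # generator + discriminator part (from the text)
GENERATOR_PARAMS_M = 1.867  # generator only, used at test time (from the text)

# Parameters that are needed only during training.
discriminator_params_m = TOTAL_PARAMS_M - GENERATOR_PARAMS_M

def average_runtime(fuse, pairs, warmup=1):
    """Mean wall-clock seconds per image pair for a fusion callable."""
    for ir, vis in pairs[:warmup]:
        fuse(ir, vis)                       # warm-up calls are not timed
    start = time.perf_counter()
    for ir, vis in pairs:
        fuse(ir, vis)
    return (time.perf_counter() - start) / len(pairs)

def dummy_fuse(ir, vis):
    """Hypothetical stand-in for a trained generator: pixel-wise average."""
    return [(a + b) / 2 for a, b in zip(ir, vis)]

# Toy inputs: 20 pairs of flattened 32 x 32 "images".
pairs = [([0.1] * 1024, [0.9] * 1024) for _ in range(20)]

print(f"discriminator part: ~{discriminator_params_m:.3f} M parameters")
print(f"mean time per pair: {average_runtime(dummy_fuse, pairs):.6f} s")
```

By this arithmetic, roughly 1.301 M of the 3.168 M parameters belong to the discriminator part and are not needed at test time.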


Conclusions
In this study, an efficient infrared and visible image fusion method called DDGANSE is proposed. Given the characteristics of visible and infrared images, the fused image should combine the high contrast of infrared images with the rich texture details of visible images. The proposed DDGANSE method uses a generator with two paths and two discriminators to accomplish the fusion task. Extensive experiments on the TNO dataset indicate that DDGANSE achieves the best performance among the similar methods reported in the literature that were compared.
