4.1. Improvement of the Adversarial Loss Objective Function
CycleGAN is not specifically a face translation network and may learn incorrect mappings during training, resulting in poor translation results; for example, blurred facial features appear in the translated visible image.
The adversarial loss objective function in CycleGAN uses the cross-entropy loss function. Cross-entropy loss is designed mainly for classification: during training it focuses on the prediction probability of the correct label and ignores the differences among incorrect labels, which can cause saturation and, in turn, gradient vanishing. In addition, when the cross-entropy loss is used as the objective function, once the discriminator judges an image as real, the generator stops optimizing that image even if it is still far from the real data distribution. As a result, the generator learns an incorrect mapping; for example, the pictures it generates are blurred. We therefore change the objective function of the adversarial loss from the cross-entropy loss to the least-squares loss function [20]. With the least-squares loss, when the discriminator judges an image as real but the image is still far from the real data distribution, the image is re-discriminated and the generator continues to optimize it.
Improved adversarial loss:

$$\mathcal{L}_{LSGAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[(D_Y(y)-1)^2\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[D_Y(G(x))^2\big]$$
$$\mathcal{L}_{LSGAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\big[(D_X(x)-1)^2\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[D_X(F(y))^2\big]$$

where $X$ is the laser domain, $Y$ is the visible domain, and $G: X \rightarrow Y$ and $F: Y \rightarrow X$ are the generators. The aim of the least-squares method for the discriminator is to minimize the objective function, where $D_X(x)$ discriminates the authenticity of the laser image, $D_X(F(y))$ discriminates the authenticity of the generated laser image, $D_Y(y)$ discriminates the authenticity of the visible image, and $D_Y(G(x))$ discriminates the authenticity of the generated visible image.
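As a minimal sketch of the least-squares objectives (illustrative NumPy only, not the paper's implementation; `d_real` and `d_fake` are hypothetical arrays of discriminator scores):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push scores of real images toward 1
    and scores of generated images toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push scores of generated images toward 1.
    Unlike cross-entropy, the penalty grows quadratically with the distance
    from the real-label score, so a generated image that the discriminator
    accepts but that still sits far from the real data distribution keeps
    receiving a gradient."""
    return np.mean((d_fake - 1.0) ** 2)

# Toy discriminator scores for two real and two generated images.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])
print(lsgan_d_loss(d_real, d_fake))  # small: D separates real from fake well
print(lsgan_g_loss(d_fake))          # large: G is far from fooling D
```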
4.2. Add Identity Loss Function
CycleGAN has a cycle consistency loss function based on cycle reconstruction, but the following situation easily arises during training: after the first generator on a link learns an incorrect mapping, the second generator also learns an incorrect mapping, and the two compensating errors produce a situation in which the cycle consistency loss is small but the generated picture is still poor. For example, in this experiment, blurry images and hairline mapping errors occur. To strengthen the constraints on the generator and improve the quality of the generated images, the identity loss function is introduced, as shown in Figure 9. The identity loss in reference [6] is used to enhance hue accuracy; however, in face translation, the dataset used contains Asian faces, which have a single skin color and little color variation, so in this experiment the identity loss places no restriction on skin color.
$G$ has the ability to generate visible-domain images, so the distribution of the generated image $G(y)$ obtained by feeding the visible image $y$ into $G$ should be as close as possible to the distribution of the input image $y$. Similarly, $F$ has the ability to generate laser-domain images, so the distribution of the generated image $F(x)$ obtained by feeding the laser image $x$ into $F$ should be as close as possible to the distribution of the input image $x$. In training, the L1-norm distance between the input image $y$ and the generated image $G(y)$ and the L1-norm distance between the input image $x$ and the generated image $F(x)$ are calculated, and the two results are added together to obtain the identity loss.
Identity loss is defined as:

$$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}\big[\|G(y)-y\|_1\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\|F(x)-x\|_1\big]$$

where $G(y)$ represents the input of image $y$ into generator $G$, and $F(x)$ represents the input of image $x$ into generator $F$. $\|G(y)-y\|_1$ is used to calculate the L1-norm distance between $G(y)$ and $y$, and $\|F(x)-x\|_1$ is used to calculate the L1-norm distance between $F(x)$ and $x$.
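A minimal sketch of this computation, assuming `G` and `F` are callables mapping images between the two domains (the names and shapes are illustrative, not the paper's code):

```python
import numpy as np

def identity_loss(G, F, x_laser, y_visible):
    """Identity loss: feeding a visible image into G (laser -> visible) should
    return it unchanged, and likewise for F (visible -> laser) on a laser
    image. The two mean L1-norm distances are summed."""
    l1_visible = np.mean(np.abs(G(y_visible) - y_visible))
    l1_laser = np.mean(np.abs(F(x_laser) - x_laser))
    return l1_visible + l1_laser

# Sanity check with identity generators: the loss is exactly zero.
x = np.random.rand(8, 8)
y = np.random.rand(8, 8)
print(identity_loss(lambda im: im, lambda im: im, x, y))  # -> 0.0
```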
Total loss:

$$\mathcal{L} = \mathcal{L}_{LSGAN}(G, D_Y, X, Y) + \mathcal{L}_{LSGAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F) + \mu \mathcal{L}_{identity}(G, F)$$

where $\lambda$ and $\mu$ are the weights of the cycle consistency loss function and the identity loss function, respectively.
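The weighted combination can be sketched as follows (argument names and the toy loss values are illustrative; the default weights match those reported in Section 4.3):

```python
def total_loss(adv_g, adv_f, cyc, idt, lam=10.0, mu=5.0):
    """Full objective: the two least-squares adversarial terms plus the
    cycle consistency loss weighted by lam and the identity loss weighted
    by mu."""
    return adv_g + adv_f + lam * cyc + mu * idt

# Toy values for the four component losses.
print(round(total_loss(0.5, 0.6, 0.02, 0.01), 2))  # -> 1.35
```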
4.3. Experimental Results and Analysis
Experimental data were the laser and visible face datasets in
Section 2. The Adam optimizer was chosen, with beta1 set to 0.5 and beta2 set to 0.999. The learning rate was 0.002 for the first 100 epochs and was then reduced by approximately 1% per epoch over the last 100 epochs until it reached zero. The weight $\lambda$ of the cycle consistency loss was set to 10, and the weight $\mu$ of the identity loss was set to 5.
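The schedule described above (constant for the first 100 epochs, then roughly 1%-per-epoch linear decay to zero) can be sketched as:

```python
def learning_rate(epoch, base_lr=0.002, constant_epochs=100, decay_epochs=100):
    """Learning-rate schedule: constant at base_lr for the first
    constant_epochs, then linear decay to zero over decay_epochs."""
    if epoch < constant_epochs:
        return base_lr
    frac = (epoch - constant_epochs) / decay_epochs  # fraction of decay done
    return base_lr * max(0.0, 1.0 - frac)

print(learning_rate(0))    # -> 0.002
print(learning_rate(150))  # -> 0.001 (halfway through the decay)
print(learning_rate(200))  # -> 0.0
```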
The experimental comparison results are shown in
Figure 10.
Figure 10a shows the input laser image,
Figure 10b shows the face translation results of CycleGAN,
Figure 10c shows the face translation results after adding the identity loss function,
Figure 10d shows the face translation results after modifying the adversarial loss objective function,
Figure 10e shows the face translation results after both adding the identity loss function and modifying the adversarial loss objective function, and
Figure 10f shows the real visible light image. Subjective vision and objective quantification were used to analyze the translation results. Using SSIM, PSNR, and FID as objective quantitative evaluation methods, the objective quantitative results are shown in
Table 3.
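Of the three metrics, PSNR is the simplest to sketch directly (SSIM requires windowed local statistics and FID a pretrained Inception network, so neither is reproduced here). A minimal NumPy version, assuming an 8-bit image range:

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Constant pixel error of 10 -> MSE = 100.
a = np.full((4, 4), 100.0)
b = np.full((4, 4), 110.0)
print(round(psnr(a, b), 2))  # -> 28.13
```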
From the perspective of subjective vision, the image translation results of
Figure 10c,d have more obvious facial features than the image translation results of
Figure 10b. However, both
Figure 10c,d have a blurred hairline. Compared with the translation result of
Figure 10b, the translation results of
Figure 10e eliminate the problem of blurred images. At the same time, the facial features are relatively clear, with sufficient detail, and the boundaries between hair and skin are sharp. However, there are slight deviations in the mapping of individual images, such as erythema on the forehead of the female image and broken hair being translated into skin.
From the perspective of objective quantification, after adding the identity loss function to the CycleGAN model, the SSIM and PSNR values are almost unchanged, but the FID value is reduced by 5.54. After the adversarial loss objective function of CycleGAN is modified to the least-squares loss function, the SSIM and PSNR values are almost unchanged, but the FID is reduced by 9.53. After both adding the identity loss and improving the adversarial loss objective function, the SSIM value of the improved model is similar, the PSNR is increased by 1, and the FID is decreased by 11.22.
For the translation of frontal faces, the improved network achieves good results, but the modified network has flaws in the profile translation. For example, the dividing line between hair and face is mistranslated. The results of profile translation were analyzed from the two perspectives of subjective vision and objective quantification.
Figure 11 shows the results of profile image translation, and
Table 4 shows the objective quantification results of profile translation.
From the perspective of subjective vision, comparing the translation results of
Figure 11b, the translation results of
Figure 11c are also blurred, and the overall picture is not clear enough. The image translation results of
Figure 11d are slightly better, eliminating the meaningless ripples in
Figure 11b, but the facial features are also not clear enough. Compared with the translation results of
Figure 11b, the translation results of
Figure 11e eliminate the problem of blurred images. At the same time, the facial features are relatively clear and have sufficient detail, but the boundaries between hair and skin are mapped incorrectly.
From the perspective of objective quantification, after adding the identity loss function to CycleGAN, the SSIM value and PSNR value are almost unchanged, but the FID value is reduced by 20.9. After the adversarial loss objective function for CycleGAN is modified to the least-squares loss function, the SSIM value and PSNR value are almost unchanged, but the FID is reduced by 40.51. After adding the identity loss and improving the adversarial loss objective function for CycleGAN, the SSIM value of the improved model is increased by 3.9%, the PSNR is increased by 19.8%, and the FID is decreased by 23.75.
In summary, the modified model is better suited to translating laser frontal face images; its translation of laser profile images remains poor, so the network needs further improvement.