Article

HPG-GAN: High-Quality Prior-Guided Blind Face Restoration Generative Adversarial Network

Xu Deng, Hao Zhang and Xiaojie Li
1 School of Computer Science, University of Sydney, Sydney, NSW 2006, Australia
2 College of Computer Science, Chengdu University of Information Technology, Chengdu 610103, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(16), 3418; https://doi.org/10.3390/electronics12163418
Submission received: 6 July 2023 / Revised: 26 July 2023 / Accepted: 2 August 2023 / Published: 11 August 2023
(This article belongs to the Special Issue Research Advances in Image Processing and Computer Vision)

Abstract: To address the problems of low resolution, compression artifacts, complex noise, and color loss in image restoration, we propose a High-Quality Prior-Guided Blind Face Restoration Generative Adversarial Network (HPG-GAN). It mainly consists of a Coarse Restoration Sub-Network (CR-Net) and a Fine Restoration Sub-Network (FR-Net). HPG-GAN extracts high-quality structural and textural priors and facial feature priors from coarsely restored images to reconstruct clear and high-quality facial images. FR-Net includes the Facial Feature Enhancement Module (FFEM) and the Asymmetric Feature Fusion Module (AFFM). FFEM enhances facial feature information using high-definition facial feature priors obtained from ArcFace. AFFM fuses and selects asymmetric high-quality structural and textural information from ResNet34 to recover overall structural and textural information. Comparative evaluations on synthetic and real-world datasets demonstrate superior performance and visual restoration quality compared to state-of-the-art methods, and ablation experiments validate the importance of each module. HPG-GAN is thus an effective and robust blind face deblurring and restoration network.

1. Introduction

Blind face restoration (BFR) is a typically ill-posed problem that aims at constructing realistic and faithful high-quality face images from unknown degraded ones. In real-world scenarios, low-quality face images often suffer from multiple unknown degradation factors, including blur [1], noise [2,3], low resolution [4], compression artifacts [5], etc., causing a significant loss of rich and unique identity information [6,7]. In particular, these low-quality face images also seriously hinder the development of tasks such as face alignment [8], face recognition [9], and face inpainting [10,11]. Therefore, how to restore high-quality face images has become a challenging topic in the image processing and computer vision communities.
The primary challenge in BFR is the precise simulation and restoration of complex and diverse degradation processes. Effectively handling high-dimensional features and intricate facial structures also necessitates carefully designed network architectures and optimization algorithms. Moreover, BFR needs to achieve a dynamic balance between restoration quality and computational efficiency in real-life scenarios.
Traditional facial restoration methods [12] have partially addressed the problem of facial image restoration. However, they suffer from several limitations. For instance, they require pre-defining the type of degradation, making it challenging to handle unknown degradation types. Moreover, these methods often adopt handcrafted feature extractors, which must be redesigned and reselected for different facial images, hampering scalability. Deep-learning-based approaches have made significant progress in facial image deblurring and restoration in recent years with the development of deep learning and the availability of large-scale datasets. Compared to traditional methods, these approaches can learn more complex and abstract features, enhancing model robustness and generalization and achieving strong deblurring and restoration performance. These methods typically rely on facial geometry priors, reference priors, and generative priors to restore realistic texture details during facial restoration. However, these priors are often based on generated or searched images (as illustrated in Figure 1) and inevitably degrade with extremely low-quality real-world inputs, so the restored faces lack authentic texture details, limiting their applicability in real-life scenarios. To overcome the limitations of existing solutions, this paper proposes a High-Quality Prior-Guided Blind Face Restoration Generative Adversarial Network (HPG-GAN), as illustrated in Figure 2. In comparison to traditional methods (Figure 1), it incorporates a pre-trained Coarse Restoration Sub-Network (CR-Net) to restore a complexly degraded facial image, denoted as $I_D$, producing a coarse restoration result, denoted as $\hat{I}_R$. $\hat{I}_R$ provides higher-quality asymmetric structural texture information and facial feature details than $I_D$, thereby guiding the subsequent Fine Restoration Sub-Network (FR-Net) to recover a more realistic, natural, and harmonious clear facial image. In FR-Net, we designed a Facial Feature Enhancement Module (FFEM) and an Asymmetric Feature Fusion Module (AFFM). The former guides the enhancement of facial feature information in the fine restoration process by using high-definition facial feature priors. The latter fuses and filters the extracted high-quality asymmetric structure and texture information and embeds it into the fine restoration process to facilitate the reconstruction of overall structure and texture information. Extensive quantitative and qualitative comparative experiments conducted on multiple public datasets and real-world scenarios validate the effectiveness and feasibility of the proposed HPG-GAN, offering valuable insights for practical applications.

2. Our Method

The generator network architecture of HPG-GAN is illustrated in Figure 2. It mainly consists of a Coarse Restoration Sub-Network (CR-Net) and a Fine Restoration Sub-Network (FR-Net). In FR-Net, we designed a Facial Feature Enhancement Module (FFEM) and an Asymmetric Feature Fusion Module (AFFM). FFEM guides the enhancement of facial feature information in the fine restoration process by using high-definition facial feature priors (such as the position, shape, and size of face contours, eyes, nose, mouth, and other parts). AFFM fuses and filters the extracted high-quality asymmetric structure and texture information and embeds it into the fine restoration process to facilitate the reconstruction of overall structure and texture information.
Our goal is to construct realistic and faithful high-quality face images through the proposed HPG-GAN. As shown in Figure 2, the generator takes a low-quality face image, $I_D$, suffering from unknown degradation, and successively generates a coarse restoration image, $\hat{I}_R$, and a fine restoration image, $I_R$, with rich texture details. From $\hat{I}_R$, higher-quality priors, namely the asymmetric structural and textural features $TE_1^{out}, \ldots, TE_4^{out}$ and the facial feature details $Z_{id}(\hat{I}_R)$, are extracted using the pre-trained ResNet34 [13] (Texture Encoder in Figure 3) and ArcFace (Face Identity Encoder in Figure 2), which help FR-Net restore a high-quality face image, $I_R$. Meanwhile, as shown in Figure 3, FR-Net incorporates a Facial Feature Enhancement Module (FFEM) and an Asymmetric Feature Fusion Module (AFFM) to utilize the extracted high-quality priors and guide the reconstruction of high-quality facial images. In this section, we first detail the critical modules of HPG-GAN and then describe its loss functions.

2.1. Coarse Restoration Sub-Network (CR-Net)

In the wild, blind face restoration must remove complex and severe degradation, typically a combination of low resolution, blur, noise, and JPEG artifacts. It is therefore challenging to directly extract texture and facial feature priors from a low-quality input image, $I_D$: priors extracted directly from degraded inputs are themselves degraded, which does not help restore high-quality images and instead introduces interference [5]. Hence, in this study, we first pass $I_D$ through the Coarse Restoration Sub-Network (CR-Net), $G_c$, to obtain a coarse restoration result, $\hat{I}_R$:
$$\hat{I}_R = G_c(I_D) \quad (1)$$
where $G_c(\cdot)$ is constructed based on the proposed MPFD-GAN [14], which can generate realistic restoration results in a single feedforward pass without requiring extra priors. This is helpful for the following reasons: (1) it removes various degradation factors and alleviates the burden of the subsequent restoration of higher-quality face images [15]; (2) it extracts critical features and structural information from $I_D$; and (3) it generates a coarse restoration image as input for the subsequent FR-Net. Additionally, to provide intermediate supervision for removing degradation, an L1 reconstruction loss is applied at each resolution scale of $G_c$: the decoder outputs an image at every scale, and these outputs are constrained to be close to the corresponding levels of the ground-truth image pyramid.
Subsequently, the coarse restoration result, $\hat{I}_R$, is passed through the pre-trained ArcFace and ResNet34 networks to extract "clean" facial feature information, $Z_{id}$, and asymmetric texture information, $TE_1^{out}, \ldots, TE_4^{out}$, as high-quality priors for guiding the Fine Restoration Sub-Network (FR-Net) in restoring a realistic, clear, and visually detailed face image:
$$Z_{id} = \mathrm{ArcFace}(\hat{I}_R)$$
$$TE_1^{out}, \ldots, TE_4^{out} = \mathrm{ResNet}(\hat{I}_R)$$
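A minimal sketch of this prior-extraction step is given below, assuming a pre-trained coarse generator `cr_net` and an ArcFace-style identity encoder `arcface` are available as callables (both are assumptions here); the texture encoder is built from torchvision's ResNet-34, and the particular split into four feature maps is an illustrative choice rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class TextureEncoder(nn.Module):
    """Returns four feature maps TE1..TE4 at progressively coarser scales from ResNet-34."""
    def __init__(self):
        super().__init__()
        net = resnet34(weights=None)  # in practice, pre-trained weights would be loaded
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        te1 = self.layer1(x)
        te2 = self.layer2(te1)
        te3 = self.layer3(te2)
        te4 = self.layer4(te3)
        return te1, te2, te3, te4

@torch.no_grad()
def extract_priors(cr_net, arcface, texture_encoder, degraded):
    """Coarse restoration followed by prior extraction, as in the equations above."""
    coarse = cr_net(degraded)            # coarse restoration \hat{I}_R = G_c(I_D)
    z_id = arcface(coarse)               # identity prior Z_id
    te_feats = texture_encoder(coarse)   # asymmetric texture priors TE1..TE4
    return coarse, z_id, te_feats
```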

2.2. Facial Feature Enhancement Module (FFEM)

The Facial Feature Enhancement Module (FFEM) is an essential part of the Fine Restoration Sub-Network (FR-Net). Its primary function is to embed the high-quality facial feature priors extracted from the coarse restoration image, $\hat{I}_R$, into FR-Net to enhance facial texture details. Specifically, FFEM utilizes the facial feature information contained in the high-quality facial feature vector, $Z_{id}$, extracted by the face recognition model (ArcFace), and embeds this information into the intermediate feature layers of FR-Net. This information includes the positions, shapes, and sizes of facial components, such as facial contours, eyes, nose, and mouth, and guides FR-Net to restore facial texture details more accurately, thereby improving the quality of the final reconstructed image.
A detailed structure of FFEM is illustrated in Figure 3; it consists of multiple Facial Feature Enhancement Blocks (FFEBs) connected in series. Each FFEB is obtained by placing an Adaptive Instance Normalization (AdaIN) layer in parallel with the Batch Normalization [16] in a ResNet block [13]. The AdaIN [17] operation can be represented as follows:
$$\mathrm{AdaIN}(F_{in}, Z_{id}) = \sigma(Z_{id}) \frac{F_{in} - \mu(F_{in})}{\sigma(F_{in})} + \mu(Z_{id})$$
where $\sigma(Z_{id})$ and $\mu(Z_{id})$ represent the channel-wise standard deviation and mean of the facial features $Z_{id}$, respectively. Similarly, $\sigma(F_{in})$ and $\mu(F_{in})$ represent the channel-wise standard deviation and mean of the input features, $F_{in}$. Therefore, given the input feature $F_{in}$ and the facial feature $Z_{id}$, an FFEB generates an output feature, $F_{out}$, that contains enhanced facial features.
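The sketch below shows one way such an FFEB could be realized in PyTorch. For simplicity, AdaIN here replaces the normalization inside a residual block rather than running strictly in parallel with Batch Normalization, and the per-channel statistics $\sigma(Z_{id})$ and $\mu(Z_{id})$ are produced from the 512-dimensional ArcFace identity vector by learned linear projections; both are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Identity-conditioned adaptive instance normalization (see the formula above)."""
    def __init__(self, channels, id_dim=512):
        super().__init__()
        self.to_scale = nn.Linear(id_dim, channels)   # produces sigma(Z_id) per channel
        self.to_shift = nn.Linear(id_dim, channels)   # produces mu(Z_id) per channel

    def forward(self, f_in, z_id):
        mu = f_in.mean(dim=(2, 3), keepdim=True)              # mu(F_in)
        sigma = f_in.std(dim=(2, 3), keepdim=True) + 1e-5     # sigma(F_in)
        normalized = (f_in - mu) / sigma
        scale = self.to_scale(z_id).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(z_id).unsqueeze(-1).unsqueeze(-1)
        return scale * normalized + shift

class FFEB(nn.Module):
    """Residual block whose normalization is conditioned on the identity prior Z_id."""
    def __init__(self, channels, id_dim=512):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.adain1 = AdaIN(channels, id_dim)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.adain2 = AdaIN(channels, id_dim)
        self.act = nn.ReLU(inplace=True)

    def forward(self, f_in, z_id):
        out = self.act(self.adain1(self.conv1(f_in), z_id))
        out = self.adain2(self.conv2(out), z_id)
        return self.act(out + f_in)   # residual connection as in a standard ResNet block
```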

2.3. Asymmetric Feature Fusion Module (AFFM)

In traditional coarse-to-fine image deblurring and restoration networks, the finer sub-networks only utilize features from coarser sub-networks, leading to limited information flow. Another approach is to cascade the entire network horizontally or vertically, allowing information to flow both top–down and bottom–up [18]. Inspired by the work of Cho et al. [19], this section introduces the Asymmetric Feature Fusion Module (AFFM), shown in Figure 4, which enables edge and texture information to flow across different scales within a single network. Compared to previous works, AFFM incorporates a simplified channel attention mechanism (SCA) after concatenating the asymmetric features from different scales, enhancing useful channel information while suppressing irrelevant channel information.
Specifically, the AFFM module takes the high-quality structure and texture priors extracted from $\hat{I}_R$, denoted as $TE_1^{out} \in \mathbb{R}^{C \times H \times W}$, $TE_2^{out} \in \mathbb{R}^{2C \times 2H \times 2W}$, $TE_3^{out} \in \mathbb{R}^{4C \times 4H \times 4W}$, and $TE_4^{out} \in \mathbb{R}^{8C \times 8H \times 8W}$. These features have different scales; they are resized to the desired size using bicubic interpolation and concatenated together. The concatenated feature map is further processed by a simplified channel attention module to fuse and select the information. The outputs of the AFFM module are $AFFM_1^{out} \in \mathbb{R}^{8C \times 8H \times 8W}$, $AFFM_2^{out} \in \mathbb{R}^{4C \times 4H \times 4W}$, $AFFM_3^{out} \in \mathbb{R}^{2C \times 2H \times 2W}$, and $AFFM_4^{out} \in \mathbb{R}^{C \times H \times W}$. Here, C, H, and W represent the number of channels and the spatial dimensions of the features. The entire process of AFFM can be represented by the following formula:
$$\begin{aligned}
AFFM_1^{out} &= \mathrm{AFFM}_1\big(TE_1^{out}\uparrow,\ TE_2^{out}\uparrow,\ TE_3^{out}\uparrow,\ TE_4^{out}\big) \\
AFFM_2^{out} &= \mathrm{AFFM}_2\big(TE_1^{out}\uparrow,\ TE_2^{out}\uparrow,\ TE_3^{out},\ TE_4^{out}\downarrow\big) \\
AFFM_3^{out} &= \mathrm{AFFM}_3\big(TE_1^{out}\uparrow,\ TE_2^{out},\ TE_3^{out}\downarrow,\ TE_4^{out}\downarrow\big) \\
AFFM_4^{out} &= \mathrm{AFFM}_4\big(TE_1^{out},\ TE_2^{out}\downarrow,\ TE_3^{out}\downarrow,\ TE_4^{out}\downarrow\big)
\end{aligned}$$
where ↑ and ↓ represent upsampling and downsampling operations, respectively, and $\mathrm{AFFM}_i$ denotes the $i$-th AFFM module. With this design, the Fine Restoration Sub-Network, FR-Net, can effectively utilize high-quality structure and texture priors at different scales to reconstruct high-quality facial images, allowing the recovery of clear facial images with complete structure and rich textures. Within the AFFM module, the SCA first takes the concatenated input feature map, $F_{in}$, and applies global average pooling followed by a $1 \times 1$ convolution to generate a feature weight vector, $F_w$. Subsequently, the input, $F_{in}$, is enhanced and suppressed based on the feature weights, $F_w$, to produce the output, $F_{out}$. This process can be represented by the following formulas:
$$F_w = \mathrm{Conv}_{1 \times 1}(\mathrm{GAP}(F_{in}))$$
$$F_{out} = F_w + F_w \times F_{in}$$
where $\mathrm{GAP}(\cdot)$ represents global average pooling, and $\mathrm{Conv}_{1 \times 1}$ represents a $1 \times 1$ convolution.
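A minimal PyTorch sketch of the SCA operation and one AFFM branch is given below; the $1 \times 1$ projection after fusion and the way the target scale is passed in are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedChannelAttention(nn.Module):
    """F_w = Conv1x1(GAP(F_in));  F_out = F_w + F_w * F_in."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_in):
        f_w = self.conv(self.gap(f_in))   # per-channel weight vector
        return f_w + f_w * f_in           # enhance / suppress channel information

class AFFMBranch(nn.Module):
    """Fuses the four texture priors at one target scale (one AFFM_i above)."""
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        total = sum(in_channels_list)
        self.sca = SimplifiedChannelAttention(total)
        self.proj = nn.Conv2d(total, out_channels, kernel_size=1)

    def forward(self, te_feats, target_hw):
        # Up-/downsample each prior to the target spatial size with bicubic interpolation.
        resized = [F.interpolate(t, size=target_hw, mode="bicubic", align_corners=False)
                   for t in te_feats]
        fused = torch.cat(resized, dim=1)   # concatenate along the channel dimension
        return self.proj(self.sca(fused))
```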

2.4. Loss Functions

Most face restoration works optimize the mean squared error (MSE) against the ground-truth target image, which often results in blurry outputs lacking texture details [15]. However, the key to face restoration is achieving high semantic fidelity and visual realism in the recovered face image; slight pixel-level differences are usually tolerable. Therefore, the overall optimization of HPG-GAN is based on the adversarial loss, $\mathcal{L}_{GAN}$; the multi-scale feature matching loss, $\mathcal{L}_{FM}$; the perceptual loss, $\mathcal{L}_{per}$; the high-frequency texture loss, $\mathcal{L}_{edge}$; and the identity feature loss, $\mathcal{L}_{id}$, to ensure both visual quality and identity preservation.
Adversarial Loss: In this study, a variant of LSGAN [20] is adopted to generate more precise and detailed face images while mitigating common issues in GAN training, such as mode collapse. The corresponding loss can be represented as follows:
$$\mathcal{L}_{GAN} = \mathbb{E}\left[\log \left\| D(I_{gt}) - 1 \right\|_2^2\right] + \mathbb{E}\left[\log \left\| D(I_R) - 1 \right\|_2^2\right]$$
where the discriminator $D(\cdot)$ is encouraged to assign the real label to the restored input $I_R$.
Multi-Scale Feature Matching Loss: In order to preserve image details and structure effectively while avoiding overly smooth or blurry generated images, this part introduces the multi-scale feature matching loss [21]. Specifically, the approach utilizes features from intermediate layers of a multi-scale discriminator as the basis for matching. The difference between the features of the generator's output, $I_R$, and the corresponding features of the ground-truth image, $I_{gt}$, at layer $i \in \{1, 2, 3\}$ of the discriminator is calculated and used as the loss. This can be represented as follows:
$$\mathcal{L}_{FM} = \sum_{i=1}^{N} \frac{1}{N} \left\| D_i(I_{gt}) - D_i(I_R) \right\|$$
Perceptual Loss: In the task of blind face image restoration, the generated image may lack the structural information of the original image, resulting in insufficient clarity or distortion in the final result. To address this issue, perceptual loss is introduced in this experiment. It involves utilizing a pre-trained VGG network [22] to compare the feature representations of the generated image and the original image, ensuring that the generated image closely resembles the original image in terms of structure. The following equation can express it:
$$\mathcal{L}_{per} = \sqrt{\left\| \phi(I_R) - \phi(I_{gt}) \right\|^2 + \varepsilon^2}$$
where $\phi(\cdot)$ represents a pre-trained VGG network that extracts feature information at various scales from the images, and $\varepsilon$ is a small constant used to avoid the derivative of the loss function becoming infinite when $\phi(I_R) - \phi(I_{gt}) = 0$.
Texture Loss: To restore the details and texture information in the reconstructed facial images, a high-frequency texture loss is introduced based on the Fourier transform. This loss aims to prevent the occurrence of excessive smoothing in the images and enhance the realism and clarity of the restored images. This can be represented as follows:
$$\mathcal{L}_{edge} = \sqrt{\left\| \mathcal{F}(I_R) - \mathcal{F}(I_{gt}) \right\|^2 + \varepsilon^2}$$
where $\mathcal{F}$ represents the fast Fourier transform (FFT), which is used to convert the image into the frequency domain.
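As a concrete illustration, this frequency-domain texture loss could be implemented as a Charbonnier-style penalty on the 2-D FFT of the restored and ground-truth images; the mean reduction over frequency bins and the value of $\varepsilon$ are assumptions the text does not fix.

```python
import torch

def texture_loss(restored, ground_truth, eps=1e-3):
    """Charbonnier-style frequency-domain loss: sqrt(|F(I_R) - F(I_gt)|^2 + eps^2)."""
    fft_r = torch.fft.fft2(restored, dim=(-2, -1))
    fft_g = torch.fft.fft2(ground_truth, dim=(-2, -1))
    diff = fft_r - fft_g                                  # complex-valued difference
    return torch.sqrt(diff.abs() ** 2 + eps ** 2).mean()  # averaged over all frequency bins
```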
Identity Feature Loss: In order to preserve consistency in identity features between the generated image and the original image and to avoid identity distortions in the generated image, this study introduces the identity feature loss, which can be represented as follows:
$$\mathcal{L}_{id} = \sqrt{\left\| \varphi(I_R) - \varphi(I_{gt}) \right\|^2 + \varepsilon^2}$$
where $\varphi(\cdot)$ represents the identity feature extractor, such as ArcFace. Therefore, the total loss of HPG-GAN can be represented as:
$$\mathcal{L}_{total} = \mathcal{L}_{GAN} + \lambda_{FM}\mathcal{L}_{FM} + \lambda_{per}\mathcal{L}_{per} + \lambda_{edge}\mathcal{L}_{edge} + \lambda_{id}\mathcal{L}_{id}$$
where $\lambda_{FM}$, $\lambda_{per}$, $\lambda_{edge}$, and $\lambda_{id}$ are hyperparameters that balance the various losses and are empirically set to 10, 10, 0.1, and 1, respectively, following previous work by Cho et al. [19].
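The sketch below assembles the full generator objective with the weights reported above. The discriminator-feature accessor `disc.features`, the VGG feature extractor `vgg_feats`, and the identity encoder `id_encoder` are hypothetical names, and the LSGAN and feature-matching terms are straightforward realizations under those assumptions rather than the authors' exact code; `texture_loss` refers to the sketch in the previous subsection.

```python
import torch

def charbonnier(a, b, eps=1e-3):
    """Charbonnier distance sqrt(||a - b||^2 + eps^2), averaged over elements."""
    return torch.sqrt((a - b).pow(2) + eps ** 2).mean()

def generator_loss(disc, restored, ground_truth, vgg_feats, id_encoder,
                   lambda_fm=10.0, lambda_per=10.0, lambda_edge=0.1, lambda_id=1.0):
    # LSGAN-style adversarial term: push D(I_R) toward the real label.
    l_gan = (disc(restored) - 1.0).pow(2).mean()

    # Multi-scale feature matching on intermediate discriminator features
    # (disc.features is assumed to return a list of feature maps).
    feats_r, feats_g = disc.features(restored), disc.features(ground_truth)
    l_fm = sum(torch.abs(fr - fg).mean() for fr, fg in zip(feats_r, feats_g)) / len(feats_r)

    # Perceptual, frequency-domain texture, and identity terms in Charbonnier form.
    l_per = charbonnier(vgg_feats(restored), vgg_feats(ground_truth))
    l_edge = texture_loss(restored, ground_truth)
    l_id = charbonnier(id_encoder(restored), id_encoder(ground_truth))

    return l_gan + lambda_fm * l_fm + lambda_per * l_per + lambda_edge * l_edge + lambda_id * l_id
```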

3. Experiments

3.1. Dataset

Training Set: This study used the FFHQ dataset [23] for training, which consists of 70,000 high-quality face images. During training, all face images were resized to a uniform size of 256 × 256 . In addition to the FFHQ dataset, synthetic low-quality images resembling real-world scenarios were generated to train the HPG-GAN model. The synthesis process followed the degradation model proposed by Li et al. [24,25], and the process is as follows:
$$I_D = \left[ (I_{gt} \otimes K_\sigma) \downarrow_r + n_\delta \right]_{\mathrm{JPEG}_q}$$
where the high-quality image, $I_{gt}$, is first convolved with a Gaussian blur kernel, $K_\sigma$, and then downsampled with a scaling factor of $r$. Additive Gaussian white noise, $n_\delta$, is introduced, and finally the image is compressed in JPEG format with quality factor $q$. Similar to previous methods [24], for each training pair, the values of $\sigma$, $r$, $\delta$, and $q$ are randomly sampled from the ranges [0.2, 10], [1, 8], [0, 15], and [60, 100], respectively. Additionally, color jittering and randomly generated motion blur kernels are incorporated during training to simulate degraded images closer to real-world scenarios.
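A minimal OpenCV sketch of this degradation pipeline is shown below. The kernel size derived from $\sigma$, the interpolation modes, and the final resize back to the original resolution are assumptions made to keep the sketch self-contained; color jittering and motion blur are omitted.

```python
import random
import cv2
import numpy as np

def degrade(img_gt):
    """img_gt: uint8 HxWx3 high-quality face image (e.g., 256x256)."""
    sigma = random.uniform(0.2, 10.0)
    r = random.uniform(1.0, 8.0)
    delta = random.uniform(0.0, 15.0)
    q = random.randint(60, 100)

    h, w = img_gt.shape[:2]
    ksize = 2 * int(3 * sigma) + 1                          # odd Gaussian kernel size from sigma
    blurred = cv2.GaussianBlur(img_gt, (ksize, ksize), sigma)

    small = cv2.resize(blurred, (int(w / r), int(h / r)),    # downsample by factor r
                       interpolation=cv2.INTER_LINEAR)
    noisy = small.astype(np.float32) + np.random.normal(0, delta, small.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)          # additive Gaussian white noise

    ok, enc = cv2.imencode(".jpg", noisy, [cv2.IMWRITE_JPEG_QUALITY, q])  # JPEG compression
    jpeg = cv2.imdecode(enc, cv2.IMREAD_COLOR)

    return cv2.resize(jpeg, (w, h), interpolation=cv2.INTER_LINEAR)       # back to input size
```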
Validation Set: The validation set in this study consists of 10,000 randomly selected high-quality face images from the CelebA-HQ dataset [26]. The images in the validation set are synthesized using the same procedure as the training set, which includes applying Gaussian blur, downscaling with a scale factor, adding Gaussian white noise, and compressing with a quality factor.
Test Set: This study constructed a test set consisting of a synthetic dataset and three different real-world datasets. The specific composition of each test dataset is as follows:
  • CelebA-Test: This dataset comprises 20,000 high-quality images synthesized from the remaining images in the CelebA-HQ dataset. The synthesis process is consistent with the training process.
  • CelebA-TestN: This dataset consists of 1000 real-world low-quality face images collected by Yang et al. [27] from the internet. It is used to evaluate the model’s generalization in real-world scenarios.
  • WebPhoto-Test: This test dataset comprises 188 low-quality photos from real-life situations collected online. From these photos, 407 faces were extracted to construct the test dataset [28]. The degradation levels of these photos vary, with some being old photos with severe degradation in detail and color.
  • CelebChild-Test: This dataset includes 180 images of celebrity children's faces collected online. These are low-quality images, and many are old black-and-white photos [28].
None of these datasets overlap with the training or validation sets.

3.2. Implementation Details

The training process is divided into two stages. The first stage trains the coarse restoration sub-network, with training batch sizes of 8, 12, and 14 and AdamW as the optimizer. The initial learning rate is set to $1 \times 10^{-3}$ and is gradually reduced to $1 \times 10^{-4}$ using a multi-step LR schedule. In the second stage, the fine restoration sub-network is trained with a batch size of 4. The AdamW optimizer starts with an initial learning rate of $1 \times 10^{-3}$, which decreases to $1 \times 10^{-5}$ as training progresses until the model converges. All the model code is implemented using the PyTorch framework (https://pytorch.org/) and deployed on a Linux server with a 12 GB NVIDIA RTX 2080Ti GPU (Nvidia Corporation, Santa Clara, CA, USA) for training. To ensure accuracy and fairness, the code for all the comparative methods is used as provided by the respective authors or official sources. This experimental setup aims to ensure consistent and reliable training conditions and fair comparisons between different methods when evaluating the HPG-GAN model.
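The optimizer and schedule described above could be configured as in the sketch below; `cr_net`, `fr_net`, `loader`, `disc`, `vgg_feats`, `id_encoder`, and `num_epochs` are assumed to be defined elsewhere, the milestone epochs are illustrative (the text only gives the start and end learning rates), and `generator_loss` refers to the loss sketch in Section 2.4.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Stage 1: coarse restoration sub-network (CR-Net), lr 1e-3 -> 1e-4.
opt_coarse = AdamW(cr_net.parameters(), lr=1e-3)
sched_coarse = MultiStepLR(opt_coarse, milestones=[100], gamma=0.1)

# Stage 2: fine restoration sub-network (FR-Net), batch size 4, lr 1e-3 -> 1e-5.
opt_fine = AdamW(fr_net.parameters(), lr=1e-3)
sched_fine = MultiStepLR(opt_fine, milestones=[100, 200], gamma=0.1)

for epoch in range(num_epochs):
    for batch in loader:                      # pairs of degraded / ground-truth images
        opt_fine.zero_grad()
        restored = fr_net(batch["lq"])        # fine restoration guided by extracted priors
        loss = generator_loss(disc, restored, batch["gt"], vgg_feats, id_encoder)
        loss.backward()
        opt_fine.step()
    sched_fine.step()
```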

3.3. Comparative Experiments on the Synthetic Dataset

In order to validate the effectiveness of HPG-GAN, this study conducted quantitative and qualitative comparisons between HPG-GAN and the best-performing methods in blind face restoration (BFR) and its sub-tasks on synthetic and real datasets. The sub-tasks include face artifact removal (FAR), where ARCNN [29] and FBCNN [30] were evaluated; face denoising (FDN), where RIDNet [3] and VDNet [2] were evaluated; face super-resolution (FSR), where LESRCNN [31] and BSRGAN [32] were evaluated; and blind face deblurring (BFD), where DeblurGAN-v2 [33], MIMO [19], and DGUNet [34] were evaluated. For the overall blind face restoration task, the state-of-the-art (SOTA) methods UMSN [35], HiFace [5], DFDNet [25], GFPGAN [28], and NAFNet [36] were compared as benchmarks.
Quantitative Comparison: The evaluation metrics used in this study, as in other studies [35,37], included the following: non-reference perceptual metrics FID [38] and NIQE [39]; pixel-level evaluation metrics PSNR and SSIM [40]; the perceptual evaluation metric LPIPS [41]; as well as face restoration evaluation metrics FS. and MNE [42]. FID and NIQE are employed to measure the realism of the restoration results; LPIPS compares the differences between image patches; PSNR measures the pixel-level distances; SSIM evaluates the similarity in terms of structure, contrast, and brightness; FS. assesses the facial similarity between the restored images and the ground-truth images; and MNE is used to evaluate the error in reconstructing facial contours in the restored face images.
These metrics evaluate the quality and fidelity of the restored facial images from different perspectives. For example, FID measures the distance between the feature distributions of generated and real images; the smaller the distance, the more accurate the generative model, i.e., the higher the quality and the richer the diversity of the images. PSNR measures the pixel-level difference between two images, such as a restored image and its ground truth, and is widely used to evaluate the performance of restoration algorithms. SSIM follows the human visual system (HVS) in modeling structural similarity and is sensitive to local structural changes in images. LPIPS learns a perceptual distance between the restored image and the ground truth from deep feature representations, prioritizing their perceptual similarity.
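The full-reference metrics above could be computed per image pair as in the sketch below, using scikit-image for PSNR/SSIM and the `lpips` package for LPIPS (the `channel_axis` argument assumes a recent scikit-image); FID, NIQE, FS., and MNE require separate tooling and are omitted here.

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored, ground_truth):
    """restored, ground_truth: uint8 HxWx3 arrays in [0, 255]."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored, channel_axis=-1, data_range=255)

    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).float().unsqueeze(0) / 127.5 - 1.0
    lpips_model = lpips.LPIPS(net="alex")
    with torch.no_grad():
        lpips_score = lpips_model(to_tensor(restored), to_tensor(ground_truth)).item()

    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lpips_score}
```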
The quantitative evaluation results on the CelebA-Test dataset are presented in Table 1. In the table, ↑ indicates a higher value is desirable for the metric, while ↓ indicates a lower value is desirable. For ease of observation, the best-performing evaluation metrics are highlighted in bold, and the second-best evaluation metrics are underlined. Accordingly, it is evident from Table 1 that HPG-GAN achieves the lowest LPIPS, indicating that the results of this study are perceptually closest to the ground truth (GT). Furthermore, the proposed HPG-GAN also obtains the lowest FID and the second-lowest NIQE, suggesting that the restoration results of this study are closest to the distributions of real faces and natural images. Additionally, HPG-GAN preserves better identity information, achieving the second-best facial similarity FS. evaluation score of 0.83. The proposed HPG-GAN achieves the lowest MNE regarding facial contour restoration, indicating the best performance in reconstructing facial contours. It is worth noting that the pixel-level evaluation metrics, PSNR and SSIM, show poor correlation with subjective evaluations by human observers. Although the model designed in this study may not excel in these two metrics, the difference from the best-performing metrics is minimal.
Qualitative Comparison: To visually validate the effectiveness of HPG-GAN, this study presents a visualization comparison of its results with various state-of-the-art (SOTA) methods on the CelebA-Test dataset. Due to page limitations, only the results of the best-performing methods in each facial restoration branch task and the overall blind face restoration task, based on non-reference perceptual metrics, are shown here. The visual comparison results are depicted in Figure 5, where (a) represents the degraded input image; (b)–(h) show the restoration results of FBCNN, VDNet, BSRGAN, DeblurGAN-v2, HiFace, DFDNet, and GFPGAN; (i) represents the restoration result of the proposed HPG-GAN; and (j) represents the ground truth (original clear image).
The following observations can be made from the visual comparison: (1) Thanks to the high-quality structural texture prior and facial feature prior, HPG-GAN successfully restores the complete and harmonious facial contours, as well as realistic details, such as the hair, teeth, and eyes. (2) Compared to other methods, HPG-GAN also produces restoration results with colors closest to the ground truth (GT), as demonstrated in the third and fifth rows of the figure. (3) Furthermore, HPG-GAN preserves better fidelity and generates images closer to real faces than other methods, as shown in the fourth row of the figure. In summary, the proposed HPG-GAN outperforms other SOTA methods regarding quantitative evaluation metrics on the synthetic dataset. Moreover, HPG-GAN successfully restores facial contours and details, such as eyes, teeth, and hair, resulting in visually superior results.

3.4. Comparative Experiments on the Real Dataset

To evaluate the generalization ability of HPG-GAN, this study conducted quantitative and qualitative comparisons with the best-performing methods in blind face restoration tasks on three different real-world datasets. Since there are no corresponding ground-truth (GT) images for the real datasets, the non-reference perceptual evaluation metrics FID and NIQE were chosen for quantitative assessment. Smaller values of these metrics indicate that the facial image restoration results are closer to real faces and the distribution of natural images.
Quantitative Comparison: The quantitative comparison results are presented in Table 2, demonstrating the outstanding performance and excellent generalization ability of HPG-GAN on the CelebA-TestN, WebPhoto-Test, and CelebChild-Test datasets. Although DFDNet also achieved relatively high perceptual quality (with a low NIQE score), it failed to enhance the color of facial images, as shown in Figure 5g and Figure 6d.
Qualitative Comparison: The qualitative comparison results are shown in Figure 6, where (a) represents a facial image suffering from unknown degradation in a real-world scenario; (b)–(f) represent the restoration results of UMSN, HiFace, DFDNet, GFPGAN, and NAFNet, respectively; and (g) represents the restoration result of HPG-GAN proposed in this study. It can be observed that HPG-GAN, leveraging high-quality prior knowledge of facial contours and textures, achieves high-quality restoration and color enhancement of photos in real-life scenarios. For example, it exhibits bright piercing eyes and natural skin tones. HPG-GAN can produce believable and realistic faces in complex real-world degradation. In contrast, other methods fail to recover realistic facial colors (as shown in the second and fifth rows of Figure 6) and harmonious facial details (such as the eye regions in the third and sixth rows of Figure 6 and the beard region in the first row).
In conclusion, the proposed HPG-GAN in this study outperforms other state-of-the-art methods in terms of comprehensive quantitative evaluation metrics on three real-world scenario datasets. Moreover, HPG-GAN achieves high-quality restoration and color enhancement of facial images in real-life scenarios, resulting in superior visual effects.

3.5. Ablation Experiments

To validate the effectiveness of the Coarse Restoration module (CR), Facial Feature Enhancement Module (FFEM), and Asymmetric Feature Fusion Module (AFFM) of HPG-GAN, ablation experiments were conducted on the CelebA-Test dataset. Table 3 shows the quantitative comparison results, where × denotes the removal of the corresponding module and ✓ denotes that the module is kept. It can be observed that the complete HPG-GAN outperforms the three variants with individual modules removed on all evaluation metrics, demonstrating the importance of CR, FFEM, and AFFM in HPG-GAN. Specifically, the Coarse Restoration module (CR) improves overall performance: it first restores a coarse initial result, providing high-quality texture and facial feature priors for subsequent modules and thus alleviating the pressure in the later stages. The Asymmetric Feature Fusion Module (AFFM) contributes to improving pixel-level evaluation metrics, such as SSIM and PSNR, as well as non-reference perceptual evaluation metrics, such as FID and NIQE. Furthermore, the Facial Feature Enhancement Module (FFEM) contributes most to the facial-related evaluation metrics, FS. and MNE, indicating its effectiveness in enhancing facial features. Therefore, CR, FFEM, and AFFM all play essential roles in the blind face restoration performance of HPG-GAN, and their effective combination enhances the overall performance of the model.

4. Conclusions

We proposed HPG-GAN, a blind face restoration network guided by high-quality priors, which reconstructs clear, high-quality face images by extracting high-quality structural and textural priors from coarse restoration images. The method primarily relies on a Facial Feature Enhancement Module (FFEM) and an Asymmetric Feature Fusion Module (AFFM). The FFEM enhances facial feature information during the fine restoration process using high-definition facial feature priors extracted by ArcFace, including the position, shape, and size of facial contours, eyes, nose, mouth, and other parts. The AFFM fuses and selects asymmetric high-quality structural and texture information extracted from ResNet34 and embeds it into the fine restoration process to facilitate the overall recovery of structural and texture information. Additionally, we created training, validation, and testing datasets that closely resemble real-world degradation scenarios. Compared with various state-of-the-art face image deblurring and restoration networks, the proposed HPG-GAN achieved the best overall performance on the synthetic CelebA-Test dataset and on the real-world CelebA-TestN, WebPhoto-Test, and CelebChild-Test datasets, and it also demonstrated superior visual restoration results. Finally, ablation experiments on each module validated its necessity within HPG-GAN. In conclusion, HPG-GAN is a compelling blind face deblurring and restoration network with robustness and good generalization in practical application scenarios.

Author Contributions

Methodology, X.D.; Software, H.Z.; Validation, H.Z. and X.L.; Writing—original draft, H.Z.; Writing—review & editing, X.D. and X.L.; Visualization, X.D.; Supervision, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Sichuan Science and Technology Program (grant nos.: 2023ZHCG0018, 2023NSFSC0470, 2021YFQ0053, 2020JDTD0020, 2022YFG0026, and 2021YFG0018) and partially supported by the Opening Foundation of Agile and Intelligent Computing Key Laboratory of Sichuan Province.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shen, Z.; Lai, W.S.; Xu, T.; Kautz, J.; Yang, M.H. Exploiting semantics for face image deblurring. Int. J. Comput. Vis. 2020, 128, 1829–1846.
  2. Yue, Z.; Yong, H.; Zhao, Q.; Meng, D.; Zhang, L. Variational denoising network: Toward blind noise modeling and removal. Adv. Neural Inf. Process. Syst. 2019, 32, 1690–1701.
  3. Anwar, S.; Barnes, N. Real image denoising with feature attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3155–3164.
  4. Yang, W.; Zhang, X.; Tian, Y.; Wang, W.; Xue, J.H.; Liao, Q. Deep learning for single image super-resolution: A brief review. IEEE Trans. Multimed. 2019, 21, 3106–3121.
  5. Yang, L.; Wang, S.; Ma, S.; Gao, W.; Liu, C.; Wang, P.; Ren, P. Hifacegan: Face renovation via collaborative suppression and replenishment. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1551–1560.
  6. Hu, K.; Liu, Y.; Liu, R.; Lu, W.; Yu, G.; Fu, B. Enhancing quality of pose-varied face restoration with local weak feature sensing and gan prior. arXiv 2022, arXiv:2205.14377.
  7. Zhang, P.; Zhang, K.; Luo, W.; Li, C.; Wang, G. Blind Face Restoration: Benchmark Datasets and a Baseline Model. arXiv 2022, arXiv:2206.03697.
  8. Wu, W.; Qian, C.; Yang, S.; Wang, Q.; Cai, Y.; Zhou, Q. Look at boundary: A boundary-aware face alignment algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2129–2138.
  9. Keinert, F.; Lazzaro, D.; Morigi, S. A robust group-sparse representation variational method with applications to face recognition. IEEE Trans. Image Process. 2019, 28, 2785–2798.
  10. Zhang, X.; Shi, C.; Wang, X.; Wu, X.; Li, X.; Lv, J.; Mumtaz, I. Face inpainting based on GAN by facial prediction and fusion as guidance information. Appl. Soft Comput. 2021, 111, 107626.
  11. Zhang, X.; Wang, X.; Shi, C.; Yan, Z.; Li, X.; Kong, B.; Lyu, S.; Zhu, B.; Lv, J.; Yin, Y.; et al. De-gan: Domain embedded gan for high quality face image inpainting. Pattern Recognit. 2022, 124, 108415.
  12. Pan, J.; Sun, D.; Pfister, H.; Yang, M.H. Blind image deblurring using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1628–1636.
  13. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  14. Zhang, H.; Shi, C.; Zhang, X.; Wu, L.; Li, X.; Peng, J.; Wu, X.; Lv, J. Multi-scale progressive blind face deblurring. Complex Intell. Syst. 2023, 9, 1439–1453.
  15. Chen, Y.; Tai, Y.; Liu, X.; Shen, C.; Yang, J. Fsrnet: End-to-end learning face super-resolution with facial priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2492–2501.
  16. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
  17. Gu, J.; Ye, J.C. AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising. IEEE Trans. Comput. Imaging 2021, 7, 73–85.
  18. Zhang, H.; Dai, Y.; Li, H.; Koniusz, P. Deep stacked hierarchical multi-patch network for image deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5978–5986.
  19. Cho, S.J.; Ji, S.W.; Hong, J.P.; Jung, S.W.; Ko, S.J. Rethinking coarse-to-fine approach in single image deblurring. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 4641–4650.
  20. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
  21. Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8798–8807.
  22. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part II 14; Springer: Cham, Switzerland, 2016; pp. 694–711.
  23. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410.
  24. Li, X.; Liu, M.; Ye, Y.; Zuo, W.; Lin, L.; Yang, R. Learning warped guidance for blind face restoration. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 272–289.
  25. Li, X.; Chen, C.; Zhou, S.; Lin, X.; Zuo, W.; Zhang, L. Blind face restoration via deep multi-scale component dictionaries. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Part IX 16; Springer: Cham, Switzerland, 2020; pp. 399–415.
  26. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738.
  27. Yang, T.; Ren, P.; Xie, X.; Zhang, L. Gan prior embedded network for blind face restoration in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 672–681.
  28. Wang, X.; Li, Y.; Zhang, H.; Shan, Y. Towards real-world blind face restoration with generative facial prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 9168–9178.
  29. Yu, K.; Dong, C.; Loy, C.C.; Tang, X. Deep convolution networks for compression artifacts reduction. arXiv 2016, arXiv:1608.02778.
  30. Jiang, J.; Zhang, K.; Timofte, R. Towards flexible blind JPEG artifacts removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 4997–5006.
  31. Tian, C.; Zhuge, R.; Wu, Z.; Xu, Y.; Zuo, W.; Chen, C.; Lin, C.W. Lightweight image super-resolution with enhanced CNN. Knowl.-Based Syst. 2020, 205, 106235.
  32. Zhang, K.; Liang, J.; Van Gool, L.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 4791–4800.
  33. Kupyn, O.; Martyniuk, T.; Wu, J.; Wang, Z. Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8878–8887.
  34. Mou, C.; Wang, Q.; Zhang, J. Deep generalized unfolding networks for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17399–17410.
  35. Yasarla, R.; Perazzi, F.; Patel, V.M. Deblurring face images using uncertainty guided multi-stream semantic networks. IEEE Trans. Image Process. 2020, 29, 6251–6263.
  36. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image restoration. In Computer Vision—ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Part VII; Springer: Cham, Switzerland, 2022; pp. 17–33.
  37. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 14821–14831.
  38. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30, 6626–6637.
  39. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212.
  40. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  41. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595.
  42. Wang, N.; Gao, X.; Tao, D.; Yang, H.; Li, X. Facial feature point detection: A comprehensive survey. Neurocomputing 2018, 275, 50–65.
Figure 1. Traditional deep learning solutions based on facial priors.
Figure 2. The structure diagram of the generator network in HPG-GAN. It mainly consists of Coarse Restoration Sub-Network (CR-Net) and Fine Restoration Sub-Network (FR-Net).
Figure 3. The structure diagram of the Fine Restoration Sub-Network FR-Net. FR-Net mainly consists of a Facial Feature Enhancement Module (FFEM) and an Asymmetric Feature Fusion Module (AFFM).
Figure 4. The detailed structure of Asymmetric Feature Fusion Module (AFFM).
Figure 5. Qualitative comparison with other state-of-the-art (SOTA) methods on the CelebA-Test dataset. (a) Input image, (b) FBCNN, (c) VDNet, (d) BSRGAN, (e) DeblurGAN-v2, (f) HiFace, (g) DFDNet, (h) GFPGAN, (i) HPG-GAN, and (j) Ground truth.
Figure 6. Qualitative comparison with other state-of-the-art (SOTA) methods on a real dataset. (a) The facial image suffering from unknown degradation in a real-world scenario, (b) UMSN, (c) HiFace, (d) DFDNet, (e) GFPGAN, (f) NAFNet, and (g) HPG-GAN.
Table 1. A quantitative comparison of this method with other state-of-the-art methods on the CelebA-Test dataset. ↓ denotes lower is better, ↑ denotes higher is better.
| Task | Methods | LPIPS (%)↓ | FID↓ | NIQE↓ | FS. (%)↑ | MNE (%)↓ | PSNR↑ | SSIM↑ |
|---|---|---|---|---|---|---|---|---|
| – | Input | 0.62 | 153.74 | 14.14 | 0.74 | 6.23 | 19.30 | 0.67 |
| FAR | ARCNN | 0.52 | 120.68 | 10.13 | 0.73 | 5.96 | 20.13 | 0.69 |
| FAR | FBCNN | 0.40 | 74.57 | 10.09 | 0.80 | 4.35 | 22.47 | 0.77 |
| FDN | RIDNet | 0.52 | 98.01 | 11.20 | 0.75 | 5.43 | 20.40 | 0.70 |
| FDN | VDNet | 0.42 | 66.98 | 9.05 | 0.79 | 5.25 | 21.73 | 0.74 |
| FSR | LESRCNN | 0.45 | 69.05 | 9.65 | 0.74 | 6.31 | 19.77 | 0.69 |
| FSR | BSRGAN | 0.34 | 59.56 | 10.18 | 0.76 | 5.75 | 20.38 | 0.71 |
| BFD | DeblurGAN-v2 | 0.31 | 67.59 | 6.05 | 0.79 | 4.47 | 18.55 | 0.60 |
| BFD | MIMO | 0.39 | 76.69 | 10.42 | 0.82 | 3.95 | 22.85 | 0.78 |
| BFD | DGUNet | 0.39 | 68.68 | 9.96 | 0.81 | 3.95 | 22.96 | 0.77 |
| BFR | UMSN | 0.46 | 76.26 | 14.26 | 0.77 | 4.70 | 21.31 | 0.74 |
| BFR | HiFace | 0.22 | 17.35 | 7.64 | 0.83 | 3.74 | 21.32 | 0.73 |
| BFR | DFDNet | 0.35 | 55.82 | 7.85 | 0.80 | 5.81 | 18.91 | 0.67 |
| BFR | GFPGAN | 0.26 | 23.41 | 7.22 | 0.80 | 5.77 | 18.99 | 0.67 |
| BFR | NAFNet | 0.42 | 66.30 | 9.94 | 0.84 | 9.94 | 20.61 | 0.75 |
| BFR | HPG-GAN | 0.19 | 11.55 | 6.60 | 0.83 | 3.70 | 21.43 | 0.71 |
| – | GT | 0.00 | 0.00 | 6.29 | 1.00 | 0.00 | – | 1.00 |
Table 2. A quantitative comparison of this method with other state-of-the-art methods on the real dataset. ↓ denotes lower is better.
| Methods | FID↓ (CelebA-TestN) | NIQE↓ (CelebA-TestN) | FID↓ (WebPhoto-Test) | NIQE↓ (WebPhoto-Test) | FID↓ (CelebChild-Test) | NIQE↓ (CelebChild-Test) |
|---|---|---|---|---|---|---|
| Input | 116.49 | 10.62 | 163.97 | 10.86 | 130.05 | 6.75 |
| UMSN | 28.97 | 11.84 | 143.16 | 12.97 | 139.29 | 10.16 |
| HiFace | 26.22 | 6.55 | 133.50 | 6.68 | 125.32 | 6.63 |
| DFDNet | 36.45 | 5.92 | 129.08 | 6.80 | 119.43 | 5.49 |
| GFPGAN | 28.52 | 6.48 | 135.02 | 6.63 | 125.90 | 6.11 |
| NAFNet | 84.76 | 8.72 | 156.73 | 8.46 | 154.87 | 8.10 |
| HPG-GAN | 24.99 | 6.43 | 127.67 | 6.54 | 123.72 | 6.76 |
Table 3. Ablation experiments. ↓ denotes lower is better, ↑ denotes higher is better.
| CR | FFEM | AFFM | LPIPS (%)↓ | FID↓ | NIQE↓ | FS. (%)↑ | MNE (%)↓ | PSNR↑ | SSIM↑ |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | ✓ | ✓ | 0.19 | 11.55 | 6.60 | 0.83 | 3.70 | 21.43 | 0.71 |
| × | ✓ | ✓ | 0.23 | 24.38 | 7.98 | 0.82 | 3.70 | 21.37 | 0.70 |
| ✓ | ✓ | × | 0.20 | 14.14 | 6.94 | 0.79 | 3.89 | 21.20 | 0.69 |
| ✓ | × | ✓ | 0.21 | 16.90 | 7.29 | 0.77 | 4.00 | 21.24 | 0.71 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
